Running 'make check' hangs for ever

classic Classic list List threaded Threaded
6 messages Options
Reply | Threaded
Open this post in threaded view
|

Running 'make check' hangs for ever

Isaac Boukris
Hi,

I'm trying to run 'make check' on 32bit fedora (21 / 23) installed on
VM and it hangs for ever.

# uname -a
Linux fedora 4.2.8-300.fc23.i686+PAE #1 SMP Tue Dec 15 17:13:23 UTC
2015 i686 i686 i386 GNU/Linux

Initially I thought it was lack of entropy (had it once) but it seems
something different, related to flock.

The command 'make check' hangs at:

PYTHONPATH=../../../util VALGRIND="" python ./t_cccol.py

With, 'strace -fyv make check' I see:

[pid 21891] fcntl64(6</home/admin/git/krb5/src/lib/krb5/ccache/testdir/db.kadm5.lock>,
F_OFD_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0,
l_len=-5230051357888610304}) = -1 EINVAL (Invalid argument)
[pid 21891] fcntl64(6</home/admin/git/krb5/src/lib/krb5/ccache/testdir/db.kadm5.lock>,
F_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}

And with 'gdb attach $(pidof kadmin.local)' I get:

#0  0xb77ccbe8 in __kernel_vsyscall ()
#1  0xb75b8c05 in fcntl () from /lib/libpthread.so.0
#2  0xb76912f5 in ofdlock (lock_arg=<optimized out>, cmd=<optimized
out>, fd=<optimized out>) at lock_file.c:80
#3  krb5_lock_file (context=0x946a4d8, fd=6, mode=2) at lock_file.c:132
#4  0xb73c159b in osa_adb_get_lock (db=0x946a488, mode=2) at adb_openclose.c:235
#5  0xb73c25cb in ctx_lock (context=context@entry=0x9468880,
dbc=dbc@entry=0x9469e68, lockmode=lockmode@entry=2) at kdb_db2.c:472
#6  0xb73c3655 in krb5_db2_put_principal (context=0x9468880,
entry=0x946af68, db_args=0x0) at kdb_db2.c:839
#7  0xb73c4872 in wrap_krb5_db2_put_principal (ctx=0x9468880,
d=0x946af68, db_args=0x0) at db2_exp.c:124
#8  0xb7792d31 in krb5int_put_principal_no_log (kcontext=0x9468880,
entry=0x946af68) at kdb5.c:866
#9  0xb7792dec in krb5_db_put_principal (kcontext=0x9468880,
entry=0x946af68) at kdb5.c:893
#10 0xb77b28c7 in kdb_put_entry (handle=0x9469b90, kdb=0x946af68,
adb=0xbfd37464) at server_kdb.c:397
#11 0xb77adcf8 in kadm5_create_principal_3 (server_handle=0x9469b90,
entry=0xbfd37564, mask=1, n_ks_tuple=0, ks_tuple=0x0,
password=0xbfd395c4 "alice22387")
    at svr_principal.c:579
#12 0xb77addf8 in kadm5_create_principal (server_handle=0x9469b90,
entry=0xbfd37564, mask=1, password=0xbfd395c4 "alice22387") at
svr_principal.c:367
#13 0x0804b751 in create_princ (princ=<optimized out>, mask=<optimized
out>, n_ks=<optimized out>, ks=0x0, pass=0xbfd395c4 "alice22387") at
kadmin.c:207
#14 0x0804d029 in kadmin_addprinc (argc=4, argv=0x946abb0) at kadmin.c:1223
#15 0x08051af8 in check_request_table (rqtbl=<optimized out>,
sci_idx=1, argv=0x946abb0, argc=4) at execute_cmd.c:74
#16 really_execute_command (sci_idx=sci_idx@entry=1,
argc=argc@entry=4, argv=argv@entry=0xbfd37b2c) at execute_cmd.c:115
#17 0x08051bc0 in ss_execute_command (sci_idx=1, argv=0xbfd37c28) at
execute_cmd.c:152
#18 0x0804a516 in main (argc=5, argv=0xbfd37c24) at ss_wrapper.c:59


By looking around in the code I managed to workaround this by
disabling POSIX_FILE_LOCKS in 'lock_file.c'.
It nows goes much further, although the tests still fail, this time
properly with:

PYTHONPATH=../../util VALGRIND="" python ./t_gss_sample.py
*** Failure: /home/admin/git/krb5/src/appl/gss-sample/gss-client
failed with code 1.

Thought this may be worth sharing, perhaps someone could make sense of it.

Thanks and regards,
Isaac B.
________________________________________________
Kerberos mailing list           [hidden email]
https://mailman.mit.edu/mailman/listinfo/kerberos
Reply | Threaded
Open this post in threaded view
|

Re: Running 'make check' hangs for ever

Greg Hudson
On 12/30/2015 04:28 PM, Isaac Boukris wrote:
> [pid 21891] fcntl64(6</home/admin/git/krb5/src/lib/krb5/ccache/testdir/db.kadm5.lock>,
> F_OFD_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0,
> l_len=-5230051357888610304}) = -1 EINVAL (Invalid argument)
> [pid 21891] fcntl64(6</home/admin/git/krb5/src/lib/krb5/ccache/testdir/db.kadm5.lock>,
> F_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}

Unfortunately, I can't really tell what's going on.  In ofdlock(), we
pass the same struct flock pointer to both fcntl() invocations when we
fall back to F_SETLKW, so I don't know why the first invocation is
reported a garbage l_len.  I also don't know why the second invocation
is blocking; did the first invocation somehow obtain a lock despite
returning EINVAL?  I can't find any search results about a known kernel
or glibc bug which might explain this odd behavior.

> PYTHONPATH=../../util VALGRIND="" python ./t_gss_sample.py
> *** Failure: /home/admin/git/krb5/src/appl/gss-sample/gss-client
> failed with code 1.

There should have been some additional messages explaining how to get
more information from a failing Python test; that information (and
perhaps more) is needed to figure out why this is failing for you.
________________________________________________
Kerberos mailing list           [hidden email]
https://mailman.mit.edu/mailman/listinfo/kerberos
Reply | Threaded
Open this post in threaded view
|

Re: Running 'make check' hangs for ever

Isaac Boukris
On Thu, Dec 31, 2015 at 1:08 AM, Greg Hudson <[hidden email]> wrote:

> On 12/30/2015 04:28 PM, Isaac Boukris wrote:
>> [pid 21891] fcntl64(6</home/admin/git/krb5/src/lib/krb5/ccache/testdir/db.kadm5.lock>,
>> F_OFD_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0,
>> l_len=-5230051357888610304}) = -1 EINVAL (Invalid argument)
>> [pid 21891] fcntl64(6</home/admin/git/krb5/src/lib/krb5/ccache/testdir/db.kadm5.lock>,
>> F_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}
>
> Unfortunately, I can't really tell what's going on.  In ofdlock(), we
> pass the same struct flock pointer to both fcntl() invocations when we
> fall back to F_SETLKW, so I don't know why the first invocation is
> reported a garbage l_len.  I also don't know why the second invocation
> is blocking; did the first invocation somehow obtain a lock despite
> returning EINVAL?  I can't find any search results about a known kernel
> or glibc bug which might explain this odd behavior.


Strange indeed, it does look like the fd is locked:
# cat /proc/24040/fdinfo/6
pos:    0
flags:  02000002
mnt_id: 57
lock:   1: POSIX  ADVISORY  READ  24040 fd:00:1195813 0 EOF

I've pasted a longer output of strace at:
http://pastebin.com/Rw8nvjCZ

>> PYTHONPATH=../../util VALGRIND="" python ./t_gss_sample.py
>> *** Failure: /home/admin/git/krb5/src/appl/gss-sample/gss-client
>> failed with code 1.
>
> There should have been some additional messages explaining how to get
> more information from a failing Python test; that information (and
> perhaps more) is needed to figure out why this is failing for you.


So I looked at the testlog file and noticed it got confused because my
host had two DNS records with different names (round robin).
Once I removed the additional name the 'make check' succeeded.

Thanks!
________________________________________________
Kerberos mailing list           [hidden email]
https://mailman.mit.edu/mailman/listinfo/kerberos
Reply | Threaded
Open this post in threaded view
|

Re: Running 'make check' hangs for ever

Isaac Boukris
On Thu, Dec 31, 2015 at 2:43 AM, Isaac Boukris <[hidden email]> wrote:

> On Thu, Dec 31, 2015 at 1:08 AM, Greg Hudson <[hidden email]> wrote:
>> On 12/30/2015 04:28 PM, Isaac Boukris wrote:
>>> [pid 21891] fcntl64(6</home/admin/git/krb5/src/lib/krb5/ccache/testdir/db.kadm5.lock>,
>>> F_OFD_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0,
>>> l_len=-5230051357888610304}) = -1 EINVAL (Invalid argument)
>>> [pid 21891] fcntl64(6</home/admin/git/krb5/src/lib/krb5/ccache/testdir/db.kadm5.lock>,
>>> F_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}
>>
>> Unfortunately, I can't really tell what's going on.  In ofdlock(), we
>> pass the same struct flock pointer to both fcntl() invocations when we
>> fall back to F_SETLKW, so I don't know why the first invocation is
>> reported a garbage l_len.  I also don't know why the second invocation
>> is blocking; did the first invocation somehow obtain a lock despite
>> returning EINVAL?  I can't find any search results about a known kernel
>> or glibc bug which might explain this odd behavior.
>
>
> Strange indeed, it does look like the fd is locked:
> # cat /proc/24040/fdinfo/6
> pos:    0
> flags:  02000002
> mnt_id: 57
> lock:   1: POSIX  ADVISORY  READ  24040 fd:00:1195813 0 EOF
>
> I've pasted a longer output of strace at:
> http://pastebin.com/Rw8nvjCZ


FYI I think there is something wrong with my system.
I tried to investigate by starting with an example from:
https://www.gnu.org/software/libc/manual/html_node/Open-File-Description-Locks-Example.html

But it get stuck in a similar manner:

[pid  6683] fcntl64(4</tmp/foo>, F_OFD_SETLK, {l_type=F_UNLCK,
l_whence=SEEK_SET, l_start=4294967296, l_len=8367752102667091968}) =
-1 EINVAL (Invalid argument)
[pid  6683] nanosleep({0, 1000},  <unfinished ...>
[pid  6682] <... open resumed> )        = 5</tmp/foo>
[pid  6683] <... nanosleep resumed> NULL) = 0
[pid  6683] madvise(0xb6571000, 8372224, MADV_DONTNEED) = 0
[pid  6683] exit(0)                     = ?
[pid  6683] +++ exited with 0 +++
[pid  6682] fcntl64(5</tmp/foo>, F_OFD_SETLKW, {l_type=F_WRLCK,
l_whence=SEEK_SET, l_start=4294967296, l_len=0}

Thanks for the feedback and happy new year!

>>> PYTHONPATH=../../util VALGRIND="" python ./t_gss_sample.py
>>> *** Failure: /home/admin/git/krb5/src/appl/gss-sample/gss-client
>>> failed with code 1.
>>
>> There should have been some additional messages explaining how to get
>> more information from a failing Python test; that information (and
>> perhaps more) is needed to figure out why this is failing for you.
>
>
> So I looked at the testlog file and noticed it got confused because my
> host had two DNS records with different names (round robin).
> Once I removed the additional name the 'make check' succeeded.
>
> Thanks!
________________________________________________
Kerberos mailing list           [hidden email]
https://mailman.mit.edu/mailman/listinfo/kerberos
Reply | Threaded
Open this post in threaded view
|

Re: Running 'make check' hangs for ever

Isaac Boukris
On Fri, Jan 1, 2016 at 12:06 AM, Isaac Boukris <[hidden email]> wrote:

> On Thu, Dec 31, 2015 at 2:43 AM, Isaac Boukris <[hidden email]> wrote:
>> On Thu, Dec 31, 2015 at 1:08 AM, Greg Hudson <[hidden email]> wrote:
>>> On 12/30/2015 04:28 PM, Isaac Boukris wrote:
>>>> [pid 21891] fcntl64(6</home/admin/git/krb5/src/lib/krb5/ccache/testdir/db.kadm5.lock>,
>>>> F_OFD_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0,
>>>> l_len=-5230051357888610304}) = -1 EINVAL (Invalid argument)
>>>> [pid 21891] fcntl64(6</home/admin/git/krb5/src/lib/krb5/ccache/testdir/db.kadm5.lock>,
>>>> F_SETLKW, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=0, l_len=0}
>>>
>>> Unfortunately, I can't really tell what's going on.  In ofdlock(), we
>>> pass the same struct flock pointer to both fcntl() invocations when we
>>> fall back to F_SETLKW, so I don't know why the first invocation is
>>> reported a garbage l_len.  I also don't know why the second invocation
>>> is blocking; did the first invocation somehow obtain a lock despite
>>> returning EINVAL?  I can't find any search results about a known kernel
>>> or glibc bug which might explain this odd behavior.
>>
>>
>> Strange indeed, it does look like the fd is locked:
>> # cat /proc/24040/fdinfo/6
>> pos:    0
>> flags:  02000002
>> mnt_id: 57
>> lock:   1: POSIX  ADVISORY  READ  24040 fd:00:1195813 0 EOF
>>
>> I've pasted a longer output of strace at:
>> http://pastebin.com/Rw8nvjCZ
>
>
> FYI I think there is something wrong with my system.
> I tried to investigate by starting with an example from:
> https://www.gnu.org/software/libc/manual/html_node/Open-File-Description-Locks-Example.html
>
> But it get stuck in a similar manner:
>
> [pid  6683] fcntl64(4</tmp/foo>, F_OFD_SETLK, {l_type=F_UNLCK,
> l_whence=SEEK_SET, l_start=4294967296, l_len=8367752102667091968}) =
> -1 EINVAL (Invalid argument)
> [pid  6683] nanosleep({0, 1000},  <unfinished ...>
> [pid  6682] <... open resumed> )        = 5</tmp/foo>
> [pid  6683] <... nanosleep resumed> NULL) = 0
> [pid  6683] madvise(0xb6571000, 8372224, MADV_DONTNEED) = 0
> [pid  6683] exit(0)                     = ?
> [pid  6683] +++ exited with 0 +++
> [pid  6682] fcntl64(5</tmp/foo>, F_OFD_SETLKW, {l_type=F_WRLCK,
> l_whence=SEEK_SET, l_start=4294967296, l_len=0}
>
> Thanks for the feedback and happy new year!


FYI 2, I've hit this again and looked a little further.
It looks like the bug is in glibc on 32 arch, see:
https://sourceware.org/bugzilla/show_bug.cgi?id=20251

Changing 'struct flock' to 'struct flock64' in 'lock_file.c' also works.

Regards.
________________________________________________
Kerberos mailing list           [hidden email]
https://mailman.mit.edu/mailman/listinfo/kerberos
Reply | Threaded
Open this post in threaded view
|

Re: Running 'make check' hangs for ever

Greg Hudson
On 08/13/2016 12:28 PM, Isaac Boukris wrote:
> FYI 2, I've hit this again and looked a little further.
> It looks like the bug is in glibc on 32 arch, see:
> https://sourceware.org/bugzilla/show_bug.cgi?id=20251
>
> Changing 'struct flock' to 'struct flock64' in 'lock_file.c' also works.

Thanks for the follow-up.  Since upstream doesn't seem to be in a hurry
to fix the bug, I have opened
http://krbdev.mit.edu/rt/Ticket/Display.html?id=8474
and will try to put together a workaround.
________________________________________________
Kerberos mailing list           [hidden email]
https://mailman.mit.edu/mailman/listinfo/kerberos