> Richard Silverman wrote:
>> On Sun, 23 Dec 2012, Howard Chu wrote:
>>> Also, what kernel version(s) are you testing on?
>> 2.6.32 (Red Hat Enterprise Linux)
>>> I see a lot of Linux-kernel email traffic about epoll bugs as well, but not
>>> sure which are relevant to this version. Just something to stay aware of.
>> Good to know, although I think the bug is independent of this. We actually
>> ran into this exact issue on Solaris several years ago and reported it then:
>> http://www.openldap.org/its/index.cgi/Incoming?id=6920
> That bug is against Solaris, which has always only used select(), since
> epoll() is Linux-only. I don't see how the two can be related, since your
> current issue is epoll-specific.
It’s not clear to me. All we know at the moment is that if I rebuild with select(), I can no longer reproduce the problem. With a timing-dependent concurrency bug we do not yet fully understand, I would not call that conclusive, merely suggestive; it could just as well be that the different code alters the timing such that my test no longer tickles the bug. In fact, as I said it happened much less frequently for us on Solaris, on the order of a few times a month. I have not yet read your earlier bug reference though, so perhaps there’s information there that makes this more clear.
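To be concrete about what the rebuild changes: the event-wait primitive is chosen when the server is built, so switching to select() replaces one blocking call in the listener loop and, with it, the timing of descriptor wakeups. A minimal sketch of that kind of compile-time switch (hypothetical names and structure, not the actual slapd daemon code):

#include <sys/select.h>
#ifdef HAVE_EPOLL
#include <sys/epoll.h>
#endif

/* Hypothetical sketch only: shows how one build-time switch swaps the
 * blocking wait between epoll_wait() and select(). */
static int
wait_for_events( int epfd, int maxfd, fd_set *readfds, int timeout_ms )
{
#ifdef HAVE_EPOLL
	struct epoll_event evs[64];
	(void)maxfd; (void)readfds;
	/* epoll: the kernel keeps the interest list; we only harvest ready fds */
	return epoll_wait( epfd, evs, 64, timeout_ms );
#else
	struct timeval tv;
	tv.tv_sec = timeout_ms / 1000;
	tv.tv_usec = ( timeout_ms % 1000 ) * 1000;
	(void)epfd;
	/* select: the whole descriptor set is passed in and rescanned every call */
	return select( maxfd + 1, readfds, NULL, NULL, &tv );
#endif
}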
As the historical stuff below shows, once again most threads are blocked on condition waits while one is polling I/O; certainly, the symptoms were the same (slapd became unresponsive and had to be SIGKILL’ed). On the other hand, the last bit (gdb) shows that the cond_wait is for database access, not network as with our current scenario. Could be the same bug manifesting slightly differently.
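For anyone following the dumps below, the shape they show is the familiar one: a single listener thread blocked in poll() (pollsys at the syscall level) and a pool of workers parked in pthread_cond_wait() (lwp_cond_wait) until they are handed an operation. Roughly this structure, sketched from memory rather than taken from the slapd source:

#include <poll.h>
#include <pthread.h>

static pthread_mutex_t q_mutex = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  q_cond  = PTHREAD_COND_INITIALIZER;
static int q_pending;	/* operations queued for the pool */

/* Worker: sleeps in pthread_cond_wait(), which truss shows as lwp_cond_wait. */
static void *worker( void *arg )
{
	(void)arg;
	for ( ;; ) {
		pthread_mutex_lock( &q_mutex );
		while ( q_pending == 0 )
			pthread_cond_wait( &q_cond, &q_mutex );
		q_pending--;
		pthread_mutex_unlock( &q_mutex );
		/* ... process one operation ... */
	}
	return NULL;
}

/* Listener: sleeps in poll(), which truss shows as pollsys. */
static void listener( struct pollfd *fds, int nfds )
{
	for ( ;; ) {
		if ( poll( fds, nfds, -1 ) <= 0 )
			continue;
		pthread_mutex_lock( &q_mutex );
		q_pending++;			/* hand the ready descriptor to a worker */
		pthread_cond_signal( &q_cond );
		pthread_mutex_unlock( &q_mutex );
	}
}

If the wakeup that is supposed to rouse a worker, or the listener itself, is ever lost, every thread is left sleeping and the daemon goes quiet, which is exactly the symptom described above.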
Anyway, this is of course not directly relevant to solving the current problem; I just thought I’d point it out. At the moment, it seems far too similar to be coincidence to me.
--
Richard
--------------------------------------------------------------------------------
(from http://www.openldap.org/its/index.cgi/Incoming?id=6920)
################
# truss output #
################
12612/17: lwp_cond_wait(0xFFFFFD7FFC8B2C38, 0xFFFFFD7FFC8B2C20, 0x00000000, 1) (sleeping...)
12612/10: lwp_cond_wait(0xFFFFFD7FFC8B41C8, 0xFFFFFD7FFC8B41B0, 0x00000000, 1) (sleeping...)
12612/2: pollsys(0xFFFFFD7FBB3FE510, 59, 0xFFFFFD7FBB3FF290, 0x00000000) (sleeping...)
12612/15: lwp_cond_wait(0xFFFFFD7FFC8ABF78, 0xFFFFFD7FFC8ABF60, 0x00000000, 1) (sleeping...)
12612/14: lwp_cond_wait(0xFFFFFD7FFC895610, 0xFFFFFD7FFC8955F8, 0x00000000, 1) (sleeping...)
12612/16: lwp_cond_wait(0xFFFFFD7FFC86DA70, 0xFFFFFD7FFC86DA58, 0x00000000, 1) (sleeping...)
12612/18: lwp_cond_wait(0xFFFFFD7FFC888B18, 0xFFFFFD7FFC888B00, 0x00000000, 1) (sleeping...)
12612/1: lwp_wait(2, 0xFFFFFD7FFFDFFBE4) (sleeping...)
12612/8: lwp_cond_wait(0xFFFFFD7FFC8C7BD8, 0xFFFFFD7FFC8C7BC0, 0x00000000, 1) (sleeping...)
12612/6: lwp_cond_wait(0xFFFFFD7FFC8AFF38, 0xFFFFFD7FFC8AFF20, 0x00000000, 1) (sleeping...)
12612/3: lwp_cond_wait(0xFFFFFD7FFC8B0208, 0xFFFFFD7FFC8B01F0, 0x00000000, 1) (sleeping...)
12612/12: lwp_cond_wait(0xFFFFFD7FFC8B0118, 0xFFFFFD7FFC8B0100, 0x00000000, 1) (sleeping...)
12612/7: lwp_cond_wait(0xFFFFFD7FFC918A10, 0xFFFFFD7FFC9189F8, 0x00000000, 1) (sleeping...)
12612/11: lwp_cond_wait(0xFFFFFD7FFC7055C0, 0xFFFFFD7FFC7055A8, 0x00000000, 1) (sleeping...)
12612/13: lwp_cond_wait(0xFFFFFD7FFC901F40, 0xFFFFFD7FFC901F28, 0x00000000, 1) (sleeping...)
12612/5: lwp_cond_wait(0xFFFFFD7FFC8BCC10, 0xFFFFFD7FFC8BCBF8, 0x00000000, 1) (sleeping...)
12612/9: lwp_cond_wait(0xFFFFFD7FFC8684A8, 0xFFFFFD7FFC868490, 0x00000000, 1) (sleeping...)
12612/4: lwp_cond_wait(0xFFFFFD7FFC86C7B0, 0xFFFFFD7FFC86C798, 0x00000000, 1) (sleeping...)
--------------------------------------------------------------------------------
from my notes at the time:
Here is the truss -vall output:
20753/9: lwp_cond_wait(0xFFFFFD7FFC705200, 0xFFFFFD7FFC7051E8, 0x00000000, 1) (sleeping...)
20753/9: condvar type: USYNC_PROCESS
20753/9: mutex type: USYNC_PROCESS
20753/18: lwp_cond_wait(0xFFFFFD7FFC71E610, 0xFFFFFD7FFC71E5F8, 0x00000000, 1) (sleeping...)
20753/18: condvar type: USYNC_PROCESS
20753/18: mutex type: USYNC_PROCESS
20753/11: lwp_cond_wait(0xFFFFFD7FFC723A70, 0xFFFFFD7FFC723A58, 0x00000000, 1) (sleeping...)
20753/11: condvar type: USYNC_PROCESS
20753/11: mutex type: USYNC_PROCESS
20753/17: lwp_cond_wait(0xFFFFFD7FFC71D350, 0xFFFFFD7FFC71D338, 0x00000000, 1) (sleeping...)
20753/17: condvar type: USYNC_PROCESS
20753/17: mutex type: USYNC_PROCESS
20753/20: lwp_cond_wait(0xFFFFFD7FFC7FD978, 0xFFFFFD7FFC7FD960, 0x00000000, 1) (sleeping...)
20753/20: condvar type: USYNC_PROCESS
20753/20: mutex type: USYNC_PROCESS
20753/13: lwp_cond_wait(0xFFFFFD7FFC7091C0, 0xFFFFFD7FFC7091A8, 0x00000000, 1) (sleeping...)
20753/13: condvar type: USYNC_PROCESS
20753/13: mutex type: USYNC_PROCESS
20753/4: pollsys(0xFFFFFD7FBB3FEC10, 23, 0xFFFFFD7FBB3FF290, 0x00000000) (sleeping...)
20753/4: fd=5 ev=POLLRDNORM rev=0
20753/4: fd=7 ev=POLLRDNORM rev=0
20753/4: fd=9 ev=POLLRDNORM rev=0
20753/4: fd=10 ev=POLLRDNORM rev=0
20753/4: fd=19 ev=POLLRDNORM rev=0
20753/4: fd=20 ev=POLLRDNORM rev=0
20753/4: fd=21 ev=POLLRDNORM rev=0
20753/4: fd=27 ev=POLLRDNORM rev=0
20753/4: fd=29 ev=POLLRDNORM rev=0
20753/4: fd=31 ev=POLLRDNORM rev=0
20753/4: fd=45 ev=POLLRDNORM rev=0
20753/4: fd=51 ev=POLLRDNORM rev=0
20753/4: fd=53 ev=POLLRDNORM rev=0
20753/4: fd=54 ev=POLLRDNORM rev=0
20753/4: fd=55 ev=POLLRDNORM rev=0
20753/4: fd=58 ev=POLLRDNORM rev=0
20753/4: fd=59 ev=POLLRDNORM rev=0
20753/4: fd=60 ev=POLLRDNORM rev=0
20753/4: fd=61 ev=POLLRDNORM rev=0
20753/4: fd=62 ev=POLLRDNORM rev=0
20753/4: fd=63 ev=POLLRDNORM rev=0
20753/4: fd=64 ev=POLLRDNORM rev=0
20753/4: fd=65 ev=POLLRDNORM rev=0
20753/4: timeout: 3022.000000000 sec
20753/5: lwp_cond_wait(0xFFFFFD7FFC71E688, 0xFFFFFD7FFC71E670, 0x00000000, 1) (sleeping...)
20753/5: condvar type: USYNC_PROCESS
20753/5: mutex type: USYNC_PROCESS
20753/12: lwp_cond_wait(0xFFFFFD7FFC720C80, 0xFFFFFD7FFC720C68, 0x00000000, 1) (sleeping...)
20753/12: condvar type: USYNC_PROCESS
20753/12: mutex type: USYNC_PROCESS
20753/1: lwp_wait(4, 0xFFFFFD7FFFDFFBF4) (sleeping...)
20753/16: lwp_cond_wait(0xFFFFFD7FFC80F150, 0xFFFFFD7FFC80F138, 0x00000000, 1) (sleeping...)
20753/16: condvar type: USYNC_PROCESS
20753/16: mutex type: USYNC_PROCESS
20753/6: lwp_cond_wait(0xFFFFFD7FFC70E800, 0xFFFFFD7FFC70E7E8, 0x00000000, 1) (sleeping...)
20753/6: condvar type: USYNC_PROCESS
20753/6: mutex type: USYNC_PROCESS
20753/19: lwp_cond_wait(0xFFFFFD7FFC71E598, 0xFFFFFD7FFC71E580, 0x00000000, 1) (sleeping...)
20753/19: condvar type: USYNC_PROCESS
20753/19: mutex type: USYNC_PROCESS
20753/7: lwp_cond_wait(0xFFFFFD7FFC71D800, 0xFFFFFD7FFC71D7E8, 0x00000000, 1) (sleeping...)
20753/7: condvar type: USYNC_PROCESS
20753/7: mutex type: USYNC_PROCESS
20753/15: lwp_cond_wait(0xFFFFFD7FFC8C7AE8, 0xFFFFFD7FFC8C7AD0, 0x00000000, 1) (sleeping...)
20753/15: condvar type: USYNC_PROCESS
20753/15: mutex type: USYNC_PROCESS
20753/10: lwp_cond_wait(0xFFFFFD7FFC71BCD0, 0xFFFFFD7FFC71BCB8, 0x00000000, 1) (sleeping...)
20753/10: condvar type: USYNC_PROCESS
20753/10: mutex type: USYNC_PROCESS
20753/8: lwp_cond_wait(0xFFFFFD7FFC80F240, 0xFFFFFD7FFC80F228, 0x00000000, 1) (sleeping...)
20753/8: condvar type: USYNC_PROCESS
20753/8: mutex type: USYNC_PROCESS
20753/14: lwp_cond_wait(0xFFFFFD7FFC71D698, 0xFFFFFD7FFC71D680, 0x00000000, 1) (sleeping...)
20753/14: condvar type: USYNC_PROCESS
20753/14: mutex type: USYNC_PROCESS
--------------------------------------------------------------------------------
(gdb) thread 3
[Switching to thread 3 (LWP 3 )]#0 0xfffffd7fff31cb1a in ___lwp_cond_wait () from /lib/64/libc.so.1
(gdb) bt
#0 0xfffffd7fff31cb1a in ___lwp_cond_wait () from /lib/64/libc.so.1
#1 0xfffffd7fff310e49 in cond_sleep_kernel () from /lib/64/libc.so.1
#2 0xfffffd7fff310ef9 in cond_wait_kernel () from /lib/64/libc.so.1
#3 0xfffffd7fff311094 in cond_wait_common () from /lib/64/libc.so.1
#4 0xfffffd7fff3112a7 in _cond_wait () from /lib/64/libc.so.1
#5 0xfffffd7fff3112e6 in cond_wait () from /lib/64/libc.so.1
#6 0xfffffd7fff311329 in pthread_cond_wait () from /lib/64/libc.so.1
#7 0xfffffd7ffec927ab in __db_pthread_mutex_lock (env=<value optimized out>, mutex=<value optimized out>)
at ../mutex/mut_pthread.c:318
#8 0xfffffd7ffec91c2e in __db_tas_mutex_lock_int (env=<value optimized out>, mutex=<value optimized out>,
nowait=<value optimized out>) at ../mutex/mut_tas.c:218
#9 0xfffffd7ffed3928a in __lock_get_internal (lt=<value optimized out>, sh_locker=<value optimized out>,
flags=<value optimized out>, obj=<value optimized out>, lock_mode=<value optimized out>,
timeout=<value optimized out>, lock=<value optimized out>) at ../lock/lock.c:953
#10 0xfffffd7ffed38135 in __lock_get (env=<value optimized out>, locker=<value optimized out>,
flags=<value optimized out>, obj=<value optimized out>, lock_mode=<value optimized out>,
lock=<value optimized out>) at ../lock/lock.c:459
#11 0xfffffd7ffed74a00 in __db_lget (dbc=<value optimized out>, action=<value optimized out>,
pgno=<value optimized out>, mode=<value optimized out>, lkflags=<value optimized out>,
lockp=<value optimized out>) at ../db/db_meta.c:1204
#12 0xfffffd7ffecb584c in __bam_search (dbc=<value optimized out>, root_pgno=<value optimized out>,
key=<value optimized out>, flags=<value optimized out>, slevel=<value optimized out>,
recnop=<value optimized out>, exactp=<value optimized out>) at ../btree/bt_search.c:703
#13 0xfffffd7ffeca1b8a in __bamc_search (dbc=<value optimized out>, root_pgno=<value optimized out>,
key=<value optimized out>, flags=<value optimized out>, exactp=<value optimized out>)
at ../btree/bt_cursor.c:2786
#14 0xfffffd7ffec9cf1c in __bamc_get (dbc=<value optimized out>, key=<value optimized out>,
data=<value optimized out>, flags=<value optimized out>, pgnop=<value optimized out>)
at ../btree/bt_cursor.c:1089
#15 0xfffffd7ffed5fdf0 in __dbc_iget (dbc=<value optimized out>, key=<value optimized out>,
data=<value optimized out>, flags=<value optimized out>) at ../db/db_cam.c:934
#16 0xfffffd7ffed6f600 in __dbc_get_pp (dbc=<value optimized out>, key=<value optimized out>,
data=<value optimized out>, flags=<value optimized out>) at ../db/db_iface.c:2201
#17 0x000000000053bbf1 in bdb_idl_delete_key ()
#18 0x000000000053dbd7 in bdb_key_change ()
The worker threads are all waiting to acquire a lock which will allow them to access the LDAP database. All of them are calling different routines in OpenLDAP for database access of various kinds (deleting a key in this case), but they all come down to the same locking code in BerkeleyDB. They’re not waiting for I/O access.
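(A side note on the truss annotations: "condvar type: USYNC_PROCESS" is what Solaris reports when the condition variable being waited on is process-shared, i.e. created with PTHREAD_PROCESS_SHARED in a shared memory region. That is consistent with the __db_pthread_mutex_lock frames above, since BDB keeps its locks in the shared environment region. A contrived illustration of that kind of setup, not BerkeleyDB's actual mutex code:)

#include <pthread.h>
#include <sys/mman.h>

struct shared_lock {
	pthread_mutex_t	mtx;
	pthread_cond_t	cv;
};

/* Illustration only: a mutex/condvar pair placed in shared memory and
 * marked PTHREAD_PROCESS_SHARED; a thread blocked on it appears in truss
 * as lwp_cond_wait with "condvar type: USYNC_PROCESS". */
static struct shared_lock *
make_shared_lock( void )
{
	struct shared_lock *l = mmap( NULL, sizeof *l, PROT_READ | PROT_WRITE,
		MAP_ANON | MAP_SHARED, -1, 0 );
	pthread_mutexattr_t ma;
	pthread_condattr_t ca;

	if ( l == MAP_FAILED )
		return NULL;

	pthread_mutexattr_init( &ma );
	pthread_condattr_init( &ca );
	pthread_mutexattr_setpshared( &ma, PTHREAD_PROCESS_SHARED );
	pthread_condattr_setpshared( &ca, PTHREAD_PROCESS_SHARED );
	pthread_mutex_init( &l->mtx, &ma );	/* truss: mutex type: USYNC_PROCESS */
	pthread_cond_init( &l->cv, &ca );	/* truss: condvar type: USYNC_PROCESS */
	pthread_mutexattr_destroy( &ma );
	pthread_condattr_destroy( &ca );
	return l;
}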
--------------------------------------------------------------------------------