I randomly notice my OpenLDAP server freezes, and I can't udnerstand why. I have a few LDAP clients (ldapsearch, a legacy Java app. and ApacheDirectoryStudio), running from different systems, either locally on the OpenLDAP server, or on another OpenLDAP server, or on a remote workstation, and none manages to get an answer from OpenLDAP. The connection is established but each client gets stuck waiting for any result.
The last time, it occured only a few minutes after the OpenLDAP daemon was restarted, under very low load. I use OpenLDAP 2.4.23, over Red Hat Enterprise Linux Server release 5.4 (Tikanga), Kernel version 2.6.18-164.el5, amd64 architecture over VMWare. I also use BDB 4.8.30, and OpenLDAP's been compiled with the following options:
./configure --prefix=/opt/openldap-2.4.23 --enable-dynamic=yes --enable-dynlist=yes --enable-memberof=yes --enable-ppolicy=yes --enable-accesslog=yes --enable-sssvlv=yes
I noticed only 74 established connections on port 389 (according to netstat) when the freeze occurs.
The OpenLDAP server is set in syncrepl mirror mode replication, and the other OpenLDAP server (only used as a hot stand-by) works fine (but doesn't receive any load)
I only have the following lines at the end of the OpenLDAP log: (IP @ and suffixes voluntarily hidden)
Jul 12 10:19:32 dev-ldap1 slapd[28525]: conn=3377 fd=65 ACCEPT from IP=.......:3820 (IP=0.0.0.0:389) Jul 12 10:19:32 dev-ldap1 slapd[28525]: conn=3377 op=0 BIND dn="employeeNumber=02071,ou= people,dc=...." method=128 Jul 12 10:20:05 dev-ldap1 slapd[28525]: connection_input: conn=3377 deferring operation: binding Jul 12 10:20:07 dev-ldap1 slapd[28525]: conn=3378 fd=66 ACCEPT from IP=......:3823 (IP=0.0.0.0:389)
The stack of the slapd process shows the following:
# *pstack 28525* Thread 18 (Thread 0x408e4940 (LWP 28529)): #0 0x0000003a53ad4018 in epoll_wait () from /lib64/libc.so.6 #1 0x000000000041d76e in ldap_pvt_sasl_mutex_dispose () #2 0x0000003a546064a7 in start_thread () from /lib64/libpthread.so.0 #3 0x0000003a53ad3c2d in clone () from /lib64/libc.so.6 Thread 17 (Thread 0x41360940 (LWP 28530)): #0 0x0000003a5460ab99 in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00002ba988b6cdd2 in __db_pthread_mutex_lock () #2 0x00002ba988b6c426 in __db_tas_mutex_lock () #3 0x00002ba988bfcb49 in __lock_get_internal () #4 0x00002ba988bfd2c2 in __lock_get_pp () #5 0x00000000004af398 in ldap_pvt_sasl_mutex_dispose () #6 0x00000000004b04ca in ldap_pvt_sasl_mutex_dispose () #7 0x0000000000492a80 in ldap_pvt_sasl_mutex_dispose () #8 0x0000000000481bc2 in ldap_pvt_sasl_mutex_dispose () #9 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #10 0x0000000000422736 in ldap_pvt_sasl_mutex_dispose () #11 0x0000000000481bc2 in ldap_pvt_sasl_mutex_dispose () #12 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #13 0x0000000000422ec7 in ldap_pvt_sasl_mutex_dispose () #14 0x00000000004205a5 in ldap_pvt_sasl_mutex_dispose () #15 0x0000000000420b7f in ldap_pvt_sasl_mutex_dispose () #16 0x00002ba9886e08a8 in ldap_int_thread_pool_wrapper () #17 0x0000003a546064a7 in start_thread () from /lib64/libpthread.so.0 #18 0x0000003a53ad3c2d in clone () from /lib64/libc.so.6 Thread 16 (Thread 0x41b61940 (LWP 28531)): #0 0x0000003a5460ab99 in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00002ba988b6cdd2 in __db_pthread_mutex_lock () #2 0x00002ba988b6c426 in __db_tas_mutex_lock () #3 0x00002ba988bfcb49 in __lock_get_internal () #4 0x00002ba988bfd2c2 in __lock_get_pp () #5 0x00000000004af398 in ldap_pvt_sasl_mutex_dispose () #6 0x00000000004b04ca in ldap_pvt_sasl_mutex_dispose () #7 0x00000000004b4183 in ldap_pvt_sasl_mutex_dispose () #8 0x00000000004b802e in ldap_pvt_sasl_mutex_dispose () #9 0x0000000000481a7a in ldap_pvt_sasl_mutex_dispose () #10 0x0000000000482547 in ldap_pvt_sasl_mutex_dispose () #11 0x00000000004decab in ldap_pvt_sasl_mutex_dispose () #12 0x0000000000481b4a in ldap_pvt_sasl_mutex_dispose () #13 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #14 0x000000000043be3e in ldap_pvt_sasl_mutex_dispose () #15 0x0000000000481bc2 in ldap_pvt_sasl_mutex_dispose () #16 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #17 0x000000000043c83f in ldap_pvt_sasl_mutex_dispose () #18 0x00000000004205a5 in ldap_pvt_sasl_mutex_dispose () #19 0x0000000000420b7f in ldap_pvt_sasl_mutex_dispose () #20 0x00002ba9886e08a8 in ldap_int_thread_pool_wrapper () #21 0x0000003a546064a7 in start_thread () from /lib64/libpthread.so.0 #22 0x0000003a53ad3c2d in clone () from /lib64/libc.so.6 Thread 15 (Thread 0x42362940 (LWP 28532)): #0 0x0000003a5460ab99 in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00002ba988b6cdd2 in __db_pthread_mutex_lock () #2 0x00002ba988b6c426 in __db_tas_mutex_lock () #3 0x00002ba988bfcb49 in __lock_get_internal () #4 0x00002ba988bfd2c2 in __lock_get_pp () #5 0x00000000004af398 in ldap_pvt_sasl_mutex_dispose () #6 0x00000000004b04ca in ldap_pvt_sasl_mutex_dispose () #7 0x00000000004b4183 in ldap_pvt_sasl_mutex_dispose () #8 0x00000000004b802e in ldap_pvt_sasl_mutex_dispose () #9 0x0000000000481a7a in ldap_pvt_sasl_mutex_dispose () #10 0x0000000000482547 in ldap_pvt_sasl_mutex_dispose () #11 0x00000000004decab in ldap_pvt_sasl_mutex_dispose () #12 0x0000000000481b4a in ldap_pvt_sasl_mutex_dispose () #13 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #14 0x000000000043be3e in ldap_pvt_sasl_mutex_dispose () #15 0x0000000000481bc2 in ldap_pvt_sasl_mutex_dispose () #16 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #17 0x000000000043c83f in ldap_pvt_sasl_mutex_dispose () #18 0x00000000004205a5 in ldap_pvt_sasl_mutex_dispose () #19 0x0000000000420b7f in ldap_pvt_sasl_mutex_dispose () #20 0x00002ba9886e08a8 in ldap_int_thread_pool_wrapper () #21 0x0000003a546064a7 in start_thread () from /lib64/libpthread.so.0 #22 0x0000003a53ad3c2d in clone () from /lib64/libc.so.6 Thread 14 (Thread 0x42b63940 (LWP 28533)): #0 0x0000003a5460ab99 in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00002ba988b6cdd2 in __db_pthread_mutex_lock () #2 0x00002ba988b6c426 in __db_tas_mutex_lock () #3 0x00002ba988bfcb49 in __lock_get_internal () #4 0x00002ba988bfd2c2 in __lock_get_pp () #5 0x00000000004af398 in ldap_pvt_sasl_mutex_dispose () #6 0x00000000004b04ca in ldap_pvt_sasl_mutex_dispose () #7 0x00000000004b4183 in ldap_pvt_sasl_mutex_dispose () #8 0x00000000004b802e in ldap_pvt_sasl_mutex_dispose () #9 0x0000000000481a7a in ldap_pvt_sasl_mutex_dispose () #10 0x0000000000482547 in ldap_pvt_sasl_mutex_dispose () #11 0x00000000004decab in ldap_pvt_sasl_mutex_dispose () #12 0x0000000000481b4a in ldap_pvt_sasl_mutex_dispose () #13 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #14 0x000000000043be3e in ldap_pvt_sasl_mutex_dispose () #15 0x0000000000481bc2 in ldap_pvt_sasl_mutex_dispose () #16 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #17 0x000000000043c83f in ldap_pvt_sasl_mutex_dispose () #18 0x00000000004205a5 in ldap_pvt_sasl_mutex_dispose () #19 0x0000000000420b7f in ldap_pvt_sasl_mutex_dispose () #20 0x00002ba9886e08a8 in ldap_int_thread_pool_wrapper () #21 0x0000003a546064a7 in start_thread () from /lib64/libpthread.so.0 #22 0x0000003a53ad3c2d in clone () from /lib64/libc.so.6 Thread 13 (Thread 0x43364940 (LWP 28534)): #0 0x0000003a5460ab99 in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00002ba988b6cdd2 in __db_pthread_mutex_lock () #2 0x00002ba988b6c426 in __db_tas_mutex_lock () #3 0x00002ba988bfcb49 in __lock_get_internal () #4 0x00002ba988bfd2c2 in __lock_get_pp () #5 0x00000000004af398 in ldap_pvt_sasl_mutex_dispose () #6 0x00000000004b04ca in ldap_pvt_sasl_mutex_dispose () #7 0x00000000004b4183 in ldap_pvt_sasl_mutex_dispose () #8 0x00000000004b802e in ldap_pvt_sasl_mutex_dispose () #9 0x0000000000481a7a in ldap_pvt_sasl_mutex_dispose () #10 0x0000000000482547 in ldap_pvt_sasl_mutex_dispose () #11 0x00000000004decab in ldap_pvt_sasl_mutex_dispose () #12 0x0000000000481b4a in ldap_pvt_sasl_mutex_dispose () #13 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #14 0x000000000043be3e in ldap_pvt_sasl_mutex_dispose () #15 0x0000000000481bc2 in ldap_pvt_sasl_mutex_dispose () #16 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #17 0x000000000043c83f in ldap_pvt_sasl_mutex_dispose () #18 0x00000000004205a5 in ldap_pvt_sasl_mutex_dispose () #19 0x0000000000420b7f in ldap_pvt_sasl_mutex_dispose () #20 0x00002ba9886e08a8 in ldap_int_thread_pool_wrapper () #21 0x0000003a546064a7 in start_thread () from /lib64/libpthread.so.0 #22 0x0000003a53ad3c2d in clone () from /lib64/libc.so.6 Thread 12 (Thread 0x43b65940 (LWP 28535)): #0 0x0000003a5460ab99 in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00002ba988b6cdd2 in __db_pthread_mutex_lock () #2 0x00002ba988b6c426 in __db_tas_mutex_lock () #3 0x00002ba988bfcb49 in __lock_get_internal () #4 0x00002ba988bfd2c2 in __lock_get_pp () #5 0x00000000004af398 in ldap_pvt_sasl_mutex_dispose () #6 0x00000000004b04ca in ldap_pvt_sasl_mutex_dispose () #7 0x00000000004b4183 in ldap_pvt_sasl_mutex_dispose () #8 0x00000000004b802e in ldap_pvt_sasl_mutex_dispose () #9 0x0000000000481a7a in ldap_pvt_sasl_mutex_dispose () #10 0x0000000000482547 in ldap_pvt_sasl_mutex_dispose () #11 0x00000000004decab in ldap_pvt_sasl_mutex_dispose () #12 0x0000000000481b4a in ldap_pvt_sasl_mutex_dispose () #13 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #14 0x000000000043be3e in ldap_pvt_sasl_mutex_dispose () #15 0x0000000000481bc2 in ldap_pvt_sasl_mutex_dispose () #16 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #17 0x000000000043c83f in ldap_pvt_sasl_mutex_dispose () #18 0x00000000004205a5 in ldap_pvt_sasl_mutex_dispose () #19 0x0000000000420b7f in ldap_pvt_sasl_mutex_dispose () #20 0x00002ba9886e08a8 in ldap_int_thread_pool_wrapper () #21 0x0000003a546064a7 in start_thread () from /lib64/libpthread.so.0 #22 0x0000003a53ad3c2d in clone () from /lib64/libc.so.6 Thread 11 (Thread 0x44366940 (LWP 29755)): #0 0x0000003a5460ab99 in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00002ba988b6cdd2 in __db_pthread_mutex_lock () #2 0x00002ba988b6c426 in __db_tas_mutex_lock () #3 0x00002ba988bfcb49 in __lock_get_internal () #4 0x00002ba988bfd2c2 in __lock_get_pp () #5 0x00000000004af398 in ldap_pvt_sasl_mutex_dispose () #6 0x00000000004b04ca in ldap_pvt_sasl_mutex_dispose () #7 0x0000000000492a80 in ldap_pvt_sasl_mutex_dispose () #8 0x0000000000481bc2 in ldap_pvt_sasl_mutex_dispose () #9 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #10 0x0000000000422736 in ldap_pvt_sasl_mutex_dispose () #11 0x0000000000481bc2 in ldap_pvt_sasl_mutex_dispose () #12 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #13 0x0000000000422ec7 in ldap_pvt_sasl_mutex_dispose () #14 0x00000000004205a5 in ldap_pvt_sasl_mutex_dispose () #15 0x0000000000420b7f in ldap_pvt_sasl_mutex_dispose () #16 0x00002ba9886e08a8 in ldap_int_thread_pool_wrapper () #17 0x0000003a546064a7 in start_thread () from /lib64/libpthread.so.0 #18 0x0000003a53ad3c2d in clone () from /lib64/libc.so.6 Thread 10 (Thread 0x44b67940 (LWP 29756)): #0 0x0000003a5460ab99 in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00002ba988b6cdd2 in __db_pthread_mutex_lock () #2 0x00002ba988b6c426 in __db_tas_mutex_lock () #3 0x00002ba988bfcb49 in __lock_get_internal () #4 0x00002ba988bfd2c2 in __lock_get_pp () #5 0x00000000004af398 in ldap_pvt_sasl_mutex_dispose () #6 0x00000000004b04ca in ldap_pvt_sasl_mutex_dispose () #7 0x00000000004b4183 in ldap_pvt_sasl_mutex_dispose () #8 0x00000000004b802e in ldap_pvt_sasl_mutex_dispose () #9 0x0000000000481a7a in ldap_pvt_sasl_mutex_dispose () #10 0x0000000000482547 in ldap_pvt_sasl_mutex_dispose () #11 0x00000000004decab in ldap_pvt_sasl_mutex_dispose () #12 0x0000000000481b4a in ldap_pvt_sasl_mutex_dispose () #13 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #14 0x000000000043be3e in ldap_pvt_sasl_mutex_dispose () #15 0x0000000000481bc2 in ldap_pvt_sasl_mutex_dispose () #16 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #17 0x000000000043c83f in ldap_pvt_sasl_mutex_dispose () #18 0x00000000004205a5 in ldap_pvt_sasl_mutex_dispose () #19 0x0000000000420b7f in ldap_pvt_sasl_mutex_dispose () #20 0x00002ba9886e08a8 in ldap_int_thread_pool_wrapper () #21 0x0000003a546064a7 in start_thread () from /lib64/libpthread.so.0 #22 0x0000003a53ad3c2d in clone () from /lib64/libc.so.6 Thread 9 (Thread 0x45368940 (LWP 29757)): #0 0x0000003a5460ab99 in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00002ba988b6cdd2 in __db_pthread_mutex_lock () #2 0x00002ba988b6c426 in __db_tas_mutex_lock () #3 0x00002ba988bfcb49 in __lock_get_internal () #4 0x00002ba988bfd2c2 in __lock_get_pp () #5 0x00000000004af398 in ldap_pvt_sasl_mutex_dispose () #6 0x00000000004b04ca in ldap_pvt_sasl_mutex_dispose () #7 0x0000000000492a80 in ldap_pvt_sasl_mutex_dispose () #8 0x0000000000481bc2 in ldap_pvt_sasl_mutex_dispose () #9 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #10 0x0000000000422736 in ldap_pvt_sasl_mutex_dispose () #11 0x0000000000481bc2 in ldap_pvt_sasl_mutex_dispose () #12 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #13 0x0000000000422ec7 in ldap_pvt_sasl_mutex_dispose () #14 0x00000000004205a5 in ldap_pvt_sasl_mutex_dispose () #15 0x0000000000420b7f in ldap_pvt_sasl_mutex_dispose () #16 0x00002ba9886e08a8 in ldap_int_thread_pool_wrapper () #17 0x0000003a546064a7 in start_thread () from /lib64/libpthread.so.0 #18 0x0000003a53ad3c2d in clone () from /lib64/libc.so.6 Thread 8 (Thread 0x45b69940 (LWP 29758)): #0 0x0000003a5460ab99 in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00002ba988b6cdd2 in __db_pthread_mutex_lock () #2 0x00002ba988b6c426 in __db_tas_mutex_lock () #3 0x00002ba988bfcb49 in __lock_get_internal () #4 0x00002ba988bfd4ea in __lock_vec () #5 0x00002ba988bfe12b in __lock_vec_pp () #6 0x00000000004af514 in ldap_pvt_sasl_mutex_dispose () #7 0x00000000004af900 in ldap_pvt_sasl_mutex_dispose () #8 0x000000000048d323 in ldap_pvt_sasl_mutex_dispose () #9 0x0000000000481bc2 in ldap_pvt_sasl_mutex_dispose () #10 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #11 0x00000000004de44d in ldap_pvt_sasl_mutex_dispose () #12 0x000000000042ee25 in ldap_pvt_sasl_mutex_dispose () #13 0x00000000004319c8 in ldap_pvt_sasl_mutex_dispose () #14 0x0000000000432749 in ldap_pvt_sasl_mutex_dispose () #15 0x00000000004ae91b in ldap_pvt_sasl_mutex_dispose () #16 0x0000000000481bc2 in ldap_pvt_sasl_mutex_dispose () #17 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #18 0x000000000043be3e in ldap_pvt_sasl_mutex_dispose () #19 0x0000000000481bc2 in ldap_pvt_sasl_mutex_dispose () #20 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #21 0x000000000043c83f in ldap_pvt_sasl_mutex_dispose () #22 0x00000000004205a5 in ldap_pvt_sasl_mutex_dispose () #23 0x0000000000420b7f in ldap_pvt_sasl_mutex_dispose () #24 0x00002ba9886e08a8 in ldap_int_thread_pool_wrapper () #25 0x0000003a546064a7 in start_thread () from /lib64/libpthread.so.0 #26 0x0000003a53ad3c2d in clone () from /lib64/libc.so.6 Thread 7 (Thread 0x4636a940 (LWP 29759)): #0 0x0000003a5460ab99 in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00002ba988b6cdd2 in __db_pthread_mutex_lock () #2 0x00002ba988b6c426 in __db_tas_mutex_lock () #3 0x00002ba988bfcb49 in __lock_get_internal () #4 0x00002ba988bfd2c2 in __lock_get_pp () #5 0x00000000004af398 in ldap_pvt_sasl_mutex_dispose () #6 0x00000000004b04ca in ldap_pvt_sasl_mutex_dispose () #7 0x00000000004b4183 in ldap_pvt_sasl_mutex_dispose () #8 0x00000000004b802e in ldap_pvt_sasl_mutex_dispose () #9 0x0000000000481a7a in ldap_pvt_sasl_mutex_dispose () #10 0x0000000000482547 in ldap_pvt_sasl_mutex_dispose () #11 0x00000000004decab in ldap_pvt_sasl_mutex_dispose () #12 0x0000000000481b4a in ldap_pvt_sasl_mutex_dispose () #13 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #14 0x000000000043be3e in ldap_pvt_sasl_mutex_dispose () #15 0x0000000000481bc2 in ldap_pvt_sasl_mutex_dispose () #16 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #17 0x000000000043c83f in ldap_pvt_sasl_mutex_dispose () #18 0x00000000004205a5 in ldap_pvt_sasl_mutex_dispose () #19 0x0000000000420b7f in ldap_pvt_sasl_mutex_dispose () #20 0x00002ba9886e08a8 in ldap_int_thread_pool_wrapper () #21 0x0000003a546064a7 in start_thread () from /lib64/libpthread.so.0 #22 0x0000003a53ad3c2d in clone () from /lib64/libc.so.6 Thread 6 (Thread 0x46b6b940 (LWP 29760)): #0 0x0000003a5460ab99 in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00002ba9886df7fb in ldap_pvt_thread_rmutex_lock () #2 0x00000000004d4c74 in ldap_pvt_sasl_mutex_dispose () #3 0x0000000000481b4a in ldap_pvt_sasl_mutex_dispose () #4 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #5 0x00000000004de44d in ldap_pvt_sasl_mutex_dispose () #6 0x000000000042ee25 in ldap_pvt_sasl_mutex_dispose () #7 0x00000000004319c8 in ldap_pvt_sasl_mutex_dispose () #8 0x0000000000432749 in ldap_pvt_sasl_mutex_dispose () #9 0x00000000004ae91b in ldap_pvt_sasl_mutex_dispose () #10 0x0000000000481bc2 in ldap_pvt_sasl_mutex_dispose () #11 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #12 0x000000000043be3e in ldap_pvt_sasl_mutex_dispose () #13 0x0000000000481bc2 in ldap_pvt_sasl_mutex_dispose () #14 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #15 0x000000000043c83f in ldap_pvt_sasl_mutex_dispose () #16 0x00000000004205a5 in ldap_pvt_sasl_mutex_dispose () #17 0x0000000000420b7f in ldap_pvt_sasl_mutex_dispose () #18 0x00002ba9886e08a8 in ldap_int_thread_pool_wrapper () #19 0x0000003a546064a7 in start_thread () from /lib64/libpthread.so.0 #20 0x0000003a53ad3c2d in clone () from /lib64/libc.so.6 Thread 5 (Thread 0x4736c940 (LWP 29761)): #0 0x0000003a5460ab99 in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00002ba988b6cdd2 in __db_pthread_mutex_lock () #2 0x00002ba988b6c426 in __db_tas_mutex_lock () #3 0x00002ba988bfcb49 in __lock_get_internal () #4 0x00002ba988bfd2c2 in __lock_get_pp () #5 0x00000000004af398 in ldap_pvt_sasl_mutex_dispose () #6 0x00000000004b04ca in ldap_pvt_sasl_mutex_dispose () #7 0x00000000004b4183 in ldap_pvt_sasl_mutex_dispose () #8 0x00000000004b802e in ldap_pvt_sasl_mutex_dispose () #9 0x0000000000481a7a in ldap_pvt_sasl_mutex_dispose () #10 0x0000000000482547 in ldap_pvt_sasl_mutex_dispose () #11 0x00000000004decab in ldap_pvt_sasl_mutex_dispose () #12 0x0000000000481b4a in ldap_pvt_sasl_mutex_dispose () #13 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #14 0x000000000043be3e in ldap_pvt_sasl_mutex_dispose () #15 0x0000000000481bc2 in ldap_pvt_sasl_mutex_dispose () #16 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #17 0x000000000043c83f in ldap_pvt_sasl_mutex_dispose () #18 0x00000000004205a5 in ldap_pvt_sasl_mutex_dispose () #19 0x0000000000420b7f in ldap_pvt_sasl_mutex_dispose () #20 0x00002ba9886e08a8 in ldap_int_thread_pool_wrapper () #21 0x0000003a546064a7 in start_thread () from /lib64/libpthread.so.0 #22 0x0000003a53ad3c2d in clone () from /lib64/libc.so.6 Thread 4 (Thread 0x47b6d940 (LWP 29762)): #0 0x0000003a5460ab99 in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00002ba988b6cdd2 in __db_pthread_mutex_lock () #2 0x00002ba988b6c426 in __db_tas_mutex_lock () #3 0x00002ba988bfcb49 in __lock_get_internal () #4 0x00002ba988bfd2c2 in __lock_get_pp () #5 0x00000000004af398 in ldap_pvt_sasl_mutex_dispose () #6 0x00000000004b04ca in ldap_pvt_sasl_mutex_dispose () #7 0x00000000004b4183 in ldap_pvt_sasl_mutex_dispose () #8 0x00000000004b802e in ldap_pvt_sasl_mutex_dispose () #9 0x0000000000481a7a in ldap_pvt_sasl_mutex_dispose () #10 0x0000000000482547 in ldap_pvt_sasl_mutex_dispose () #11 0x00000000004decab in ldap_pvt_sasl_mutex_dispose () #12 0x0000000000481b4a in ldap_pvt_sasl_mutex_dispose () #13 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #14 0x000000000043be3e in ldap_pvt_sasl_mutex_dispose () #15 0x0000000000481bc2 in ldap_pvt_sasl_mutex_dispose () #16 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #17 0x000000000043c83f in ldap_pvt_sasl_mutex_dispose () #18 0x00000000004205a5 in ldap_pvt_sasl_mutex_dispose () #19 0x0000000000420b7f in ldap_pvt_sasl_mutex_dispose () #20 0x00002ba9886e08a8 in ldap_int_thread_pool_wrapper () #21 0x0000003a546064a7 in start_thread () from /lib64/libpthread.so.0 #22 0x0000003a53ad3c2d in clone () from /lib64/libc.so.6 Thread 3 (Thread 0x4836e940 (LWP 29763)): #0 0x0000003a5460ab99 in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00002ba988b6cdd2 in __db_pthread_mutex_lock () #2 0x00002ba988b6c426 in __db_tas_mutex_lock () #3 0x00002ba988bfcb49 in __lock_get_internal () #4 0x00002ba988bfd2c2 in __lock_get_pp () #5 0x00000000004af398 in ldap_pvt_sasl_mutex_dispose () #6 0x00000000004b04ca in ldap_pvt_sasl_mutex_dispose () #7 0x0000000000492a80 in ldap_pvt_sasl_mutex_dispose () #8 0x0000000000481bc2 in ldap_pvt_sasl_mutex_dispose () #9 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #10 0x00000000004758a4 in ldap_pvt_sasl_mutex_dispose () #11 0x000000000047ce54 in ldap_pvt_sasl_mutex_dispose () #12 0x00000000004214d3 in ldap_pvt_sasl_mutex_dispose () #13 0x00002ba9886e08a8 in ldap_int_thread_pool_wrapper () #14 0x0000003a546064a7 in start_thread () from /lib64/libpthread.so.0 #15 0x0000003a53ad3c2d in clone () from /lib64/libc.so.6 Thread 2 (Thread 0x48b6f940 (LWP 29764)): #0 0x0000003a5460ab99 in pthread_cond_wait@@GLIBC_2.3.2 () #1 0x00002ba988b6cdd2 in __db_pthread_mutex_lock () #2 0x00002ba988b6c426 in __db_tas_mutex_lock () #3 0x00002ba988bfcb49 in __lock_get_internal () #4 0x00002ba988bfd2c2 in __lock_get_pp () #5 0x00000000004af398 in ldap_pvt_sasl_mutex_dispose () #6 0x00000000004b04ca in ldap_pvt_sasl_mutex_dispose () #7 0x00000000004b4183 in ldap_pvt_sasl_mutex_dispose () #8 0x00000000004b802e in ldap_pvt_sasl_mutex_dispose () #9 0x0000000000481a7a in ldap_pvt_sasl_mutex_dispose () #10 0x0000000000482547 in ldap_pvt_sasl_mutex_dispose () #11 0x00000000004decab in ldap_pvt_sasl_mutex_dispose () #12 0x0000000000481b4a in ldap_pvt_sasl_mutex_dispose () #13 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #14 0x000000000043be3e in ldap_pvt_sasl_mutex_dispose () #15 0x0000000000481bc2 in ldap_pvt_sasl_mutex_dispose () #16 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () #17 0x000000000043c83f in ldap_pvt_sasl_mutex_dispose () #18 0x00000000004205a5 in ldap_pvt_sasl_mutex_dispose () #19 0x0000000000420b7f in ldap_pvt_sasl_mutex_dispose () #20 0x00002ba9886e08a8 in ldap_int_thread_pool_wrapper () #21 0x0000003a546064a7 in start_thread () from /lib64/libpthread.so.0 #22 0x0000003a53ad3c2d in clone () from /lib64/libc.so.6 Thread 1 (Thread 0x2ba988eb9bd0 (LWP 28525)): #0 0x0000003a546077e5 in pthread_join () from /lib64/libpthread.so.0 #1 0x000000000041b3e2 in ldap_pvt_sasl_mutex_dispose () #2 0x000000000040960e in ldap_pvt_sasl_mutex_dispose () #3 0x0000003a53a1d994 in __libc_start_main () from /lib64/libc.so.6 #4 0x00000000004080d9 in ldap_pvt_sasl_mutex_dispose () #5 0x00007ffff6e2df58 in ?? () #6 0x0000000000000000 in ?? ()
I have 2 different suffixes, but didn't notice any obvious problem with the database or file system (space left, corruption, bad file owner and so on). For example, the db_stat -c gives the following result on the main database:
579 Last allocated locker ID 0x7fffffff Current maximum unused locker ID 9 Number of lock modes 10000 Maximum number of locks possible 10000 Maximum number of lockers possible 10000 Maximum number of lock objects possible 1 Number of lock object partitions 951 Number of current locks 1686 Maximum number of locks at any one time 16 Maximum number of locks in any one bucket 0 Maximum number of locks stolen by for an empty partition 0 Maximum number of locks stolen for any one partition 114 Number of current lockers 130 Maximum number of lockers at any one time 933 Number of current lock objects 1586 Maximum number of lock objects at any one time 3 Maximum number of lock objects in any one bucket 0 Maximum number of objects stolen by for an empty partition 0 Maximum number of objects stolen for any one partition 125M Total number of locks requested (125419838) 125M Total number of locks released (125418879) 0 Total number of locks upgraded 138 Total number of locks downgraded 15 Lock requests not available due to conflicts, for which we waited 0 Lock requests not available due to conflicts, for which we did not wait 0 Number of deadlocks 0 Lock timeout value 0 Number of locks that have timed out 0 Transaction timeout value 0 Number of transactions that have timed out 8MB 464KB The size of the lock region 0 The number of partition locks that required waiting (0%) 0 The maximum number of times any partition lock was waited for (0%) 0 The number of object queue operations that required waiting (0%) 0 The number of locker allocations that required waiting (0%) 45 The number of region locks that required waiting (0%) 3 Maximum hash bucket length
I've also backed up the content of the /proc/28525 directory for further analysis. What else could I do ?
--On Tuesday, July 12, 2011 11:59 AM +0200 Cyril GROSJEAN cgrosjean@janua.fr wrote:
# 5 0x00000000004af398 in ldap_pvt_sasl_mutex_dispose () # 6 0x00000000004b04ca in ldap_pvt_sasl_mutex_dispose () # 7 0x00000000004b4183 in ldap_pvt_sasl_mutex_dispose () # 8 0x00000000004b802e in ldap_pvt_sasl_mutex_dispose () # 9 0x0000000000481a7a in ldap_pvt_sasl_mutex_dispose () # 10 0x0000000000482547 in ldap_pvt_sasl_mutex_dispose () # 11 0x00000000004decab in ldap_pvt_sasl_mutex_dispose () # 12 0x0000000000481b4a in ldap_pvt_sasl_mutex_dispose () # 13 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () # 14 0x000000000043be3e in ldap_pvt_sasl_mutex_dispose () # 15 0x0000000000481bc2 in ldap_pvt_sasl_mutex_dispose () # 16 0x0000000000482127 in ldap_pvt_sasl_mutex_dispose () # 17 0x000000000043c83f in ldap_pvt_sasl_mutex_dispose () # 18 0x00000000004205a5 in ldap_pvt_sasl_mutex_dispose () # 19 0x0000000000420b7f in ldap_pvt_sasl_mutex_dispose () # 20 0x00002ba9886e08a8 in ldap_int_thread_pool_wrapper ()
This looks like a bug with cyrus-sasl to me. Are you doing pass through authentication?
--Quanah
--
Quanah Gibson-Mount Sr. Member of Technical Staff Zimbra, Inc A Division of VMware, Inc. -------------------- Zimbra :: the leader in open source messaging and collaboration
In case it could benefit to anyone, the freeze seems to come when there're "lots of" "deferring operation: binding" messages in the slapd logfile. So, we investigated the corresponding client application and found the way it dealt with LDAP connections could be a problem with OpenLDAP (it was not with Sun DS 5.2 !). So, we're now in the process of rewriting the client side LDAP connections management with a pool, so that at least we don't see anymore "deferring operation: binding" messages.
By the way, are there well known best practices and pointers to write clean (from a connection management point of view) LDAP clients, or at least best practices or "known bad practices" with regards to the way OpenLDAP deals with connections ?
On Tuesday, 12 July 2011 11:59:52 Cyril GROSJEAN wrote:
I randomly notice my OpenLDAP server freezes, and I can't udnerstand why. I have a few LDAP clients (ldapsearch, a legacy Java app. and ApacheDirectoryStudio), running from different systems, either locally on the OpenLDAP server, or on another OpenLDAP server, or on a remote workstation, and none manages to get an answer from OpenLDAP. The connection is established but each client gets stuck waiting for any result.
[...]
Jul 12 10:20:05 dev-ldap1 slapd[28525]: connection_input: conn=3377 deferring operation: binding
This is the code (at least in 2.4.26) that generates the message:
/* Don't process requests when the conn is in the middle of a * Bind, or if it's closing. Also, don't let any single conn * use up all the available threads, and don't execute if we're * currently blocked on output. And don't execute if there are * already pending ops, let them go first. Abandon operations * get exceptions to some, but not all, cases. */ switch( tag ){ default: /* Abandon and Unbind are exempt from these checks */ if (conn->c_conn_state == SLAP_C_CLOSING) { defer = "closing"; break; } else if (conn->c_writewaiter) { defer = "awaiting write"; break; } else if (conn->c_n_ops_pending) { defer = "pending operations"; break; } /* FALLTHRU */ case LDAP_REQ_ABANDON: /* Unbind is exempt from these checks */ if (conn->c_n_ops_executing >= connection_pool_max/2) { defer = "too many executing"; break; } else if (conn->c_conn_state == SLAP_C_BINDING) { defer = "binding"; break; } /* FALLTHRU */ case LDAP_REQ_UNBIND: break; }
if( defer ) { int max = conn->c_dn.bv_len ? slap_conn_max_pending_auth : slap_conn_max_pending;
Debug( LDAP_DEBUG_ANY, "connection_input: conn=%lu deferring operation: %s\n", conn->c_connid, defer, 0 ); conn->c_n_ops_pending++; LDAP_STAILQ_INSERT_TAIL( &conn->c_pending_ops, op, o_next ); rc = ( conn->c_n_ops_pending > max ) ? -1 : 0;
} else {
... carry on and handle the op.
As far as I understand, the intention is to (among others) ignore operations from connections where a BIND operation is still pending. However, some of the comments now appear to be a bit misplaced (e.g. Unbind comment vs LDAP_REQ_ABANDON). Also, the code appears (to me, not being very familiar with it, and quite rusty at C) to not be doing the right thing. The portion generating the "deferring operation: binding" message appears to be when an abandon operation is received on a connection that has a pending BIND operation. Shouldn't an abandon be allowed for a BIND? Or, am I reading it wrong? Also, it looks as if the "too many executing" is also only applicable to abandon?
Shouldn't the LDAP_REQ_ABANDON case be breaking without setting 'defer'?
Shouldn't the 'conn->c_conn_state == SLAP_C_BINDING' and 'conn-
c_n_ops_executing >= connection_pool_max/2' conditions be handled by the
default case as well?
We have been running into both the "deferring: binding" and "deferring: too many executing" messages, but I hadn't had time to trace what the LDAP client software was doing, but now I wonder if maybe it was sending abandon requests when some operations weren't returning in time (after > 18000 successful operations on a connection. I think its behaviour regarding its use of LDAP connections may be wrong, but I would prefer to be able to prove that its behaviour is wrong to the vendor without other log entries that show its correct behaviour being handled incorrectly.
Also, the hard-coded 'one connection may not use more pending operations than half the number of threads' rule seems a bit arbitrary. Could we get a knob to twiddle this?
Regards, Buchan
openldap-technical@openldap.org