The patch in ITS#6872 didn't fix the issue.
My first thought was to enable all the logging, but there's *so much data* and I don't know what's normal and what isn't. I captured the connection information (netstat) on all four hosts. Several of the connections are stuck in FIN_WAIT1, which is normally a brief, transitional state:
host1$ netstat -an | fgrep :1389
Proto Recv-Q Send-Q Local Address       Foreign Address State
tcp        0      0 10.1.1.1:1389       0.0.0.0:*       LISTEN
tcp    65115      0 10.1.1.1:19284  --> 10.1.1.4:1389   ESTABLISHED
tcp        0      0 10.1.1.1:1389   <-- 10.1.1.4:36991  ESTABLISHED
tcp    73458      0 10.1.1.1:19286  --> 10.1.1.3:1389   ESTABLISHED
tcp        0      0 10.1.1.1:1389   <-- 10.1.1.3:38085  ESTABLISHED
tcp    73112      0 10.1.1.1:19263  --> 10.1.1.2:1389   ESTABLISHED
tcp        0      0 10.1.1.1:1389   <-- 10.1.1.2:41374  ESTABLISHED

host2$ netstat -an | fgrep :1389
Proto Recv-Q Send-Q Local Address       Foreign Address State
tcp        0      0 10.1.1.2:1389       0.0.0.0:*       LISTEN
tcp        0  11537 10.1.1.2:1389   <-- 10.1.1.1:19263  FIN_WAIT1
tcp        0      0 10.1.1.2:1389   <-- 10.1.1.4:36992  ESTABLISHED
tcp        0      0 10.1.1.2:1389   <-- 10.1.1.3:38086  ESTABLISHED
tcp        0      0 10.1.1.2:41373  --> 10.1.1.3:1389   ESTABLISHED
tcp        0      0 10.1.1.2:41375  --> 10.1.1.4:1389   ESTABLISHED
tcp        0      0 10.1.1.2:41374  --> 10.1.1.1:1389   ESTABLISHED

host3$ netstat -an | fgrep :1389
Proto Recv-Q Send-Q Local Address       Foreign Address State
tcp        0      0 10.1.1.3:1389       0.0.0.0:*       LISTEN
tcp        0  11521 10.1.1.3:1389   <-- 10.1.1.1:19286  FIN_WAIT1
tcp        0      0 10.1.1.3:38087  --> 10.1.1.4:1389   ESTABLISHED
tcp        0      0 10.1.1.3:38085  --> 10.1.1.1:1389   ESTABLISHED
tcp        0  11505 10.1.1.3:1389   <-- 10.1.1.4:37000  FIN_WAIT1
tcp        0      0 10.1.1.3:38086  --> 10.1.1.2:1389   ESTABLISHED
tcp        0      0 10.1.1.3:1389   <-- 10.1.1.2:41373  ESTABLISHED

host4$ netstat -an | fgrep :1389
Proto Recv-Q Send-Q Local Address       Foreign Address State
tcp        0      0 10.1.1.4:1389       0.0.0.0:*       LISTEN
tcp        0  14281 10.1.1.4:1389   <-- 10.1.1.1:19284  FIN_WAIT1
tcp    73567      0 10.1.1.4:37000  --> 10.1.1.3:1389   ESTABLISHED
tcp        0      0 10.1.1.4:1389   <-- 10.1.1.3:38087  ESTABLISHED
tcp    17534      0 10.1.1.4:36991  --> 10.1.1.1:1389   ESTABLISHED
tcp        0      0 10.1.1.4:1389   <-- 10.1.1.2:41375  ESTABLISHED
tcp        0      0 10.1.1.4:36992  --> 10.1.1.2:1389   ESTABLISHED
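One way to read these tables is to pair each socket with the other end of the same connection on the peer host. For example, host2's FIN_WAIT1 socket (10.1.1.2:1389 <-- 10.1.1.1:19263) has 11537 bytes in its Send-Q, while host1's end of that connection is still ESTABLISHED with 73112 bytes unread in its Recv-Q. The other three FIN_WAIT1 sockets look the same from the far side (host1 Recv-Q 73458 and 65115, host4 Recv-Q 73567). A socket enters FIN_WAIT1 as soon as its owner closes it, but the FIN can't be sent until the queued data is delivered, so this pattern suggests the connecting side (presumably the syncrepl consumer) has stopped reading. That's an inference, not something these snapshots prove. The sketch below is a hypothetical helper, not part of netstat or OpenLDAP; it assumes lines in the layout shown above and simply skips the hand-added `-->`/`<--` arrows.

```python
"""Hypothetical helper: pair both ends of each TCP connection across hosts
and report the peer of every FIN_WAIT1 socket.

Assumes netstat lines shaped like the ones in this message:
    tcp  Recv-Q  Send-Q  LOCAL  [-->|<--]  FOREIGN  STATE
"""
from typing import Dict, List, NamedTuple, Tuple


class Sock(NamedTuple):
    host: str
    recv_q: int
    send_q: int
    local: str
    foreign: str
    state: str


def parse(host: str, text: str) -> List[Sock]:
    socks = []
    for line in text.splitlines():
        # Drop the hand-added direction arrows before splitting into columns.
        fields = [f for f in line.split() if f not in ("-->", "<--")]
        # Keep only data rows: "tcp", two queue sizes, two addresses, a state.
        if len(fields) != 6 or fields[0] != "tcp":
            continue
        _, recv_q, send_q, local, foreign, state = fields
        socks.append(Sock(host, int(recv_q), int(send_q), local, foreign, state))
    return socks


def stuck_pairs(socks: List[Sock]) -> List[Tuple[Sock, Sock]]:
    # Index sockets by (local, foreign); the peer is the same key swapped.
    by_ends: Dict[Tuple[str, str], Sock] = {(s.local, s.foreign): s for s in socks}
    pairs = []
    for s in socks:
        if s.state == "FIN_WAIT1":
            peer = by_ends.get((s.foreign, s.local))
            if peer is not None:
                pairs.append((s, peer))
    return pairs


if __name__ == "__main__":
    host1 = "tcp 73112 0 10.1.1.1:19263 --> 10.1.1.2:1389 ESTABLISHED"
    host2 = "tcp 0 11537 10.1.1.2:1389 <-- 10.1.1.1:19263 FIN_WAIT1"
    socks = parse("host1", host1) + parse("host2", host2)
    for closing, peer in stuck_pairs(socks):
        print(f"{closing.host} {closing.local} FIN_WAIT1 Send-Q={closing.send_q}; "
              f"{peer.host} {peer.local} {peer.state} Recv-Q={peer.recv_q}")
```

Pasting in all four hosts' output at once would list every stuck connection next to the state and queue sizes of its peer, which may be an easier thing to compare between a healthy and a hung run.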
I also captured a *pstack* trace on each of the four slapds. But again, I'm not sure what's *normal*:
host1$
Thread 17 (Thread 1082132832 (LWP 25922)):
#0 0x0000003a340ca15c in epoll_wait () from /lib64/tls/libc.so.6
#1 0x0000000000437a52 in slapd_daemon_destroy ()
#2 0x0000003a34706317 in start_thread () from /lib64/tls/libpthread.so.0
#3 0x0000003a340c9d83 in clone () from /lib64/tls/libc.so.6
Thread 16 (Thread 1090525536 (LWP 25923)):
#0 0x0000003a34708d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x0000003a34706317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x0000003a340c9d83 in clone () from /lib64/tls/libc.so.6
Thread 15 (Thread 1098918240 (LWP 25924)):
#0 0x0000003a34708d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x0000003a34706317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x0000003a340c9d83 in clone () from /lib64/tls/libc.so.6
Thread 14 (Thread 1107310944 (LWP 25925)):
#0 0x0000003a34708d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x0000003a34706317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x0000003a340c9d83 in clone () from /lib64/tls/libc.so.6
Thread 13 (Thread 1115703648 (LWP 25926)):
#0 0x0000003a34708d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x0000003a34706317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x0000003a340c9d83 in clone () from /lib64/tls/libc.so.6
Thread 12 (Thread 1124096352 (LWP 26071)):
#0 0x0000003a34708d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x0000003a34706317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x0000003a340c9d83 in clone () from /lib64/tls/libc.so.6
Thread 11 (Thread 1132489056 (LWP 26072)):
#0 0x0000003a340b0719 in sched_yield () from /lib64/tls/libc.so.6
#1 0x00000000004e90d0 in ldap_pvt_thread_yield ()
#2 0x0000002a9630d11e in syncprov_op_search ()
#3 0x00000000004c3843 in overlay_op_walk ()
#4 0x00000000004c3a9f in overlay_op_walk ()
#5 0x00000000004c3b7a in overlay_op_walk ()
#6 0x000000000043ef62 in fe_op_search ()
#7 0x000000000043e8c2 in do_search ()
#8 0x000000000043b6d5 in connection_done ()
#9 0x000000000043bc89 in connection_client_stop ()
#10 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#11 0x0000003a34706317 in start_thread () from /lib64/tls/libpthread.so.0
#12 0x0000003a340c9d83 in clone () from /lib64/tls/libc.so.6
Thread 10 (Thread 1140881760 (LWP 26073)):
#0 0x0000003a34708d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x0000002a956a644b in __db_pthread_mutex_lock ()
#2 0x0000002a956a5b11 in __db_tas_mutex_lock_int ()
#3 0x0000002a956a5887 in __db_tas_mutex_lock ()
#4 0x0000002a95776ed2 in __lock_get_internal ()
#5 0x0000002a9577525d in __lock_get ()
#6 0x0000002a957bd9cf in __db_lget ()
#7 0x0000002a956b67bc in __bamc_writelock ()
#8 0x0000002a957a5b71 in __dbc_idel ()
#9 0x0000002a957a5ace in __dbc_del ()
#10 0x0000002a957b8039 in __dbc_del_pp ()
#11 0x0000002a961dc91e in hdb_idl_delete_key ()
#12 0x0000002a961d1d4b in hdb_key_change ()
#13 0x0000002a961d0d1b in indexer ()
#14 0x0000002a961d1159 in index_at_values ()
#15 0x0000002a961d12d2 in hdb_index_values ()
#16 0x0000002a961d173a in hdb_index_entry ()
#17 0x0000002a961c5888 in hdb_delete ()
#18 0x00000000004c38d7 in overlay_op_walk ()
#19 0x00000000004c3a9f in overlay_op_walk ()
#20 0x00000000004c3c2e in overlay_op_walk ()
#21 0x00000000004b5b1a in cancel_extop ()
#22 0x00000000004af92a in cancel_extop ()
host2$
Thread 9 (Thread 1082132832 (LWP 12700)):
#0 0x0000003530bca15c in epoll_wait () from /lib64/tls/libc.so.6
#1 0x0000000000437a52 in slapd_daemon_destroy ()
#2 0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0
#3 0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6
Thread 8 (Thread 1090525536 (LWP 12701)):
#0 0x0000003531208d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6
Thread 7 (Thread 1098918240 (LWP 12702)):
#0 0x0000003531208d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6
Thread 6 (Thread 1107310944 (LWP 12703)):
#0 0x0000003531208d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6
Thread 5 (Thread 1115703648 (LWP 12704)):
#0 0x0000003531208d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6
Thread 4 (Thread 1124096352 (LWP 13049)):
#0 0x0000003531208d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6
Thread 3 (Thread 1132489056 (LWP 13050)):
#0 0x0000003531208d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6
Thread 2 (Thread 1140881760 (LWP 13059)):
#0 0x0000003531208d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6
Thread 1 (Thread 182903646528 (LWP 12693)):
#0 0x000000353120732b in pthread_join () from /lib64/tls/libpthread.so.0
#1 0x00000000004e90a8 in ldap_pvt_thread_join ()
#2 0x0000000000438bd8 in slapd_daemon ()
#3 0x000000000041932a in main ()
host3$
Thread 9 (Thread 1082132832 (LWP 20629)):
#0 0x00000035c64ca15c in epoll_wait () from /lib64/tls/libc.so.6
#1 0x0000000000437a52 in slapd_daemon_destroy ()
#2 0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0
#3 0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6
Thread 8 (Thread 1090525536 (LWP 20630)):
#0 0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6
Thread 7 (Thread 1098918240 (LWP 20631)):
#0 0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6
Thread 6 (Thread 1107310944 (LWP 20632)):
#0 0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6
Thread 5 (Thread 1115703648 (LWP 20633)):
#0 0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6
Thread 4 (Thread 1124096352 (LWP 20983)):
#0 0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6
Thread 3 (Thread 1132489056 (LWP 20984)):
#0 0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6
Thread 2 (Thread 1140881760 (LWP 21005)):
#0 0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6
Thread 1 (Thread 182903654720 (LWP 20628)):
#0 0x00000035c6d0732b in pthread_join () from /lib64/tls/libpthread.so.0
#1 0x00000000004e90a8 in ldap_pvt_thread_join ()
#2 0x0000000000438bd8 in slapd_daemon ()
#3 0x000000000041932a in main ()
host4$
Thread 12 (Thread 1082132832 (LWP 26819)):
#0 0x00000030d86ca15c in epoll_wait () from /lib64/tls/libc.so.6
#1 0x0000000000437a52 in slapd_daemon_destroy ()
#2 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#3 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 11 (Thread 1090525536 (LWP 26820)):
#0 0x00000030d86b0719 in sched_yield () from /lib64/tls/libc.so.6
#1 0x00000000004e90d0 in ldap_pvt_thread_yield ()
#2 0x00000000004af4ab in cancel_extop ()
#3 0x00000000004b1457 in cancel_extop ()
#4 0x000000000043bca3 in connection_client_stop ()
#5 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#6 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#7 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 10 (Thread 1098918240 (LWP 26821)):
#0 0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x0000002a956a744b in __db_pthread_mutex_lock ()
#2 0x0000002a956a6b11 in __db_tas_mutex_lock_int ()
#3 0x0000002a956a6887 in __db_tas_mutex_lock ()
#4 0x0000002a95777ed2 in __lock_get_internal ()
#5 0x0000002a957750ae in __lock_vec ()
#6 0x0000002a95774e53 in __lock_vec_api ()
#7 0x0000002a95774da3 in __lock_vec_pp ()
#8 0x0000002a961df6f0 in hdb_cache_entry_db_relock ()
#9 0x0000002a961e169e in hdb_cache_modify ()
#10 0x0000002a961c95ac in hdb_modify ()
#11 0x0000002a9630a959 in syncprov_checkpoint ()
#12 0x0000002a9630c241 in syncprov_op_response ()
#13 0x00000000004500f6 in rs_entry2modifiable ()
#14 0x00000000004502f5 in rs_entry2modifiable ()
#15 0x000000000045112e in slap_send_ldap_result ()
#16 0x0000002a961c7266 in hdb_delete ()
#17 0x00000000004c38d7 in overlay_op_walk ()
#18 0x00000000004c3a9f in overlay_op_walk ()
#19 0x00000000004c3c2e in overlay_op_walk ()
#20 0x000000000045c958 in fe_op_delete ()
#21 0x000000000045c688 in do_delete ()
#22 0x000000000043b6d5 in connection_done ()
#23 0x000000000043bc89 in connection_client_stop ()
#24 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#25 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#26 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 9 (Thread 1107310944 (LWP 26822)):
#0 0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x0000002a956a744b in __db_pthread_mutex_lock ()
#2 0x0000002a956a6b11 in __db_tas_mutex_lock_int ()
#3 0x0000002a956a6887 in __db_tas_mutex_lock ()
#4 0x0000002a95777ed2 in __lock_get_internal ()
#5 0x0000002a9577613d in __lock_get_api ()
#6 0x0000002a95775fd7 in __lock_get_pp ()
#7 0x0000002a961df875 in bdb_cache_entry_db_lock ()
#8 0x0000002a961e0f02 in hdb_cache_find_id ()
#9 0x0000002a961d707f in hdb_dn2entry ()
#10 0x0000002a961cd3a1 in hdb_search ()
#11 0x00000000004c38d7 in overlay_op_walk ()
#12 0x00000000004c3a9f in overlay_op_walk ()
#13 0x00000000004c3b7a in overlay_op_walk ()
#14 0x0000002a96307877 in syncprov_findbase ()
#15 0x0000002a9630df7a in syncprov_op_search ()
#16 0x00000000004c3843 in overlay_op_walk ()
#17 0x00000000004c3a9f in overlay_op_walk ()
#18 0x00000000004c3b7a in overlay_op_walk ()
#19 0x000000000043ef62 in fe_op_search ()
#20 0x000000000043e8c2 in do_search ()
#21 0x000000000043b6d5 in connection_done ()
#22 0x000000000043bc89 in connection_client_stop ()
#23 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#24 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#25 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 8 (Thread 1115703648 (LWP 26823)):
#0 0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x0000002a956a744b in __db_pthread_mutex_lock ()
#2 0x0000002a956a6b11 in __db_tas_mutex_lock_int ()
#3 0x0000002a956a6887 in __db_tas_mutex_lock ()
#4 0x0000002a95777ed2 in __lock_get_internal ()
#5 0x0000002a9577625d in __lock_get ()
#6 0x0000002a957be9cf in __db_lget ()
#7 0x0000002a956d1d66 in __bam_search ()
#8 0x0000002a956b8ca8 in __bamc_search ()
#9 0x0000002a956b6918 in __bamc_put ()
#10 0x0000002a957a98ee in __dbc_iput ()
#11 0x0000002a957a9747 in __dbc_put ()
#12 0x0000002a95795be7 in __db_put ()
#13 0x0000002a957b7d05 in __db_put_pp ()
#14 0x0000002a961d9eff in bdb_id2entry_put ()
#15 0x0000002a961d9f7b in hdb_id2entry_update ()
#16 0x0000002a961c92c4 in hdb_modify ()
#17 0x00000000004c38d7 in overlay_op_walk ()
#18 0x00000000004c3a9f in overlay_op_walk ()
#19 0x00000000004c3bc2 in overlay_op_walk ()
#20 0x00000000004b7811 in syncrepl_add_glue ()
#21 0x00000000004af957 in cancel_extop ()
#22 0x00000000004b1457 in cancel_extop ()
#23 0x000000000043bca3 in connection_client_stop ()
#24 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#25 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#26 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 7 (Thread 1124096352 (LWP 26838)):
#0 0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 6 (Thread 1132489056 (LWP 26839)):
#0 0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x0000002a956a744b in __db_pthread_mutex_lock ()
#2 0x0000002a956a6b11 in __db_tas_mutex_lock_int ()
#3 0x0000002a956a6887 in __db_tas_mutex_lock ()
#4 0x0000002a95777ed2 in __lock_get_internal ()
#5 0x0000002a9577613d in __lock_get_api ()
#6 0x0000002a95775fd7 in __lock_get_pp ()
#7 0x0000002a961df875 in bdb_cache_entry_db_lock ()
#8 0x0000002a961e0f02 in hdb_cache_find_id ()
#9 0x0000002a961d707f in hdb_dn2entry ()
#10 0x0000002a961cd3a1 in hdb_search ()
#11 0x00000000004c38d7 in overlay_op_walk ()
#12 0x00000000004c3a9f in overlay_op_walk ()
#13 0x00000000004c3b7a in overlay_op_walk ()
#14 0x0000002a96307877 in syncprov_findbase ()
#15 0x0000002a9630df7a in syncprov_op_search ()
#16 0x00000000004c3843 in overlay_op_walk ()
#17 0x00000000004c3a9f in overlay_op_walk ()
#18 0x00000000004c3b7a in overlay_op_walk ()
#19 0x000000000043ef62 in fe_op_search ()
#20 0x000000000043e8c2 in do_search ()
#21 0x000000000043b6d5 in connection_done ()
#22 0x000000000043bc89 in connection_client_stop ()
#23 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#24 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#25 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 5 (Thread 1140881760 (LWP 26840)):
#0 0x00000030d8d0b16b in __lll_mutex_lock_wait ()
#1 0x00000000440066b0 in ?? ()
#2 0x0000000000000010 in ?? ()
#3 0x00000030d8d07f34 in pthread_mutex_lock () from /lib64/tls/libpthread.so.0
#4 0x0000002ab68eb520 in ?? ()
#5 0x0000000000000028 in ?? ()
#6 0x00000004d866b20d in ?? ()
#7 0x0000000000000050 in ?? ()
#8 0x0000002ab5c00020 in ?? ()
#9 0x0000000000000029 in ?? ()
#10 0x00000030d8d06280 in __free_tcb () from /lib64/tls/libpthread.so.0
#11 0x00000000410005e0 in ?? ()
#12 0x0000002ab5c00020 in ?? ()
#13 0x000000000000000c in ?? ()
#14 0x00000030d8d06280 in __free_tcb () from /lib64/tls/libpthread.so.0
#15 0x00000000410005e0 in ?? ()
#16 0x0000000000000001 in ?? ()
#17 0x00000000410005e0 in ?? ()
#18 0x00000030d866bc22 in malloc () from /lib64/tls/libc.so.6
#19 0x0000000000772040 in ?? ()
#20 0x0000000044006340 in ?? ()
#21 0x00000000004ade33 in cancel_extop ()
Thread 4 (Thread 1149274464 (LWP 26972)):
#0 0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x0000002a956a744b in __db_pthread_mutex_lock ()
#2 0x0000002a956a6b11 in __db_tas_mutex_lock_int ()
#3 0x0000002a956a6887 in __db_tas_mutex_lock ()
#4 0x0000002a95777ed2 in __lock_get_internal ()
#5 0x0000002a9577613d in __lock_get_api ()
#6 0x0000002a95775fd7 in __lock_get_pp ()
#7 0x0000002a961df875 in bdb_cache_entry_db_lock ()
#8 0x0000002a961e0f02 in hdb_cache_find_id ()
#9 0x0000002a961d707f in hdb_dn2entry ()
#10 0x0000002a961cd3a1 in hdb_search ()
#11 0x00000000004c38d7 in overlay_op_walk ()
#12 0x00000000004c3a9f in overlay_op_walk ()
#13 0x00000000004c3b7a in overlay_op_walk ()
#14 0x000000000043ef62 in fe_op_search ()
#15 0x000000000043e8c2 in do_search ()
#16 0x000000000043b6d5 in connection_done ()
#17 0x000000000043bc89 in connection_client_stop ()
#18 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#19 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#20 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 3 (Thread 1157667168 (LWP 26973)):
#0 0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2 0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 2 (Thread 1166059872 (LWP 26974)):
#0 0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1 0x0000002a956a744b in __db_pthread_mutex_lock ()
#2 0x0000002a956a6b11 in __db_tas_mutex_lock_int ()
#3 0x0000002a956a6887 in __db_tas_mutex_lock ()
#4 0x0000002a95777ed2 in __lock_get_internal ()
#5 0x0000002a9577613d in __lock_get_api ()
#6 0x0000002a95775fd7 in __lock_get_pp ()
#7 0x0000002a961df875 in bdb_cache_entry_db_lock ()
#8 0x0000002a961e0f02 in hdb_cache_find_id ()
#9 0x0000002a961d707f in hdb_dn2entry ()
#10 0x0000002a961cd3a1 in hdb_search ()
#11 0x00000000004c38d7 in overlay_op_walk ()
#12 0x00000000004c3a9f in overlay_op_walk ()
#13 0x00000000004c3b7a in overlay_op_walk ()
#14 0x000000000043ef62 in fe_op_search ()
#15 0x000000000043e8c2 in do_search ()
#16 0x000000000043b6d5 in connection_done ()
#17 0x000000000043bc89 in connection_client_stop ()
#18 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#19 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#20 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 1 (Thread 182903650624 (LWP 26818)):
#0 0x00000030d8d0732b in pthread_join () from /lib64/tls/libpthread.so.0
#1 0x00000000004e90a8 in ldap_pvt_thread_join ()
#2 0x0000000000438bd8 in slapd_daemon ()
#3 0x000000000041932a in main ()
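To get a feel for what's "normal" without reading every frame, a rough classifier can bucket each thread by what it is blocked in. This is a hypothetical sketch, and its rules are my own assumptions rather than anything OpenLDAP defines: a stack containing Berkeley DB's `__lock_get_internal` counts as waiting on a BDB lock, `sched_yield` as yielding, `epoll_wait` as the listener, `pthread_join` as the main thread, and `ldap_pvt_thread_cond_wait` with none of those as an idle pool worker. Applied to the traces above, host2 and host3 contain only the listener, the main thread and idle workers, while host4 has six threads waiting on BDB locks and host1 has one. Treat the function names with some suspicion: call chains such as `connection_done` calling `do_search` look like static functions being reported under the nearest exported symbol, so a build with debugging symbols would probably give more trustworthy traces.

```python
"""Hypothetical summarizer for pstack/gdb-style thread dumps (one frame per
line). The classification rules are heuristics assumed for slapd + Berkeley DB
traces like the ones in this message, not an OpenLDAP API."""
import re
from collections import Counter
from typing import Dict, List, Optional

THREAD_RE = re.compile(r"^Thread (\d+) ")
FRAME_RE = re.compile(r"^#\d+\s+0x[0-9a-f]+ in (\S+) ")


def split_threads(dump: str) -> Dict[int, List[str]]:
    """Map thread number -> function names, innermost frame first."""
    threads: Dict[int, List[str]] = {}
    current: Optional[int] = None
    for raw in dump.splitlines():
        line = raw.strip()
        m = THREAD_RE.match(line)
        if m:
            current = int(m.group(1))
            threads[current] = []
            continue
        m = FRAME_RE.match(line)
        if m and current is not None:
            threads[current].append(m.group(1))
    return threads


def classify(funcs: List[str]) -> str:
    # Check the most specific symptoms first; idle workers also sit in
    # pthread_cond_wait, so "idle" is only the fallback.
    if "__lock_get_internal" in funcs:
        return "waiting on BDB lock"
    if "sched_yield" in funcs:
        return "yielding"
    if "epoll_wait" in funcs:
        return "listener (epoll_wait)"
    if "pthread_join" in funcs:
        return "main thread (join)"
    if "ldap_pvt_thread_cond_wait" in funcs:
        return "idle pool worker"
    return "other"


def summarize(dump: str) -> Counter:
    """Count threads per category for one host's dump."""
    return Counter(classify(funcs) for funcs in split_threads(dump).values())
```

Calling `summarize()` on the text of one host's dump (saved to a file of your choosing) gives a small table per host, which is easier to compare between a baseline capture and a hung one than the raw traces.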
I guess the next step is to start clean, capture verbose logging and pstack traces while everything is working normally, and then compare them with what I see when one or more of the servers hang. Any other suggestions?
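For the "start clean, then compare" plan, it may help to take netstat and pstack snapshots at fixed intervals on every host, so there's a timestamped baseline to line up against the moment a server hangs. A minimal sketch, assuming `netstat`, `pstack` and `sh` are on the PATH; the slapd PID, output directory and interval are placeholders, and the helper names are hypothetical.

```python
"""Hypothetical snapshot collector: run diagnostic commands and save each
one's output to a timestamped file for later comparison."""
import subprocess
import time
from pathlib import Path
from typing import Dict, List


def snapshot(commands: Dict[str, List[str]], outdir: Path) -> List[Path]:
    outdir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    written = []
    for name, argv in commands.items():
        try:
            # stderr is folded into stdout; a non-zero exit is still recorded.
            result = subprocess.run(argv, stdout=subprocess.PIPE,
                                    stderr=subprocess.STDOUT, text=True)
            output = result.stdout
        except OSError as exc:  # e.g. pstack not installed on this host
            output = f"failed to run {argv!r}: {exc}\n"
        path = outdir / f"{stamp}-{name}.txt"
        path.write_text(output)
        written.append(path)
    return written


def slapd_commands(pid: int) -> Dict[str, List[str]]:
    # Port 1389 matches the netstat output above; the PID is a placeholder.
    return {
        "netstat": ["sh", "-c", "netstat -an | fgrep :1389"],
        "pstack": ["pstack", str(pid)],
    }


def run_forever(pid: int, outdir: Path, interval_s: float = 60.0) -> None:
    """Take a snapshot every interval_s seconds until interrupted."""
    while True:
        snapshot(slapd_commands(pid), outdir)
        time.sleep(interval_s)
```

Left running in the background on each host, something like `run_forever(<slapd pid>, Path("/var/tmp/slapd-snapshots"))` would produce matching netstat/pstack pairs; the 60-second default is an arbitrary choice, and a shorter interval gives finer resolution around the moment of the hang at the cost of more files.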
Thanks, Mark
On Thu, Mar 31, 2011 at 9:58 PM, GMail mah042@gmail.com wrote:
No, I hadn't, because the usage and symptoms didn't seem to fit. But it's worth a shot.
Mark
On Mar 31, 2011, at 9:27 PM, Quanah Gibson-Mount quanah@zimbra.com wrote:
--On Thursday, March 31, 2011 9:06 PM -0500 Mark mah042@gmail.com wrote:
I've been testing a 4-way multi-master setup using OpenLDAP 2.4.25 and I'm having some sporadic problems with it that I'm having difficulty diagnosing.
Have you tried applying the patches in ITS#6872?
--Quanah
--
Quanah Gibson-Mount Sr. Member of Technical Staff Zimbra, Inc A Division of VMware, Inc.
Zimbra :: the leader in open source messaging and collaboration