I think I have an idea of what a good pstack output is supposed to look like.

When idle

The main thread should look like this:

Thread 1 (Thread 182903654720 (LWP 22479)):
#0  0x00000035c6d0732b in pthread_join () from /lib64/tls/libpthread.so.0
#1  0x00000000004e90a8 in ldap_pvt_thread_join ()
#2  0x0000000000438bd8 in slapd_daemon ()
#3  0x000000000041932a in main ()


The last thread should look like this:

Thread 6 (Thread 1082132832 (LWP 22480)):
#0  0x00000035c64ca15c in epoll_wait () from /lib64/tls/libc.so.6
#1  0x0000000000437a52 in slapd_daemon_destroy ()
#2  0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0
#3  0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6

and the others should look like this:

Thread n (Thread 1090525536 (LWP 22481)):
#0  0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2  0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3  0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4  0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6


After reproducing the issue, that's how three of my multi-masters look (host1, host2 and host3). The fourth (host4) is effectively hung: I can connect to it and browse its cn=config and cn=monitor backends, but not its 'main' backend. Host4 thinks it has ESTABLISHED (consumer) connections to each of the other three, and each of those connections shows data waiting in the Recv-Q on host4. The other three hosts show the same connections in FIN_WAIT1 state with data in the Send-Q, and they stay that way until I kill -9 slapd on host4 (it won't respond to a kill -TERM). The pstack trace on host4 looks very confused, and several of the threads seem to be stuck in BDB:

Thread 8 (Thread 1082132832 (LWP 28288)):
#0  0x00000030d86ca15c in epoll_wait () from /lib64/tls/libc.so.6
#1  0x0000000000437a52 in slapd_daemon_destroy ()
#2  0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#3  0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 7 (Thread 1090525536 (LWP 28289)):
#0  0x00000030d86b0719 in sched_yield () from /lib64/tls/libc.so.6
#1  0x00000000004e90d0 in ldap_pvt_thread_yield ()
#2  0x0000002a9630e197 in syncprov_op_search ()
#3  0x00000000004c3843 in overlay_op_walk ()
#4  0x00000000004c3a9f in overlay_op_walk ()
#5  0x00000000004c3b7a in overlay_op_walk ()
#6  0x000000000043ef62 in fe_op_search ()
#7  0x000000000043e8c2 in do_search ()
#8  0x000000000043b6d5 in connection_done ()
#9  0x000000000043bc89 in connection_client_stop ()
#10 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#11 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#12 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 6 (Thread 1098918240 (LWP 28290)):
#0  0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0000002a956a744b in __db_pthread_mutex_lock ()
#2  0x0000002a956a6b11 in __db_tas_mutex_lock_int ()
#3  0x0000002a956a6887 in __db_tas_mutex_lock ()
#4  0x0000002a95777ed2 in __lock_get_internal ()
#5  0x0000002a9577625d in __lock_get ()
#6  0x0000002a957be9cf in __db_lget ()
#7  0x0000002a956d1d66 in __bam_search ()
#8  0x0000002a956b8ca8 in __bamc_search ()
#9  0x0000002a956b6918 in __bamc_put ()
#10 0x0000002a957a98ee in __dbc_iput ()
#11 0x0000002a957a9747 in __dbc_put ()
#12 0x0000002a95795be7 in __db_put ()
#13 0x0000002a957b7d05 in __db_put_pp ()
#14 0x0000002a961d777e in hdb_dn2id_add ()
#15 0x0000002a961c3fb4 in hdb_add ()
#16 0x00000000004c38d7 in overlay_op_walk ()
#17 0x00000000004c3a9f in overlay_op_walk ()
#18 0x00000000004c3c0a in overlay_op_walk ()
#19 0x00000000004b482e in cancel_extop ()
#20 0x00000000004af92a in cancel_extop ()
#21 0x00000000004b1457 in cancel_extop ()
#22 0x000000000043bca3 in connection_client_stop ()
#23 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#24 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#25 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 5 (Thread 1107310944 (LWP 28291)):
#0  0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0000002a956a744b in __db_pthread_mutex_lock ()
#2  0x0000002a956a6b11 in __db_tas_mutex_lock_int ()
#3  0x0000002a956a6887 in __db_tas_mutex_lock ()
#4  0x0000002a95777ed2 in __lock_get_internal ()
#5  0x0000002a9577625d in __lock_get ()
#6  0x0000002a957be9cf in __db_lget ()
#7  0x0000002a956d1d66 in __bam_search ()
#8  0x0000002a956b8ca8 in __bamc_search ()
#9  0x0000002a956b6918 in __bamc_put ()
#10 0x0000002a957a98ee in __dbc_iput ()
#11 0x0000002a957a9747 in __dbc_put ()
#12 0x0000002a95795be7 in __db_put ()
#13 0x0000002a957b7d05 in __db_put_pp ()
#14 0x0000002a961d777e in hdb_dn2id_add ()
#15 0x0000002a961c3fb4 in hdb_add ()
#16 0x00000000004c38d7 in overlay_op_walk ()
#17 0x00000000004c3a9f in overlay_op_walk ()
#18 0x00000000004c3c0a in overlay_op_walk ()
#19 0x00000000004b482e in cancel_extop ()
#20 0x00000000004af92a in cancel_extop ()
#21 0x00000000004b1457 in cancel_extop ()
#22 0x000000000043bca3 in connection_client_stop ()
#23 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#24 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#25 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 4 (Thread 1115703648 (LWP 28292)):
#0  0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0000002a956a744b in __db_pthread_mutex_lock ()
#2  0x0000002a956a6b11 in __db_tas_mutex_lock_int ()
#3  0x0000002a956a6887 in __db_tas_mutex_lock ()
#4  0x0000002a95777ed2 in __lock_get_internal ()
#5  0x0000002a9577625d in __lock_get ()
#6  0x0000002a957be9cf in __db_lget ()
#7  0x0000002a956d1d66 in __bam_search ()
#8  0x0000002a956b8ca8 in __bamc_search ()
#9  0x0000002a956b2ea0 in __bamc_get ()
#10 0x0000002a957a78e7 in __dbc_iget ()
#11 0x0000002a957a730b in __dbc_get ()
#12 0x0000002a957b93fe in __dbc_get_pp ()
#13 0x0000002a961d8366 in hdb_dn2id ()
#14 0x0000002a961dff3f in hdb_cache_find_ndn ()
#15 0x0000002a961d6f7c in hdb_dn2entry ()
#16 0x0000002a961c35ca in hdb_add ()
#17 0x00000000004c38d7 in overlay_op_walk ()
#18 0x00000000004c3a9f in overlay_op_walk ()
#19 0x00000000004c3c0a in overlay_op_walk ()
#20 0x00000000004b482e in cancel_extop ()
#21 0x00000000004af92a in cancel_extop ()
#22 0x00000000004b1457 in cancel_extop ()
#23 0x000000000043bca3 in connection_client_stop ()
#24 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#25 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#26 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 3 (Thread 1124096352 (LWP 28297)):
#0  0x00000030d86b0719 in sched_yield () from /lib64/tls/libc.so.6
#1  0x00000000004e90d0 in ldap_pvt_thread_yield ()
#2  0x0000002a9630e197 in syncprov_op_search ()
#3  0x00000000004c3843 in overlay_op_walk ()
#4  0x00000000004c3a9f in overlay_op_walk ()
#5  0x00000000004c3b7a in overlay_op_walk ()
#6  0x000000000043ef62 in fe_op_search ()
#7  0x000000000043e8c2 in do_search ()
#8  0x000000000043b6d5 in connection_done ()
#9  0x000000000043bc89 in connection_client_stop ()
#10 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#11 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#12 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 2 (Thread 1132489056 (LWP 28298)):
#0  0x00000030d86b0719 in sched_yield () from /lib64/tls/libc.so.6
#1  0x00000000004e90d0 in ldap_pvt_thread_yield ()
#2  0x0000002a9630e197 in syncprov_op_search ()
#3  0x00000000004c3843 in overlay_op_walk ()
#4  0x00000000004c3a9f in overlay_op_walk ()
#5  0x00000000004c3b7a in overlay_op_walk ()
#6  0x000000000043ef62 in fe_op_search ()
#7  0x000000000043e8c2 in do_search ()
#8  0x000000000043b6d5 in connection_done ()
#9  0x000000000043bc89 in connection_client_stop ()
#10 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#11 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#12 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 1 (Thread 182903650624 (LWP 28287)):
#0  0x00000030d8d0732b in pthread_join () from /lib64/tls/libpthread.so.0
#1  0x00000000004e90a8 in ldap_pvt_thread_join ()
#2  0x0000000000438bd8 in slapd_daemon ()
#3  0x000000000041932a in main ()

Can anyone help me determine what's going on?

Thanks,
Mark




On Sat, Apr 2, 2011 at 9:31 PM, Mark <mah042@gmail.com> wrote:
The patch in ITS#6872 didn't fix the issue.

My first thought was to enable all the logging, but it produces so much data that I can't tell what's normal and what isn't. I captured the (netstat) connection information on all four hosts. Several of the connections are stuck in FIN_WAIT1, which is normally a brief, transitional state:

host1$ netstat -an | fgrep :1389
Proto Recv-Q Send-Q Local Address               Foreign Address             State
tcp        0      0 10.1.1.1:1389               0.0.0.0:*                   LISTEN
tcp    65115      0 10.1.1.1:19284    -->       10.1.1.4:1389               ESTABLISHED
tcp        0      0 10.1.1.1:1389     <--       10.1.1.4:36991              ESTABLISHED
tcp    73458      0 10.1.1.1:19286    -->       10.1.1.3:1389               ESTABLISHED
tcp        0      0 10.1.1.1:1389     <--       10.1.1.3:38085              ESTABLISHED
tcp    73112      0 10.1.1.1:19263    -->       10.1.1.2:1389               ESTABLISHED
tcp        0      0 10.1.1.1:1389     <--       10.1.1.2:41374              ESTABLISHED

host2$ netstat -an | fgrep :1389
Proto Recv-Q Send-Q Local Address               Foreign Address             State
tcp        0      0 10.1.1.2:1389               0.0.0.0:*                   LISTEN
tcp        0  11537 10.1.1.2:1389     <--       10.1.1.1:19263              FIN_WAIT1
tcp        0      0 10.1.1.2:1389     <--       10.1.1.4:36992              ESTABLISHED
tcp        0      0 10.1.1.2:1389     <--       10.1.1.3:38086              ESTABLISHED
tcp        0      0 10.1.1.2:41373    -->       10.1.1.3:1389               ESTABLISHED
tcp        0      0 10.1.1.2:41375    -->       10.1.1.4:1389               ESTABLISHED
tcp        0      0 10.1.1.2:41374    -->       10.1.1.1:1389               ESTABLISHED

host3$ netstat -an | fgrep :1389
Proto Recv-Q Send-Q Local Address               Foreign Address             State
tcp        0      0 10.1.1.3:1389               0.0.0.0:*                   LISTEN
tcp        0  11521 10.1.1.3:1389     <--       10.1.1.1:19286              FIN_WAIT1
tcp        0      0 10.1.1.3:38087    -->       10.1.1.4:1389               ESTABLISHED
tcp        0      0 10.1.1.3:38085    -->       10.1.1.1:1389               ESTABLISHED
tcp        0  11505 10.1.1.3:1389     <--       10.1.1.4:37000              FIN_WAIT1
tcp        0      0 10.1.1.3:38086    -->       10.1.1.2:1389               ESTABLISHED
tcp        0      0 10.1.1.3:1389     <--       10.1.1.2:41373              ESTABLISHED

host4$ netstat -an | fgrep :1389
Proto Recv-Q Send-Q Local Address               Foreign Address             State
tcp        0      0 10.1.1.4:1389               0.0.0.0:*                   LISTEN
tcp        0  14281 10.1.1.4:1389     <--       10.1.1.1:19284              FIN_WAIT1
tcp    73567      0 10.1.1.4:37000    -->       10.1.1.3:1389               ESTABLISHED
tcp        0      0 10.1.1.4:1389     <--       10.1.1.3:38087              ESTABLISHED
tcp    17534      0 10.1.1.4:36991    -->       10.1.1.1:1389               ESTABLISHED
tcp        0      0 10.1.1.4:1389     <--       10.1.1.2:41375              ESTABLISHED
tcp        0      0 10.1.1.4:36992    -->       10.1.1.2:1389               ESTABLISHED
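
The interesting rows in the listings above are the ones where the two ends of a connection disagree about its state. A minimal Python sketch for picking those out, assuming the listings from all four hosts are concatenated into one string; parse_netstat and mismatched_pairs are hypothetical helper names, and the --> and <-- markers (which are not part of standard netstat output) are simply dropped.

```python
from collections import namedtuple

Conn = namedtuple("Conn", "recv_q send_q local remote state")


def parse_netstat(text):
    """Parse 'netstat -an' TCP rows into Conn tuples, skipping everything else."""
    conns = []
    for line in text.splitlines():
        # Drop the direction arrows before splitting into columns.
        fields = [f for f in line.split() if f not in ("-->", "<--")]
        if len(fields) != 6 or fields[0] != "tcp":
            continue
        _, recv_q, send_q, local, remote, state = fields
        conns.append(Conn(int(recv_q), int(send_q), local, remote, state))
    return conns


def mismatched_pairs(conns):
    """Pair each connection with its mirror image; keep pairs whose states differ."""
    by_endpoints = {(c.local, c.remote): c for c in conns}
    pairs = []
    for (local, remote), conn in by_endpoints.items():
        peer = by_endpoints.get((remote, local))
        # The ordering test reports each pair once instead of once per side.
        if peer and conn.state != peer.state and (local, remote) < (remote, local):
            pairs.append((conn, peer))
    return pairs
```

Each returned pair carries both queue sizes, so a connection that is ESTABLISHED with a non-zero Recv-Q on one host and FIN_WAIT1 with a non-zero Send-Q on the other shows up as a single entry.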

I also captured a pstack trace on each of the four slapds. But again, I'm not sure what's normal:

host1$
Thread 17 (Thread 1082132832 (LWP 25922)):
#0  0x0000003a340ca15c in epoll_wait () from /lib64/tls/libc.so.6
#1  0x0000000000437a52 in slapd_daemon_destroy ()
#2  0x0000003a34706317 in start_thread () from /lib64/tls/libpthread.so.0
#3  0x0000003a340c9d83 in clone () from /lib64/tls/libc.so.6
Thread 16 (Thread 1090525536 (LWP 25923)):
#0  0x0000003a34708d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2  0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3  0x0000003a34706317 in start_thread () from /lib64/tls/libpthread.so.0
#4  0x0000003a340c9d83 in clone () from /lib64/tls/libc.so.6
Thread 15 (Thread 1098918240 (LWP 25924)):
#0  0x0000003a34708d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2  0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3  0x0000003a34706317 in start_thread () from /lib64/tls/libpthread.so.0
#4  0x0000003a340c9d83 in clone () from /lib64/tls/libc.so.6
Thread 14 (Thread 1107310944 (LWP 25925)):
#0  0x0000003a34708d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2  0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3  0x0000003a34706317 in start_thread () from /lib64/tls/libpthread.so.0
#4  0x0000003a340c9d83 in clone () from /lib64/tls/libc.so.6
Thread 13 (Thread 1115703648 (LWP 25926)):
#0  0x0000003a34708d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2  0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3  0x0000003a34706317 in start_thread () from /lib64/tls/libpthread.so.0
#4  0x0000003a340c9d83 in clone () from /lib64/tls/libc.so.6
Thread 12 (Thread 1124096352 (LWP 26071)):
#0  0x0000003a34708d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2  0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3  0x0000003a34706317 in start_thread () from /lib64/tls/libpthread.so.0
#4  0x0000003a340c9d83 in clone () from /lib64/tls/libc.so.6
Thread 11 (Thread 1132489056 (LWP 26072)):
#0  0x0000003a340b0719 in sched_yield () from /lib64/tls/libc.so.6
#1  0x00000000004e90d0 in ldap_pvt_thread_yield ()
#2  0x0000002a9630d11e in syncprov_op_search ()
#3  0x00000000004c3843 in overlay_op_walk ()
#4  0x00000000004c3a9f in overlay_op_walk ()
#5  0x00000000004c3b7a in overlay_op_walk ()
#6  0x000000000043ef62 in fe_op_search ()
#7  0x000000000043e8c2 in do_search ()
#8  0x000000000043b6d5 in connection_done ()
#9  0x000000000043bc89 in connection_client_stop ()
#10 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#11 0x0000003a34706317 in start_thread () from /lib64/tls/libpthread.so.0
#12 0x0000003a340c9d83 in clone () from /lib64/tls/libc.so.6
Thread 10 (Thread 1140881760 (LWP 26073)):
#0  0x0000003a34708d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0000002a956a644b in __db_pthread_mutex_lock ()
#2  0x0000002a956a5b11 in __db_tas_mutex_lock_int ()
#3  0x0000002a956a5887 in __db_tas_mutex_lock ()
#4  0x0000002a95776ed2 in __lock_get_internal ()
#5  0x0000002a9577525d in __lock_get ()
#6  0x0000002a957bd9cf in __db_lget ()
#7  0x0000002a956b67bc in __bamc_writelock ()
#8  0x0000002a957a5b71 in __dbc_idel ()
#9  0x0000002a957a5ace in __dbc_del ()
#10 0x0000002a957b8039 in __dbc_del_pp ()
#11 0x0000002a961dc91e in hdb_idl_delete_key ()
#12 0x0000002a961d1d4b in hdb_key_change ()
#13 0x0000002a961d0d1b in indexer ()
#14 0x0000002a961d1159 in index_at_values ()
#15 0x0000002a961d12d2 in hdb_index_values ()
#16 0x0000002a961d173a in hdb_index_entry ()
#17 0x0000002a961c5888 in hdb_delete ()
#18 0x00000000004c38d7 in overlay_op_walk ()
#19 0x00000000004c3a9f in overlay_op_walk ()
#20 0x00000000004c3c2e in overlay_op_walk ()
#21 0x00000000004b5b1a in cancel_extop ()
#22 0x00000000004af92a in cancel_extop ()

host2$
Thread 9 (Thread 1082132832 (LWP 12700)):
#0  0x0000003530bca15c in epoll_wait () from /lib64/tls/libc.so.6
#1  0x0000000000437a52 in slapd_daemon_destroy ()
#2  0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0
#3  0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6
Thread 8 (Thread 1090525536 (LWP 12701)):
#0  0x0000003531208d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2  0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3  0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0
#4  0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6
Thread 7 (Thread 1098918240 (LWP 12702)):
#0  0x0000003531208d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2  0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3  0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0
#4  0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6
Thread 6 (Thread 1107310944 (LWP 12703)):
#0  0x0000003531208d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2  0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3  0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0
#4  0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6
Thread 5 (Thread 1115703648 (LWP 12704)):
#0  0x0000003531208d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2  0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3  0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0
#4  0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6
Thread 4 (Thread 1124096352 (LWP 13049)):
#0  0x0000003531208d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2  0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3  0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0
#4  0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6
Thread 3 (Thread 1132489056 (LWP 13050)):
#0  0x0000003531208d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2  0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3  0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0
#4  0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6
Thread 2 (Thread 1140881760 (LWP 13059)):
#0  0x0000003531208d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2  0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3  0x0000003531206317 in start_thread () from /lib64/tls/libpthread.so.0
#4  0x0000003530bc9d83 in clone () from /lib64/tls/libc.so.6
Thread 1 (Thread 182903646528 (LWP 12693)):
#0  0x000000353120732b in pthread_join () from /lib64/tls/libpthread.so.0
#1  0x00000000004e90a8 in ldap_pvt_thread_join ()
#2  0x0000000000438bd8 in slapd_daemon ()
#3  0x000000000041932a in main ()


host3$
Thread 9 (Thread 1082132832 (LWP 20629)):
#0  0x00000035c64ca15c in epoll_wait () from /lib64/tls/libc.so.6
#1  0x0000000000437a52 in slapd_daemon_destroy ()
#2  0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0
#3  0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6
Thread 8 (Thread 1090525536 (LWP 20630)):
#0  0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2  0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3  0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4  0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6
Thread 7 (Thread 1098918240 (LWP 20631)):
#0  0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2  0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3  0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4  0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6
Thread 6 (Thread 1107310944 (LWP 20632)):
#0  0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2  0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3  0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4  0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6
Thread 5 (Thread 1115703648 (LWP 20633)):
#0  0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2  0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3  0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4  0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6
Thread 4 (Thread 1124096352 (LWP 20983)):
#0  0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2  0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3  0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4  0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6
Thread 3 (Thread 1132489056 (LWP 20984)):
#0  0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2  0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3  0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4  0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6
Thread 2 (Thread 1140881760 (LWP 21005)):
#0  0x00000035c6d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2  0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3  0x00000035c6d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4  0x00000035c64c9d83 in clone () from /lib64/tls/libc.so.6
Thread 1 (Thread 182903654720 (LWP 20628)):
#0  0x00000035c6d0732b in pthread_join () from /lib64/tls/libpthread.so.0
#1  0x00000000004e90a8 in ldap_pvt_thread_join ()
#2  0x0000000000438bd8 in slapd_daemon ()
#3  0x000000000041932a in main ()


host4$
Thread 12 (Thread 1082132832 (LWP 26819)):
#0  0x00000030d86ca15c in epoll_wait () from /lib64/tls/libc.so.6
#1  0x0000000000437a52 in slapd_daemon_destroy ()
#2  0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#3  0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 11 (Thread 1090525536 (LWP 26820)):
#0  0x00000030d86b0719 in sched_yield () from /lib64/tls/libc.so.6
#1  0x00000000004e90d0 in ldap_pvt_thread_yield ()
#2  0x00000000004af4ab in cancel_extop ()
#3  0x00000000004b1457 in cancel_extop ()
#4  0x000000000043bca3 in connection_client_stop ()
#5  0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#6  0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#7  0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 10 (Thread 1098918240 (LWP 26821)):
#0  0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0000002a956a744b in __db_pthread_mutex_lock ()
#2  0x0000002a956a6b11 in __db_tas_mutex_lock_int ()
#3  0x0000002a956a6887 in __db_tas_mutex_lock ()
#4  0x0000002a95777ed2 in __lock_get_internal ()
#5  0x0000002a957750ae in __lock_vec ()
#6  0x0000002a95774e53 in __lock_vec_api ()
#7  0x0000002a95774da3 in __lock_vec_pp ()
#8  0x0000002a961df6f0 in hdb_cache_entry_db_relock ()
#9  0x0000002a961e169e in hdb_cache_modify ()
#10 0x0000002a961c95ac in hdb_modify ()
#11 0x0000002a9630a959 in syncprov_checkpoint ()
#12 0x0000002a9630c241 in syncprov_op_response ()
#13 0x00000000004500f6 in rs_entry2modifiable ()
#14 0x00000000004502f5 in rs_entry2modifiable ()
#15 0x000000000045112e in slap_send_ldap_result ()
#16 0x0000002a961c7266 in hdb_delete ()
#17 0x00000000004c38d7 in overlay_op_walk ()
#18 0x00000000004c3a9f in overlay_op_walk ()
#19 0x00000000004c3c2e in overlay_op_walk ()
#20 0x000000000045c958 in fe_op_delete ()
#21 0x000000000045c688 in do_delete ()
#22 0x000000000043b6d5 in connection_done ()
#23 0x000000000043bc89 in connection_client_stop ()
#24 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#25 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#26 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 9 (Thread 1107310944 (LWP 26822)):
#0  0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0000002a956a744b in __db_pthread_mutex_lock ()
#2  0x0000002a956a6b11 in __db_tas_mutex_lock_int ()
#3  0x0000002a956a6887 in __db_tas_mutex_lock ()
#4  0x0000002a95777ed2 in __lock_get_internal ()
#5  0x0000002a9577613d in __lock_get_api ()
#6  0x0000002a95775fd7 in __lock_get_pp ()
#7  0x0000002a961df875 in bdb_cache_entry_db_lock ()
#8  0x0000002a961e0f02 in hdb_cache_find_id ()
#9  0x0000002a961d707f in hdb_dn2entry ()
#10 0x0000002a961cd3a1 in hdb_search ()
#11 0x00000000004c38d7 in overlay_op_walk ()
#12 0x00000000004c3a9f in overlay_op_walk ()
#13 0x00000000004c3b7a in overlay_op_walk ()
#14 0x0000002a96307877 in syncprov_findbase ()
#15 0x0000002a9630df7a in syncprov_op_search ()
#16 0x00000000004c3843 in overlay_op_walk ()
#17 0x00000000004c3a9f in overlay_op_walk ()
#18 0x00000000004c3b7a in overlay_op_walk ()
#19 0x000000000043ef62 in fe_op_search ()
#20 0x000000000043e8c2 in do_search ()
#21 0x000000000043b6d5 in connection_done ()
#22 0x000000000043bc89 in connection_client_stop ()
#23 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#24 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#25 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 8 (Thread 1115703648 (LWP 26823)):
#0  0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0000002a956a744b in __db_pthread_mutex_lock ()
#2  0x0000002a956a6b11 in __db_tas_mutex_lock_int ()
#3  0x0000002a956a6887 in __db_tas_mutex_lock ()
#4  0x0000002a95777ed2 in __lock_get_internal ()
#5  0x0000002a9577625d in __lock_get ()
#6  0x0000002a957be9cf in __db_lget ()
#7  0x0000002a956d1d66 in __bam_search ()
#8  0x0000002a956b8ca8 in __bamc_search ()
#9  0x0000002a956b6918 in __bamc_put ()
#10 0x0000002a957a98ee in __dbc_iput ()
#11 0x0000002a957a9747 in __dbc_put ()
#12 0x0000002a95795be7 in __db_put ()
#13 0x0000002a957b7d05 in __db_put_pp ()
#14 0x0000002a961d9eff in bdb_id2entry_put ()
#15 0x0000002a961d9f7b in hdb_id2entry_update ()
#16 0x0000002a961c92c4 in hdb_modify ()
#17 0x00000000004c38d7 in overlay_op_walk ()
#18 0x00000000004c3a9f in overlay_op_walk ()
#19 0x00000000004c3bc2 in overlay_op_walk ()
#20 0x00000000004b7811 in syncrepl_add_glue ()
#21 0x00000000004af957 in cancel_extop ()
#22 0x00000000004b1457 in cancel_extop ()
#23 0x000000000043bca3 in connection_client_stop ()
#24 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#25 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#26 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 7 (Thread 1124096352 (LWP 26838)):
#0  0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2  0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3  0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4  0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 6 (Thread 1132489056 (LWP 26839)):
#0  0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0000002a956a744b in __db_pthread_mutex_lock ()
#2  0x0000002a956a6b11 in __db_tas_mutex_lock_int ()
#3  0x0000002a956a6887 in __db_tas_mutex_lock ()
#4  0x0000002a95777ed2 in __lock_get_internal ()
#5  0x0000002a9577613d in __lock_get_api ()
#6  0x0000002a95775fd7 in __lock_get_pp ()
#7  0x0000002a961df875 in bdb_cache_entry_db_lock ()
#8  0x0000002a961e0f02 in hdb_cache_find_id ()
#9  0x0000002a961d707f in hdb_dn2entry ()
#10 0x0000002a961cd3a1 in hdb_search ()
#11 0x00000000004c38d7 in overlay_op_walk ()
#12 0x00000000004c3a9f in overlay_op_walk ()
#13 0x00000000004c3b7a in overlay_op_walk ()
#14 0x0000002a96307877 in syncprov_findbase ()
#15 0x0000002a9630df7a in syncprov_op_search ()
#16 0x00000000004c3843 in overlay_op_walk ()
#17 0x00000000004c3a9f in overlay_op_walk ()
#18 0x00000000004c3b7a in overlay_op_walk ()
#19 0x000000000043ef62 in fe_op_search ()
#20 0x000000000043e8c2 in do_search ()
#21 0x000000000043b6d5 in connection_done ()
#22 0x000000000043bc89 in connection_client_stop ()
#23 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#24 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#25 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 5 (Thread 1140881760 (LWP 26840)):
#0  0x00000030d8d0b16b in __lll_mutex_lock_wait ()
#1  0x00000000440066b0 in ?? ()
#2  0x0000000000000010 in ?? ()
#3  0x00000030d8d07f34 in pthread_mutex_lock () from /lib64/tls/libpthread.so.0
#4  0x0000002ab68eb520 in ?? ()
#5  0x0000000000000028 in ?? ()
#6  0x00000004d866b20d in ?? ()
#7  0x0000000000000050 in ?? ()
#8  0x0000002ab5c00020 in ?? ()
#9  0x0000000000000029 in ?? ()
#10 0x00000030d8d06280 in __free_tcb () from /lib64/tls/libpthread.so.0
#11 0x00000000410005e0 in ?? ()
#12 0x0000002ab5c00020 in ?? ()
#13 0x000000000000000c in ?? ()
#14 0x00000030d8d06280 in __free_tcb () from /lib64/tls/libpthread.so.0
#15 0x00000000410005e0 in ?? ()
#16 0x0000000000000001 in ?? ()
#17 0x00000000410005e0 in ?? ()
#18 0x00000030d866bc22 in malloc () from /lib64/tls/libc.so.6
#19 0x0000000000772040 in ?? ()
#20 0x0000000044006340 in ?? ()
#21 0x00000000004ade33 in cancel_extop ()
Thread 4 (Thread 1149274464 (LWP 26972)):
#0  0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0000002a956a744b in __db_pthread_mutex_lock ()
#2  0x0000002a956a6b11 in __db_tas_mutex_lock_int ()
#3  0x0000002a956a6887 in __db_tas_mutex_lock ()
#4  0x0000002a95777ed2 in __lock_get_internal ()
#5  0x0000002a9577613d in __lock_get_api ()
#6  0x0000002a95775fd7 in __lock_get_pp ()
#7  0x0000002a961df875 in bdb_cache_entry_db_lock ()
#8  0x0000002a961e0f02 in hdb_cache_find_id ()
#9  0x0000002a961d707f in hdb_dn2entry ()
#10 0x0000002a961cd3a1 in hdb_search ()
#11 0x00000000004c38d7 in overlay_op_walk ()
#12 0x00000000004c3a9f in overlay_op_walk ()
#13 0x00000000004c3b7a in overlay_op_walk ()
#14 0x000000000043ef62 in fe_op_search ()
#15 0x000000000043e8c2 in do_search ()
#16 0x000000000043b6d5 in connection_done ()
#17 0x000000000043bc89 in connection_client_stop ()
#18 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#19 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#20 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 3 (Thread 1157667168 (LWP 26973)):
#0  0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x00000000004e9150 in ldap_pvt_thread_cond_wait ()
#2  0x00000000004e7ca2 in ldap_pvt_thread_pool_destroy ()
#3  0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#4  0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 2 (Thread 1166059872 (LWP 26974)):
#0  0x00000030d8d08d1a in pthread_cond_wait@@GLIBC_2.3.2 ()
#1  0x0000002a956a744b in __db_pthread_mutex_lock ()
#2  0x0000002a956a6b11 in __db_tas_mutex_lock_int ()
#3  0x0000002a956a6887 in __db_tas_mutex_lock ()
#4  0x0000002a95777ed2 in __lock_get_internal ()
#5  0x0000002a9577613d in __lock_get_api ()
#6  0x0000002a95775fd7 in __lock_get_pp ()
#7  0x0000002a961df875 in bdb_cache_entry_db_lock ()
#8  0x0000002a961e0f02 in hdb_cache_find_id ()
#9  0x0000002a961d707f in hdb_dn2entry ()
#10 0x0000002a961cd3a1 in hdb_search ()
#11 0x00000000004c38d7 in overlay_op_walk ()
#12 0x00000000004c3a9f in overlay_op_walk ()
#13 0x00000000004c3b7a in overlay_op_walk ()
#14 0x000000000043ef62 in fe_op_search ()
#15 0x000000000043e8c2 in do_search ()
#16 0x000000000043b6d5 in connection_done ()
#17 0x000000000043bc89 in connection_client_stop ()
#18 0x00000000004e7d21 in ldap_pvt_thread_pool_destroy ()
#19 0x00000030d8d06317 in start_thread () from /lib64/tls/libpthread.so.0
#20 0x00000030d86c9d83 in clone () from /lib64/tls/libc.so.6
Thread 1 (Thread 182903650624 (LWP 26818)):
#0  0x00000030d8d0732b in pthread_join () from /lib64/tls/libpthread.so.0
#1  0x00000000004e90a8 in ldap_pvt_thread_join ()
#2  0x0000000000438bd8 in slapd_daemon ()
#3  0x000000000041932a in main ()

I guess the next step is to start clean, capture verbose logging and pstack traces while everything is behaving normally, and then compare them with what I see when one or more of the servers hang. Any other suggestions?
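
One way to make that comparison less tedious is to reduce each dump to a count of distinct stacks and subtract the normal dump from the hung one. This is only a sketch in Python: signature_counts and new_in_hung are hypothetical helper names, and it compares function names only, since the addresses differ from host to host.

```python
import re
from collections import Counter

FRAME_RE = re.compile(r"^#\d+\s+0x[0-9a-f]+ in (\S+) \(\)")


def signature_counts(text):
    """Map each distinct stack (tuple of function names) to its thread count."""
    counts = Counter()
    stack = None
    # The extra "Thread end" line is a sentinel that flushes the last stack.
    for line in text.splitlines() + ["Thread end"]:
        if line.startswith("Thread "):
            if stack:
                counts[tuple(stack)] += 1
            stack = []
        else:
            m = FRAME_RE.match(line)
            if m and stack is not None:
                stack.append(m.group(1))
    return counts


def new_in_hung(normal_text, hung_text):
    """Stacks that appear more often in the hung dump than in the normal one."""
    # Counter subtraction keeps only entries with a positive remaining count.
    return signature_counts(hung_text) - signature_counts(normal_text)
```

Anything left in the result is a stack that the healthy server did not show, or showed on fewer threads, which narrows down where to start reading.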


Thanks,
Mark


On Thu, Mar 31, 2011 at 9:58 PM, GMail <mah042@gmail.com> wrote:
No, I hadn't, because the usage and symptoms didn't seem to fit. But it's worth a shot.

---
Mark

On Mar 31, 2011, at 9:27 PM, Quanah Gibson-Mount <quanah@zimbra.com> wrote:

> --On Thursday, March 31, 2011 9:06 PM -0500 Mark <mah042@gmail.com> wrote:
>
>> I've been testing a 4-way multi-master setup using OpenLDAP 2.4.25 and
>> I'm having some sporadic problems with it that I'm having difficulty
>> diagnosing..
>
> Have you tried applying the patches in ITS#6872?
>
> --Quanah
>
>
>
> --
>
> Quanah Gibson-Mount
> Sr. Member of Technical Staff
> Zimbra, Inc
> A Division of VMware, Inc.
> --------------------
> Zimbra ::  the leader in open source messaging and collaboration