Full_Name: Kevin Sullivan Version: 2.4.39-8 OS: RHEL 6.4 URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (38.127.164.138)
Setup:
I have two OpenLDAP servers running on separate hosts and replicating via delta syncrepl in mirrormode.
Problem:
On rare occasions when one of my servers comes online, both servers will lock up and be unresponsive. Any query to either server will time out. The servers will stay in this state indefinitely.
Workaround:
Killing either of the servers will resolve this problem.
Observations:
- I don't know how to reproduce this reliably. - Both servers have a thread that is stuck in a do_syncrep1() call. - Netstat shows that each server has unacknowledged data in one of their socket's receive queue.
GDB on host1:
(gdb) info threads 18 Thread 0x7f8b7b80d700 (LWP 31620) 0x00007f8ba7fa6f03 in epoll_wait () from /lib64/libc.so.6 17 Thread 0x7f8b7b00c700 (LWP 31621) 0x00007f8ba846a054 in __lll_lock_wait () from /lib64/libpthread.so.0 16 Thread 0x7f8b7a80b700 (LWP 31622) 0x00007f8ba846a054 in __lll_lock_wait () from /lib64/libpthread.so.0 15 Thread 0x7f8b7a00a700 (LWP 31623) 0x00007f8ba846a054 in __lll_lock_wait () from /lib64/libpthread.so.0 14 Thread 0x7f8b79809700 (LWP 31625) 0x00007f8ba846a54d in read () from /lib64/libpthread.so.0 13 Thread 0x7f8b79008700 (LWP 31626) 0x00007f8ba846a054 in __lll_lock_wait () from /lib64/libpthread.so.0 12 Thread 0x7f8b59ffd700 (LWP 25459) 0x00007f8ba846a054 in __lll_lock_wait () from /lib64/libpthread.so.0 11 Thread 0x7f8b597fc700 (LWP 10195) 0x00007f8ba846a054 in __lll_lock_wait () from /lib64/libpthread.so.0 10 Thread 0x7f8b70b37700 (LWP 28996) 0x00007f8ba846a054 in __lll_lock_wait () from /lib64/libpthread.so.0 9 Thread 0x7f8b58ffb700 (LWP 8763) 0x00007f8ba846a054 in __lll_lock_wait () from /lib64/libpthread.so.0 8 Thread 0x7f8b4bfff700 (LWP 28614) 0x00007f8ba846a054 in __lll_lock_wait () from /lib64/libpthread.so.0 7 Thread 0x7f8b4b7fe700 (LWP 23367) 0x00007f8ba846a054 in __lll_lock_wait () from /lib64/libpthread.so.0 6 Thread 0x7f8b4affd700 (LWP 13359) 0x00007f8ba846a054 in __lll_lock_wait () from /lib64/libpthread.so.0 5 Thread 0x7f8b4a7fc700 (LWP 29064) 0x00007f8ba846a054 in __lll_lock_wait () from /lib64/libpthread.so.0 4 Thread 0x7f8b49ffb700 (LWP 2184) 0x00007f8ba846a054 in __lll_lock_wait () from /lib64/libpthread.so.0 3 Thread 0x7f8b497fa700 (LWP 31175) 0x00007f8ba846a054 in __lll_lock_wait () from /lib64/libpthread.so.0 2 Thread 0x7f8b48ff9700 (LWP 2316) 0x00007f8ba846a054 in __lll_lock_wait () from /lib64/libpthread.so.0 * 1 Thread 0x7f8baa8db700 (LWP 31615) 0x00007f8ba84640ad in pthread_join () from /lib64/libpthread.so.0 (gdb) bt #0 0x00007f8ba84640ad in pthread_join () from /lib64/libpthread.so.0 #1 0x00007f8baa93c969 in slapd_daemon () at ../../%./servers/slapd/daemon.c:2929 #2 0x00007f8baa927733 in main (argc=6, argv=<value optimized out>) at ../../../servers/slapd/main.c:1012 (gdb) thread 2 [Switching to thread 2 (Thread 0x7f8b48ff9700 (LWP 2316))]#0 0x00007f8ba846a054 in __lll_lock_wait (929 from /lib64/libpthread.so.0 (gdb) bt #0 0x00007f8ba846a054 in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007f8ba8465388 in _L_lock_854 () from /lib64/libpthread.so.0 #2 0x00007f8ba8465257 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00007f8baa4b96a0 in tlsm_session_accept_or_connect (session=0x7f8b2c104110, is_accept=<value optimized out>) at tls_m.c:2656 #4 0x00007f8baa4b8787 in ldap_pvt_tls_accept (sb=0x7f8b2c104150, ctx_arg=0x7f8bac6f8230) at tls2.c:425 #5 0x00007f8baa943f23 in connection_read (ctx=0x7f8b48ff8b70, argv=0x2f) at ../../../servers/slapd/connection.c:1372 #6 connection_read_thread (ctx=0x7f8b48ff8b70, argv=0x2f) at ../../../servers/slapd/connection.c:1284 #7 0x00007f8baa490a98 in ldap_int_thread_pool_wrapper (xpool=0x7f8bac5c8fb0) at ../../../libraries/libldap_r/tpool.c:688 #8 0x00007f8ba8463851 in start_thread () from /lib64/libpthread.so.0 #9 0x00007f8ba7fa690d in clone () from /lib64/libc.so.6 (gdb) thread 14 [Switching to thread 14 (Thread 0x7f8b79809700 (LWP 31625))]#0 0x00007f8ba846a54d in read () from /lib64/libpthread.so.0 (gdb) bt #0 0x00007f8ba846a54d in read () from /lib64/libpthread.so.0 #1 0x00007f8baa278b1c in sb_debug_read (sbiod=0x7f8b681def80, buf=0x7f8b6819c830, len=5) at ../../../libraries/liblber/sockbuf.c:829 #2 0x00007f8baa4b9984 in tlsm_PR_Recv (fd=<value optimized out>, buf=0x7f8b6819c830, len=5, flags=<value optimized out>, timeout=<value optimized out>) at tls_m.c:3007 #3 0x00007f8ba946c1ed in ?? () from /usr/lib64/libssl3.so #4 0000007f8ba9467480 in ?? () from /usr/lib64/libssl3.so #5 0x00007f8ba9469ed2 in ?? () from /usr/lib64/libssl3.so #6 0x00007f8ba9470135 in ?? () from /usr/lib64/libssl3.so #7 0x00007f8ba947196f in SSL_ForceHandshake () from /usr/lib64/libssl3o%o #8 0x00007f8baa4b96a8 in tlsm_session_accept_or_connect (session=0x7f8b681b0f20, is_accept=<value optimized out>) at tls_m.c:2658 #9 0x00007f8baa4b82e2 in ldap_int_tls_connect (ld=0x7f8b681d9250, conn=<value optimized out>) at tls2.c:362 #10 0x00007f8baa4b857inin ldap_int_tls_start (ld=0x7f8b681d9250, conn=0x7f8b681b0ba0, srv=<value optimized out>) at tls2.c:860 #11 0x00007f8baa4b86ce in ldap_start_tls_s (ld=0x7f8b681d9250, serverctrls=0x0, clientctrls=0x0) at tls2.c:1040 #12 0x00007f8baa937ff6 in slap_client_connect (ldp=0x7f8bac6d35a8, sb=0x7f8bac6d3380) at ../../../servers/slapd/config.c:2012 #13 0x00007f8baa9ae3f5 in do_syncrep1 (ctx=<value optimized out>, arg=0x7f8bac6cc430) at ../../../servers/slapd/syncrepl.c:613 #14 do_syncrepl (ctx=<value optimized out>, arg=0x7f8bac6cc430) at ../../../servers/slapd/syncrepl.c:1527 #15 0x00007f8baa490a98 in ldap_int_thread_pool_wrapper (xpool=0x7f8bac5c8fb0) at ../../../libraries/libldap_r/tpool.c:688 #16 0x00007f8ba8463851 in start_thread () from /lib64/libpthread.so.0 #17 0x00007f8ba7fa690d in clone () from /lib64/libc.so.6 (gdb) thread 18 [Switching to thread 18 (Thread 0x7f8b7b80d700 (LWP 31620))]#0 0x00007f8ba7fa6f03 in epoll_wait () from /lib64/libc.so.6 (gdb) bt #0 0x00007f8ba7fa6f03 in epoll_wait () from /lib64/libc.so.6 #1 0x00007f8baa93d9e2 in slapd_daemon_task (ptr=<value optimized out>) at ../../../servers/slapd/daemon.c:2536 #2 0x00007f8ba8463851 in start_thread () from /lib64/libpthread.so.0 #3 0x00007f8ba7fa690d in clone () from /lib64/libc.so.6
GDB on host2:
(gdb) info threads 7 Thread 0x7f5c942d5700 (LWP 3869) 0x00007f5cc0797f03 in epoll_wait () from /lib64/libc.so.6 6 Thread 0x7f5c93ad4700 (LWP 3870) 0x00007f5cc0c5b54d in read () from /lib64/libpthread.so.0 5 Thread 0x7f5c932d3700 (LWP 3871) 0x00007f5cc0c5b054 in __lll_lock_wait () from /lib64/libpthread.so.0 4 Thread 0x7f5c92ad2700 (LWP 3872) 0x00007f5cc0c5843c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 3 Thread 0x7f5c922d1700 (LWP 3873) 0x00007f5cc0c5843c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 2 Thread 0x7f5c91ad0700 (LWP 3874) 0x00007f5cc0c5843c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 * 1 Thread 0x7f5cc30cc700 (LWP 3868) 0x00007f5cc0c550ad in pthread_join () from /lib64/libpthread.so.0 (gdb) bt #0 0x00007f5cc0c550ad in pthread_join () from /lib64/libpthread.so.0 #1 0x00007f5cc312d969 in slapd_daemon () at ../../../servers/slapd/daemon.c:2929 #2 0x00007f5cc3118733 in main (argc=6, argv=<value optimized out>) at ../../../servers/slapd/main.c:1012 (gdb) thread 2 [Switching to thread 2 (Thread 0x7f5c91ad0700 (LWP 3874))]#0 0x00007f5cc0c5843c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 (gdb) bt #0 0x00007f5cc0c5843c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f5cc2c81af5 in ldap_int_thread_pool_wrapper (xpool=0x7f5cc3d18030) at ../../../libraries/libldap_r/tpool.c:675 #2 0x00007f5cc0c54851 in start_thread () from /lib64/libpthread.so.0 #3 0x00007f5cc079790d in clone () from /lib64/libc.so.6 (gdb) thread 5 [Switching to thread 5 (Thread 0x7f5c932d3700 (LWP 3871))]#0 0x00007f5cc0c5b054 in __lll_lock_wait () from /lib64/libpthread.so.0 (gdb) bt #0 0x00007f5cc0c5b054 in __lll_lock_wait () from /lib64/libpthread.so.0 #1 0x00007f5cc0c56388 in _L_lock_854 () from /lib64/libpthread.so.0 #2 0x00007f5cc0c56257 in pthread_mutex_lock () from /lib64/libpthread.so.0 #3 0x00007f5cc2caa6a0 in tlsm_session_accept_or_connect (session=0x7f5c841165c0, is_accept=<value optimized out>) at tls_m.c:2656 #4 0x00007f5cc2ca9787 in ldap_pvt_tls_accept (sb=0x7f5c800008c0, ctx_arg=0x7f5cc3e3b240) at tls2.c:425 #5 0x00007f5cc3134f23 in connection_read (ctx=0x7f5c932d2b70, argv=0x10) at ../../../servers/slapd/connection.c:1372 #6 connection_read_thread (ctx=0x7f5c932d2b70, argv=0x10) at ../../../servers/slapd/connection.c:1284 #7 0x00007f5cc2c81a98 in ldap_int_thread_pool_wrapper (xpool=0x7f5cc3d18030) at ../../../libraries/libldap_r%tptpool.c:688 #8 0x00007f5cc0c54851 in start_thread () from /lib64/libpthread.so.0 #9 0x00007f5cc079790d in clone () from /lib64/libc.so.6 (gdb) thread 6 [Switching to thread 6 (Thread 0x7f5c93ad4700 (LWP 3870))]#0 0x00007f5cc0c5b54d in read () from /lib64/libpthread.so.0 (gdb) bt #0 0x00007f5cc0c5b54d in read () from /lib64/libpthread.so.0 #1 0x00007f5cc2a69b1c in sb_debug_read (sbiod=0x7f5c7c185e10, buf=0x7f5c7c18a7f0, len=5) at ../../../libraries/liblber/sockbuf.c:829 #2 0x00007f5cc2caa984 in tlsm_PR_Recv (fd=<value optimized out>, buf=0x7f5c7c18a7f0, len=5, flags=<value optimized out>, timeout=<value optimized out>) at tls_m.c:3007 #3 0x00007f5cc1c5d1ed in ?? () from /usr/lib64/libssl3.so #4 0x00007f5cc1c58480 in ?? () from /usr/lib64/libssl3.so #5 0x00007f5cc1c5aed2 in ?? () from /usr/lib64/libssl3.so #6 0x00007f5cc1c61135 in ?? () from /usr/lib64/libssl3.so #7 0x00007f5cc1c6296f in SSL_ForceHandshake () from /usr/lib64/libssl3.so #8 0x00007f5cc2caa6a8 in tlsm_session_accept_or_connect (session=0x7f5c7c182100, is_accept=<value optimized out>) at tls_m.c:2658 #9 0x00007f5cc2ca92e2 in ldap_int_tls_connect (ld=0x7f5c7c100910, conn=<value optimized out>) at tls2.c:362 #10 0x00007f5cc2ca9574 in ldap_int_tls_start (ld=0x7f5c7c100910, conn=0x7f5c7c10a020, srv=<value optimized out>) at tls2.c:860 #11 0x00007f5cc2ca96ce in ldap_start_tls_s (ld=0x7f5c7c100910, serverctrls=0x0, clientctrls=0x0) at tls2.c:1040 #12 0x00007f5cc3128ff6 in slap_client_connect (ldp=0x7f5cc3e22558, sb=0x7f5cc3e22330) at ../../../servers/slapd/config.c:2012 #13 0x00007f5cc319f3f5 in do_syncrep1 (ctx=<value optimized out>, arg=0x7f5cc3e1f410) at ../../../servers/slapd/syncrepl.c:613 #14 do_syncrepl (ctx=<value optimized out>, arg=0x7f5cc3e1f410) at ../../../servers/slapd/syncrepl.c:1527 #15 0x00007f5cc2c81a98 in ldap_int_thread_pool_wrapper (xpool=0x7f5cc3d18030) at ../../../libraries/libldap_r/tpool.c:688 #16 0x00007f5cc0c54851 in start_thread () from /lib64/libpthread.so.0 #17 0x00007f5cc079790d in clone () from /lib64/libc.so.6 (gdb) thread 7 [Switching to thread 7 (Thread 0x7f5c942d5700 (LWP 3869))]#0 0x00007f5cc0797f03 in epoll_wait () from /lib64/libc.so.6 (gdb) bt #0 0x00007f5cc0797f03 in epoll_wait () from /lib64/libc.so.6 #1 0x00007f5cc312e9e2 in slapd_daemon_task (ptr=<value optimized out>) at ../../../servers/slapd/daemon.c:2536 #2 0x00007f5cc0c54851 in start_thread () from /lib64/libpthread.so.0 #3 0x00007f5cc079790d in clone () from /lib64/libc.so.6
Relevant configuration information (from host2):
syncrepl rid=001 provider=ldap://host1/ type=refreshAndPersist retry="10 +" searchbase="dc=example,dc=com" starttls=critical tls_cacert=/etc/openldap/ssl-certs/cacert.pem tls_cert=/etc/openldap/ssl-certs/host1.crt tls_key=/etc/openldap/ssl-certs/host1.key tls_reqcert=demand bindmethod=sasl saslmech=EXTERNAL sizelimit=unlimited keepalive="10:2:10" logfilter="(&(objectClass=auditWriteObject)(reqResult=0))" logbase="cn=accesslog" syncdata=accesslog
Netstat of host1:
[root@host1 ~]# netstat -anp | grep slapd tcp 0 0 0.0.0.0:389 0.0.0.0:* LISTEN 31615/slapd tcp 56 0 192.168.1.1:389 192.168.1.2:57276 ESTABLISHED 31615/slapd tcp 0 0 192.168.1.1:45569 192.168.1.2:389 ESTABLISHED 31615/slapd unix 2 [ ] STREAM CONNECTED 45346152 31615/slapd unix 2 [ ] DGRAM 45346144 31615/slapd
Netstat of host2:
[root@host2 ~]# netstat -anp | grep slapd tcp 0 0 0.0.0.0:389 0.0.0.0:* LISTEN 3868/slapd tcp 56 0 192.168.1.2:389 192.168.1.1:45569 ESTABLISHED 3868/slapd tcp 0 0 192.168.1.2:57276 192.168.1.1:389 ESTABLISHED 3868/slapd unix 2 [ ] STREAM CONNECTED 19144 3868/slapd unix [ ] DGRAM 19136 3868/slapd