Hello,
I’m working on this bug:
http://www.openldap.org/lists/openldap-bugs/201206/msg00026.html
If slapd client connections are torn down in mid-query -- the server has received the query and has a pending reply to send, but the connection is closed by the client before it can be sent -- this deadlocks slapd worker threads. Eventually all threads are deadlocked in send_ldap_ber() which serializes their network access to send PDUs, and the server becomes unresponsive and has to be killed.
send_ldap_ber() notices the connection drop and calls connection_closing(). The problem appears to be that connection_abandon() then abandons all outstanding executing ops but does not empty the c_ops queue (as it does with c_pending_ops). When connection_close() looks at the connection, it always sees outstanding ops and defers the close. I see this pattern:
50cb3104 connection_closing: readying conn=1519 sd=33 for close
50cb3104 connection_close: deferring conn=1519 sd=33
50cb3104 connection_resched: attempting closing conn=1519 sd=33
50cb3104 connection_close: deferring conn=1519 sd=33
50cb3104 connection_resched: attempting closing conn=1519 sd=33
... which repeats until the server freezes entirely.
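The defer loop above can be modeled in a few lines (a toy sketch; toy_conn, toy_abandon, and toy_close are my names, not slapd's): as long as the executing queue is never emptied, every resched attempt defers the close again.

```c
#include <assert.h>

/* Toy model of the close/defer loop described above; all names here
 * are mine, not slapd's. The close refuses to destroy a connection
 * while any op is still executing, so if abandon never empties that
 * queue, the close is deferred forever. */
enum close_rc { CLOSED, DEFERRED };

struct toy_conn {
    int n_executing;   /* models the c_ops queue length */
    int n_pending;     /* models c_pending_ops          */
    int closing;
};

/* Abandon as described in the report: pending ops are cleared,
 * executing ops are not. */
void toy_abandon(struct toy_conn *c)
{
    c->closing = 1;
    c->n_pending = 0;
    /* c->n_executing is deliberately left alone */
}

enum close_rc toy_close(const struct toy_conn *c)
{
    if (c->n_executing > 0 || c->n_pending > 0)
        return DEFERRED;   /* "connection_close: deferring ..." */
    return CLOSED;
}
```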
If I add code to connection_abandon() to empty c_ops, it causes slapd to crash later with a mutex usage error, so that's apparently not the right place/way to do it. If I note that the connection is dying and have connection_destroy() skip the assertion that c_ops must be empty, it fixes the bug: the deadlock no longer occurs. However, I'm concerned this will leak memory as the ops aren't being freed. So my question is: what's the right way to get the outstanding executing ops abandoned by connection_abandon() to be freed?
The code is complex and I may have misunderstood how best to go about fixing this, but hopefully this is enough to make sense.
Thanks,
I was mistaken in my last message; my change did not actually fix the problem. My harness for reproducing the bug (taken from the original bug report) throws lots of cut connections at slapd and waits for the bug to manifest; it just so happened that it took a much longer time than usual for it to occur when I was testing that change.
Returning to the basic problem: when a connection is cut in mid-query as I described, this code in send_ldap_ber() notices the problem:
	/*
	 * we got an error.  if it's ewouldblock, we need to
	 * wait on the socket being writable.  otherwise, figure
	 * it's a hard error and return.
	 */
	Debug( LDAP_DEBUG_CONNS, "ber_flush2 failed errno=%d reason=\"%s\"\n",
	    err, sock_errstr(err), 0 );

	if ( err != EWOULDBLOCK && err != EAGAIN ) {
		conn->c_writers--;
		conn->c_writing = 0;
		ldap_pvt_thread_mutex_unlock( &conn->c_write1_mutex );
		connection_closing( conn, "connection lost on write" );

		ldap_pvt_thread_mutex_unlock( &conn->c_mutex );
		return -1;
	}
However, this is somehow ineffective, because shortly thereafter all the worker threads block with the following stack:
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x0000000000568f9a in ldap_int_thread_cond_wait (cond=0x7fe190fb9030, mutex=0x7fe190fb8fe0) at thr_posix.c:277
#2  0x000000000056aeee in ldap_pvt_thread_cond_wait (cond=0x7fe190fb9030, mutex=0x7fe190fb8fe0) at thr_debug.c:954
#3  0x000000000043e1b7 in send_ldap_ber (op=0x7fe160129fc0, ber=0x7fe16ae7b460) at result.c:303
#4  0x0000000000441338 in slap_send_search_entry (op=0x7fe160129fc0, rs=0x7fe16affc920) at result.c:1429
#5  0x00000000004c5953 in bdb_search (op=0x7fe160129fc0, rs=0x7fe16affc920) at search.c:1014
#6  0x000000000042daae in fe_op_search (op=0x7fe160129fc0, rs=0x7fe16affc920) at search.c:402
#7  0x000000000042d2f2 in do_search (op=0x7fe160129fc0, rs=0x7fe16affc920) at search.c:247
#8  0x000000000042a4b7 in connection_operation (ctx=0x7fe16affca00, arg_v=0x7fe160129fc0) at connection.c:1150
#9  0x00000000005679f3 in ldap_int_thread_pool_wrapper (xpool=0x1008e90) at tpool.c:688
#10 0x000000000056a493 in ldap_debug_thread_wrapper (arg=0x7fe1740015c0) at thr_debug.c:770
#11 0x00000039e3407851 in start_thread (arg=0x7fe16affd700) at pthread_create.c:301
#12 0x00000039e2ce811d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(This is with thread debugging enabled.) They are blocked here, again in send_ldap_ber():
	while ( conn->c_writers > 0 && conn->c_writing ) {
		ldap_pvt_thread_cond_wait( &conn->c_write1_cv, &conn->c_write1_mutex );
	}
The problem is that they are *all* here; there is no one left to signal that condition variable, so slapd locks up forever.
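The lost wakeup can be reproduced in miniature (a hedged sketch with invented names: conn_model, writer_fail, and writer_wait are mine, not slapd's). The error path has to both clear the writing flag and broadcast, and waiters have to re-check a closing flag, so that no thread ever sleeps on a connection nobody will signal again.

```c
#include <pthread.h>

/* Minimal model of the send_ldap_ber() wait; all names are invented.
 * Writers sleep on writing_cv while another writer owns the socket.
 * If the owner errors out without clearing the flag and broadcasting,
 * every later writer blocks forever -- the deadlock described above. */
struct conn_model {
    pthread_mutex_t mu;
    pthread_cond_t  writing_cv;
    int writers;   /* threads inside the send path               */
    int writing;   /* 1 while one thread owns the socket         */
    int closing;   /* set when the connection is being torn down */
};

/* Error path done right: clear state and wake everyone up. */
void writer_fail(struct conn_model *c)
{
    pthread_mutex_lock(&c->mu);
    c->writers--;
    c->writing = 0;
    c->closing = 1;
    /* broadcast, not signal: every blocked writer must re-check */
    pthread_cond_broadcast(&c->writing_cv);
    pthread_mutex_unlock(&c->mu);
}

/* A waiting writer: bails out instead of sleeping once closing is set.
 * Returns 0 if the connection is gone and the caller must give up. */
int writer_wait(struct conn_model *c)
{
    pthread_mutex_lock(&c->mu);
    while (c->writers > 0 && c->writing && !c->closing)
        pthread_cond_wait(&c->writing_cv, &c->mu);
    int ok = !c->closing;
    pthread_mutex_unlock(&c->mu);
    return ok;
}
```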
I’ve tried various modifications, but the code is quite complex, sparsely commented, and there’s no overall design document explaining how the various mutexes and condition variables are supposed to be used -- subtle concurrency bugs like this are extremely difficult to debug dynamically or reverse engineer, so I’m really shooting in the dark. Any help would be greatly appreciated.
Btw, this is not a theoretical problem: it caused us serious production issues. We were accessing slapd through an F5 load balancer: it would periodically max out its network link and drop some of the LDAP connections going through it, and this triggers the bug.
Thanks,
Richard Silverman wrote:
I was mistaken in my last message; my change did not actually fix the problem. My harness for reproducing the bug (taken from the original bug report) throws lots of cut connections at slapd and waits for the bug to manifest; it just so happened that it took a much longer time than usual for it to occur when I was testing that change.
It would probably be helpful for you to post your test code as well so other developers can use it to reproduce the issue.
Returning to the basic problem: when a connection is cut in mid-query as I described, this code in send_ldap_ber() notices the problem:
	/*
	 * we got an error.  if it's ewouldblock, we need to
	 * wait on the socket being writable.  otherwise, figure
	 * it's a hard error and return.
	 */
	Debug( LDAP_DEBUG_CONNS, "ber_flush2 failed errno=%d reason=\"%s\"\n",
	    err, sock_errstr(err), 0 );

	if ( err != EWOULDBLOCK && err != EAGAIN ) {
		conn->c_writers--;
		conn->c_writing = 0;
		ldap_pvt_thread_mutex_unlock( &conn->c_write1_mutex );
		connection_closing( conn, "connection lost on write" );

		ldap_pvt_thread_mutex_unlock( &conn->c_mutex );
		return -1;
	}
However, this is somehow ineffective, because shortly thereafter all the worker threads block with the following stack:
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x0000000000568f9a in ldap_int_thread_cond_wait (cond=0x7fe190fb9030, mutex=0x7fe190fb8fe0) at thr_posix.c:277
#2  0x000000000056aeee in ldap_pvt_thread_cond_wait (cond=0x7fe190fb9030, mutex=0x7fe190fb8fe0) at thr_debug.c:954
#3  0x000000000043e1b7 in send_ldap_ber (op=0x7fe160129fc0, ber=0x7fe16ae7b460) at result.c:303
#4  0x0000000000441338 in slap_send_search_entry (op=0x7fe160129fc0, rs=0x7fe16affc920) at result.c:1429
#5  0x00000000004c5953 in bdb_search (op=0x7fe160129fc0, rs=0x7fe16affc920) at search.c:1014
#6  0x000000000042daae in fe_op_search (op=0x7fe160129fc0, rs=0x7fe16affc920) at search.c:402
#7  0x000000000042d2f2 in do_search (op=0x7fe160129fc0, rs=0x7fe16affc920) at search.c:247
#8  0x000000000042a4b7 in connection_operation (ctx=0x7fe16affca00, arg_v=0x7fe160129fc0) at connection.c:1150
#9  0x00000000005679f3 in ldap_int_thread_pool_wrapper (xpool=0x1008e90) at tpool.c:688
#10 0x000000000056a493 in ldap_debug_thread_wrapper (arg=0x7fe1740015c0) at thr_debug.c:770
#11 0x00000039e3407851 in start_thread (arg=0x7fe16affd700) at pthread_create.c:301
#12 0x00000039e2ce811d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(This is with thread debugging enabled.) They are blocked here, again in send_ldap_ber():
	while ( conn->c_writers > 0 && conn->c_writing ) {
		ldap_pvt_thread_cond_wait( &conn->c_write1_cv, &conn->c_write1_mutex );
	}
The problem is that they are *all* here; there is no one left to signal that condition variable, so slapd locks up forever.
The previously quoted code clearly sets conn->c_writing to 0 if a hard error occurs. What are the values in *conn when this hang happens? Which thread owns the write1 mutex at that point?
I’ve tried various modifications, but the code is quite complex, sparsely commented, and there’s no overall design document explaining how the various mutexes and condition variables are supposed to be used -- subtle concurrency bugs like this are extremely difficult to debug dynamically or reverse engineer, so I’m really shooting in the dark. Any help would be greatly appreciated.
Yes, the code here is in need of a rewrite. In particular, blocked writer threads should just perform their own epoll/select instead of delegating this back to the main listener thread(s). That would eliminate at least one layer of complexity.
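One way to read that suggestion (a sketch under my own assumptions, not slapd code; wait_writable is an invented name): each writer thread polls its own fd for writability, so a dropped peer surfaces as POLLERR/POLLHUP directly in the thread that has to react to it, with no condition-variable handoff to the listener.

```c
#include <poll.h>
#include <errno.h>

/* Sketch, not slapd code: a blocked writer waits on its own socket
 * instead of parking on a CV and relying on the listener thread to
 * wake it. Returns 0 when the fd is writable again, -1 on timeout
 * or on a dead connection. */
int wait_writable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);   /* retry if interrupted */

    if (rc <= 0)
        return -1;                        /* error or timeout */
    if (pfd.revents & (POLLERR | POLLHUP))
        return -1;                        /* peer is gone */
    return 0;                             /* safe to retry the write */
}
```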
As for the rest - "reverse engineering" would be something like taking a compiled binary and trying to decompile it. Reading source code is simply that, it's not reverse engineering. The variables are documented with their purpose and usage, the comments show what a block of code aims to do. What more do you expect?
Btw, this is not a theoretical problem: it caused us serious production issues. We were accessing slapd through an F5 load balancer: it would periodically max out its network link and drop some of the LDAP connections going through it, and this triggers the bug.
Richard Silverman wrote:
However, this is somehow ineffective, because shortly thereafter all the worker threads block with the following stack:
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
#1  0x0000000000568f9a in ldap_int_thread_cond_wait (cond=0x7fe190fb9030, mutex=0x7fe190fb8fe0) at thr_posix.c:277
#2  0x000000000056aeee in ldap_pvt_thread_cond_wait (cond=0x7fe190fb9030, mutex=0x7fe190fb8fe0) at thr_debug.c:954
#3  0x000000000043e1b7 in send_ldap_ber (op=0x7fe160129fc0, ber=0x7fe16ae7b460) at result.c:303
#4  0x0000000000441338 in slap_send_search_entry (op=0x7fe160129fc0, rs=0x7fe16affc920) at result.c:1429
#5  0x00000000004c5953 in bdb_search (op=0x7fe160129fc0, rs=0x7fe16affc920) at search.c:1014
#6  0x000000000042daae in fe_op_search (op=0x7fe160129fc0, rs=0x7fe16affc920) at search.c:402
#7  0x000000000042d2f2 in do_search (op=0x7fe160129fc0, rs=0x7fe16affc920) at search.c:247
#8  0x000000000042a4b7 in connection_operation (ctx=0x7fe16affca00, arg_v=0x7fe160129fc0) at connection.c:1150
#9  0x00000000005679f3 in ldap_int_thread_pool_wrapper (xpool=0x1008e90) at tpool.c:688
#10 0x000000000056a493 in ldap_debug_thread_wrapper (arg=0x7fe1740015c0) at thr_debug.c:770
#11 0x00000039e3407851 in start_thread (arg=0x7fe16affd700) at pthread_create.c:301
#12 0x00000039e2ce811d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
(This is with thread debugging enabled.) They are blocked here, again in send_ldap_ber():
	while ( conn->c_writers > 0 && conn->c_writing ) {
		ldap_pvt_thread_cond_wait( &conn->c_write1_cv, &conn->c_write1_mutex );
	}
The problem is that they are *all* here; there is no one left to signal that condition variable, so slapd locks up forever.
Please post the entire stack trace for all threads. Since we have c_writers = 8, c_writing = 1, and c_writewaiter = 1, one of the threads ought to be blocked on the write2_cv (result.c:373). That implies that they're still waiting for epoll() to say that the socket is writable.
Richard Silverman wrote:
Hello,
I’m working on this bug:
http://www.openldap.org/lists/openldap-bugs/201206/msg00026.html
If slapd client connections are torn down in mid-query -- the server has received the query and has a pending reply to send, but the connection is closed by the client before it can be sent -- this deadlocks slapd worker threads. Eventually all threads are deadlocked in send_ldap_ber() which serializes their network access to send PDUs, and the server becomes unresponsive and has to be killed.
send_ldap_ber() notices the connection drop and calls connection_closing(). The problem appears to be that then connection_abandon() abandons all outstanding executing ops, but does not empty the c_ops queue (as it does with c_pending_ops). When connection_close() looks at the connection, it always sees there are outstanding ops and defers the close. I see this pattern:
50cb3104 connection_closing: readying conn=1519 sd=33 for close
50cb3104 connection_close: deferring conn=1519 sd=33
50cb3104 connection_resched: attempting closing conn=1519 sd=33
50cb3104 connection_close: deferring conn=1519 sd=33
50cb3104 connection_resched: attempting closing conn=1519 sd=33
... which repeats until the server freezes entirely.
If I add code to connection_abandon() to empty c_ops, it causes slapd to crash later with a mutex usage error, so that's apparently not the right place/way to do it. If I note that the connection is dying and have connection_destroy() skip the assertion that c_ops must be empty, it fixes the bug: the deadlock no longer occurs. However, I'm concerned this will leak memory as the ops aren't being freed. So my question is: what's the right way to get the outstanding executing ops abandoned by connection_abandon() to be freed?
Looks to me like the correct solution is to add another check for op->o_abandon in send_ldap_ber() before waiting again on the write1_cv. Decrement c_writers and bail out if so.
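That suggestion might look roughly like this runnable model (mconn, mop, and model_send_wait are my names; the real patch would go in send_ldap_ber() against conn and op): re-check the abandon flag each time around the wait loop, and if it is set, drop out of the writer count and fail the send instead of sleeping again.

```c
#include <pthread.h>

/* Runnable model of the suggested fix; names are mine, not slapd's.
 * Before each wait on the writers CV, check whether this op has been
 * abandoned; if so, leave the writer set and fail the send rather
 * than sleeping on a connection that will never drain. */
struct mconn {
    pthread_mutex_t write_mu;
    pthread_cond_t  write_cv;
    int writers;
    int writing;
};

struct mop {
    int abandoned;   /* models op->o_abandon */
};

int model_send_wait(struct mconn *c, struct mop *op)
{
    pthread_mutex_lock(&c->write_mu);
    c->writers++;
    while (c->writers > 1 && c->writing) {
        if (op->abandoned) {
            c->writers--;                       /* decrement and bail */
            pthread_mutex_unlock(&c->write_mu);
            return -1;                          /* caller gives up */
        }
        pthread_cond_wait(&c->write_cv, &c->write_mu);
    }
    c->writing = 1;    /* we own the socket now */
    pthread_mutex_unlock(&c->write_mu);
    return 0;
}
```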