Hello,
I’m working on this bug:
http://www.openldap.org/lists/openldap-bugs/201206/msg00026.html
If a client tears down its connection in mid-query (the server has received the query and has a reply pending, but the client closes the connection before the reply can be sent), slapd worker threads deadlock. Eventually all threads are deadlocked in send_ldap_ber(), which serializes their network access to send PDUs, and the server becomes unresponsive and has to be killed.
send_ldap_ber() notices the connection drop and calls connection_closing(). The problem appears to be that connection_abandon() then abandons all outstanding executing ops but does not empty the c_ops queue (as it does with c_pending_ops). As a result, whenever connection_close() looks at the connection, it sees outstanding ops and defers the close. I see this pattern:
50cb3104 connection_closing: readying conn=1519 sd=33 for close
50cb3104 connection_close: deferring conn=1519 sd=33
50cb3104 connection_resched: attempting closing conn=1519 sd=33
50cb3104 connection_close: deferring conn=1519 sd=33
50cb3104 connection_resched: attempting closing conn=1519 sd=33
... which repeats until the server freezes entirely.
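To make the suspected pattern concrete, here is a toy model in C. It is not slapd code: the toy_op and toy_conn types and the toy_* functions are hypothetical stand-ins, and the model assumes that an executing op only leaves the queue when its worker finishes. It only illustrates why a close that waits for an empty executing-ops queue can be deferred indefinitely when abandoning marks ops without removing them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not slapd's own types. */
typedef struct toy_op {
    bool abandoned;        /* set when the op is told to stop */
    struct toy_op *next;
} toy_op;

typedef struct toy_conn {
    toy_op *ops;           /* executing ops (plays the role of c_ops) */
    toy_op *pending_ops;   /* queued ops (plays the role of c_pending_ops) */
} toy_conn;

/* Flag executing ops as abandoned but leave them queued; drop pending ops. */
static void toy_abandon(toy_conn *c)
{
    assert(c != NULL);
    for (toy_op *o = c->ops; o != NULL; o = o->next)
        o->abandoned = true;
    c->pending_ops = NULL; /* toy ops are caller-owned, so nothing to free */
}

/* Return true if the connection could be closed, false if the close is deferred. */
static bool toy_close(const toy_conn *c)
{
    assert(c != NULL);
    return c->ops == NULL && c->pending_ops == NULL;
}

/* Assumption: the only way an executing op leaves the queue is its worker finishing. */
static void toy_op_done(toy_conn *c, toy_op *op)
{
    assert(c != NULL && op != NULL);
    for (toy_op **p = &c->ops; *p != NULL; p = &(*p)->next) {
        if (*p == op) {
            *p = op->next;
            op->next = NULL;
            return;
        }
    }
}
```

With one op on the executing queue, toy_abandon() followed by toy_close() reports a deferral on every retry until toy_op_done() runs for that op. If the worker that would make that call is itself blocked, the call never comes and the close is deferred forever.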
If I add code to connection_abandon() to empty c_ops, it causes slapd to crash later with a mutex usage error, so that's apparently not the right place/way to do it. If I note that the connection is dying and have connection_destroy() skip the assertion that c_ops must be empty, it fixes the bug: the deadlock no longer occurs. However, I'm concerned this will leak memory as the ops aren't being freed. So my question is: what's the right way to get the outstanding executing ops abandoned by connection_abandon() to be freed?
The code is complex and I may have misunderstood how best to go about fixing this, but hopefully this is enough to make sense.
Thanks,