On Wed, May 06, 2009 at 02:18:30AM -0700, Howard Chu wrote:
jwm@horde.net wrote:
Poked around a bit in the core:
(gdb) print c->c_writers
$1 = -1
(gdb) print c->c_pending_ops->stqh_last[0]
$5 = (struct Operation *) 0x0
(gdb) print c->c_n_ops_pending
$6 = 0
So there are no pending ops on this connection, but c_writers == -1 indicates that one blocked writer remains to notice that the connection has been closed.
I looked at send_ldap_ber() and can't immediately find fault with its manipulation of c_writers. connection_closing() wakes up all blocked writers, so by the time connection_close() calls connection_destroy() and this assertion is checked, c_writers should be 0. Furthermore, c->c_conn_state is SLAP_C_CLOSING (0x4), which can only happen in connection_closing().
This is about as far as I can get; any other ideas?
I'm stumped. The only way out of connection_closing() is for c_writers to go to zero. And from your core file, there are no other threads blocked on any of the write mutexes or condition variables. You can reproduce this easily?
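For readers following along, the handshake described above can be sketched in isolation. This is a minimal model with plain pthreads, not slapd's code: `demo_conn`, `demo_writer`, `demo_closing` and `demo_run` are hypothetical names, and the single mutex and condition variable are a simplifying assumption. It only illustrates the convention discussed in this thread: the closer flips the writer count negative, wakes every blocked writer, and does not return until the count is back at zero.

```c
#include <pthread.h>
#include <stdlib.h>

#define DEMO_MAX_WRITERS 8

/* Hypothetical stand-in for the connection fields discussed in this thread. */
struct demo_conn {
    pthread_mutex_t mu;
    pthread_cond_t  cv;
    int writers;  /* > 0: blocked writers; < 0: closing, |writers| still to leave */
};

/* A writer that blocks until the connection is closed. The socket never
 * becomes writable in this sketch, so the close is the only wake-up. */
static void *demo_writer(void *arg)
{
    struct demo_conn *c = arg;

    pthread_mutex_lock(&c->mu);
    while (c->writers > 0)
        pthread_cond_wait(&c->cv, &c->mu);
    /* Count is negative here: notice the close and check out. */
    c->writers++;
    pthread_cond_broadcast(&c->cv);  /* let the closer re-check the count */
    pthread_mutex_unlock(&c->mu);
    return NULL;
}

/* Flip the count negative, wake all writers, wait for them to drain. */
static void demo_closing(struct demo_conn *c)
{
    pthread_mutex_lock(&c->mu);
    if (c->writers > 0) {
        c->writers = -c->writers;
        pthread_cond_broadcast(&c->cv);
        while (c->writers != 0)
            pthread_cond_wait(&c->cv, &c->mu);
    }
    pthread_mutex_unlock(&c->mu);
}

/* Register n writers, close the connection, and return the writer count
 * observed after demo_closing() has returned. */
static int demo_run(int n)
{
    struct demo_conn c;
    pthread_t tid[DEMO_MAX_WRITERS];
    int i, left;

    if (n < 0) n = 0;
    if (n > DEMO_MAX_WRITERS) n = DEMO_MAX_WRITERS;

    pthread_mutex_init(&c.mu, NULL);
    pthread_cond_init(&c.cv, NULL);
    c.writers = n;  /* writers are registered before any thread starts */

    for (i = 0; i < n; i++)
        if (pthread_create(&tid[i], NULL, demo_writer, &c) != 0)
            abort();

    demo_closing(&c);

    pthread_mutex_lock(&c.mu);
    left = c.writers;
    pthread_mutex_unlock(&c.mu);

    for (i = 0; i < n; i++)
        pthread_join(tid[i], NULL);
    pthread_cond_destroy(&c.cv);
    pthread_mutex_destroy(&c.mu);
    return left;
}
```

In this model `demo_run()` always returns 0, because `demo_closing()` cannot return while the count is nonzero. A value of -1 after the close, as in the core file, would mean either that the count was changed outside this handshake or that the close path was reached without passing through the wait loop.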
It's happened once or twice, but slapd hasn't had much time to run, since this is the same machine on which I just finished chasing the i386 heap fragmentation. Our backtrace/core/BDB harness is still on this slapd, so we should have information if it happens again.
john