On Tuesday, 12 July 2011 11:59:52 Cyril GROSJEAN wrote:
I have noticed that my OpenLDAP server freezes at random, and I can't understand why. I have a few LDAP clients (ldapsearch, a legacy Java app, and ApacheDirectoryStudio) running on different systems: locally on the OpenLDAP server, on another OpenLDAP server, or on a remote workstation. None of them gets an answer from OpenLDAP. The connection is established, but each client gets stuck waiting for a result.
[...]
Jul 12 10:20:05 dev-ldap1 slapd[28525]: connection_input: conn=3377 deferring operation: binding
This is the code (at least in 2.4.26) that generates the message:
    /* Don't process requests when the conn is in the middle of a
     * Bind, or if it's closing. Also, don't let any single conn
     * use up all the available threads, and don't execute if we're
     * currently blocked on output. And don't execute if there are
     * already pending ops, let them go first. Abandon operations
     * get exceptions to some, but not all, cases.
     */
    switch( tag ){
    default:
        /* Abandon and Unbind are exempt from these checks */
        if (conn->c_conn_state == SLAP_C_CLOSING) {
            defer = "closing";
            break;
        } else if (conn->c_writewaiter) {
            defer = "awaiting write";
            break;
        } else if (conn->c_n_ops_pending) {
            defer = "pending operations";
            break;
        }
        /* FALLTHRU */
    case LDAP_REQ_ABANDON:
        /* Unbind is exempt from these checks */
        if (conn->c_n_ops_executing >= connection_pool_max/2) {
            defer = "too many executing";
            break;
        } else if (conn->c_conn_state == SLAP_C_BINDING) {
            defer = "binding";
            break;
        }
        /* FALLTHRU */
    case LDAP_REQ_UNBIND:
        break;
    }

    if( defer ) {
        int max = conn->c_dn.bv_len
            ? slap_conn_max_pending_auth
            : slap_conn_max_pending;

        Debug( LDAP_DEBUG_ANY,
            "connection_input: conn=%lu deferring operation: %s\n",
            conn->c_connid, defer, 0 );
        conn->c_n_ops_pending++;
        LDAP_STAILQ_INSERT_TAIL( &conn->c_pending_ops, op, o_next );
        rc = ( conn->c_n_ops_pending > max ) ? -1 : 0;
    } else {
        ... carry on and handle the op.
As far as I understand, the intention is (among other things) to ignore operations from connections where a BIND operation is still pending. However, some of the comments now appear to be a bit misplaced (e.g. the Unbind comment sits under LDAP_REQ_ABANDON). Also, the code appears (to me, not being very familiar with it, and quite rusty at C) not to be doing the right thing. The portion generating the "deferring operation: binding" message appears to apply when an abandon operation is received on a connection that has a pending BIND operation. Shouldn't an abandon be allowed for a BIND? Or am I reading it wrong? Also, it looks as if the "too many executing" check applies only to abandon?
Shouldn't the LDAP_REQ_ABANDON case be breaking without setting 'defer'?
Shouldn't the 'conn->c_conn_state == SLAP_C_BINDING' and 'conn->c_n_ops_executing >= connection_pool_max/2' conditions be handled by the default case as well?
We have been running into both the "deferring operation: binding" and "deferring operation: too many executing" messages. I hadn't had time to trace what the LDAP client software was doing, but now I wonder whether it was sending abandon requests when some operations weren't returning in time (after more than 18000 successful operations on a connection). I think its use of LDAP connections may be wrong, but I would prefer to be able to prove that to the vendor without other log entries that show its correct behaviour being handled incorrectly.
Also, the hard-coded 'one connection may not have more operations executing than half the number of threads' rule seems a bit arbitrary. Could we get a knob to twiddle this?
Regards, Buchan