Hello folks, I've never used a jitterbug, so I hope this email goes to
the right place and doesn't spam up the system.
I've been experiencing the same problem as Mark. I use OpenLDAP 2.4.23
packaged in Debian. I have a few processes that connect to ldapi:///
socket and do anonymous binds, there are one or two "no connection!" log
messages each time these processes run. None of my remote connections
produce these messages. This happens at very low volume, when slapd is
idle. When slapd gets more utilization, maybe one or two requests per
second, the noise stops.
I've spent a few hours looking at the source of OpenLDAP slapd and
libldap, also Perl Net::LDAP (the client most often generating these log
entries on my system). The source code for OpenLDAP looks very clean,
but I'm no C expert, and I'm lost in the asynchronous nature of the
connection.c. FYI, Net::LDAP doesn't use libldap - it's all Perl, no
native libraries.
After hundreds of tests and various olcLogLevel settings I've finally
stumbled upon something that solves the problem...
usleep(1);
I know this has to be the lamest bug report ever, but it works, please
bear with me.
The first thing that tipped me to try 'usleep' was that with olcLogLevel
set to "trace packets args conns stats", I get the following output:
connection_get(14)
connection_get(14): got connid=1007
connection_read(14): checking for input on id=1007
* op tag 0x42, time 1283917507
* ber_get_next on fd 14 failed errno=0 (Success)
connection_read(14): input error=-2 id=1007, closing.
connection_closing: readying conn=1007 sd=14 for close
connection_close: deferring conn=1007 sd=14
conn=1007 op=2 do_unbind
conn=1007 op=2 UNBIND
connection_resched: attempting closing conn=1007 sd=14
connection_close: conn=1007 sd=14
daemon: removing 14
conn=1007 fd=14 closed
daemon: activity on 1 descriptor
daemon: activity on:
daemon: epoll: listen=8 active_threads=0 tvp=zero
daemon: epoll: listen=9 active_threads=0 tvp=zero
connection_get(14)
connection_get(14): connection not used
connection_get(14)
connection_get(14): connection not used
connection_read(14): no connection!
connection_read(14) error
connection_read(14): no connection!
The two lines I marked with asterisks are the output from the
connection_input function, "op tag 0x42..." should be printed after
"ber_get_next on fd 14 failed...", but it's not - so evidently this
function is reentrant, the input loop is processing both the UNBIND
request and the client closing the socket at the same time, and
unfortunately the socket close processing came first so that
connection_input thread tears down the connection.
Now with the perl scripts I tried adding a microsecond sleep just after
they unbind and before they exit, it worked - haven't seen any "no
connection!" logs from them for hours.
As for the libldap-client and ldapsearch commands, everytime I executed
a simple 'ldapsearch -x -H ldapi:///' I'd get two "no connection!" log
messages. I tried the same braindead usleep(1) solution and it worked.
The following is the diff of what I changed to libldap,
ldap_free_connection function in request.c:
if ( unbind ) {
ldap_send_unbind( ld, lc->lconn_sb, NULL, NULL );
+ Debug( LDAP_DEBUG_TRACE,
+ "ldap_free_connection: sleeping after unbind\n",
+ 0, 0, 0 );
+ usleep(1);
}
Here's the last bits of output from ldapsearch with -d3 after applying
this brutal "fix" to libldap:
ldap_free_connection 1 1
ldap_send_unbind
ber_flush2: 7 bytes to sd 3
ldap_write: want=7, written=7
0000: 30 05 02 01 03 42 00 0....B.
ldap_free_connection: sleeping after unbind
ldap_free_connection: actually freed
Notice the bytes getting flushed between unbind, my dumb sleep, and
"actually freed". Without the sleep it seems slapd may process the
close before the unbind.
--
Gerald Turner Email: gturner(a)unzane.com JID: gturner(a)jabber.unzane.com
GPG: 0xFA8CD6D5 21D9 B2E8 7FE7 F19E 5F7D 4D0C 3FA0 810F FA8C D6D5