slapd connection_read: no connection; tcp time_wait state - openldap-technical

17 Feb 2009


      Hi!
This is an interesting one...  I have an OpenLDAP 2.4.12 server as a 
consumer in a two node cluster.  Its sole function is to answer queries 
for our mail hub for recipient validation.  We see about 50-300 queries 
per second, with occasional spikes.
Unfortunately, our mail hub appliances (vendor name left out to protect 
the guilty) are somewhat inefficient in LDAP connection handling and 
open a new TCP connection for every single LDAP query.  They do this 
even when there are multiple recipients in one SMTP session (boggles the 
mind!).  A percentage of these connections don't get closed properly and 
I get the following error in the syslog:
slapd[23108]: connection_read(18): no connection!
The reason is that the connections are in a time_wait state because they 
were not closed properly.  They go away in 60 seconds, but with the load 
this server gets we continuously have several hundred TCP connections in 
a time_wait state and a system log full of the above errors.
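
For anyone wanting to put a number on this, here is a minimal sketch of 
counting connections per TCP state.  It assumes Linux-style 
`netstat -tan` output, where the state is the sixth column.  The 
function name and the sample addresses are made up for illustration and 
do not come from the server in question.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP connection states in `netstat -tan`-style text (assumed layout)."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Data rows start with "tcp" or "tcp6"; the state is the sixth column.
        # The header row starts with "Proto" and is skipped by this check.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            states[fields[5]] += 1
    return states


# Hypothetical sample output; the addresses and ports are invented.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:389             0.0.0.0:*               LISTEN
tcp        0      0 10.0.0.5:389            10.0.0.9:41822          TIME_WAIT
tcp        0      0 10.0.0.5:389            10.0.0.9:41823          TIME_WAIT
tcp        0      0 10.0.0.5:389            10.0.0.9:41830          ESTABLISHED
"""

if __name__ == "__main__":
    for state, count in count_tcp_states(SAMPLE).items():
        print(state, count)
```

In practice the text would come from the real command output rather 
than the sample string.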
I'm attaching two packet captures:
time_wait.cap - filtered a single complete tcp session that ended with 
the port in a time_wait condition.
no_time_wait.cap - control capture for reference.  This session closed 
properly.
I can't claim to have the greatest understanding of 3-way / 4-way TCP 
open / close handshakes.  But one thing I did notice that seems to be 
consistent among the sessions that end in time_wait is that the 
fin-ack is initiated by the server.  Possibly I'm reading it wrong, but 
doesn't the client normally initiate the close, with the server doing a 
passive close?  So, in theory, the server should never have to wait for 
the client.
Could someone more knowledgeable than me tell me why the server might 
initiate the active close?
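
To make the active / passive close question concrete, here is a small 
sketch of the teardown transitions as I understand them from the TCP 
state diagram in RFC 793.  It is a simplified model (no simultaneous 
close, no resets), and the event names and the `run` helper are my own 
labels, not anything from slapd or the kernel.  It shows that whichever 
side sends the first FIN is the one that passes through TIME_WAIT.

```python
# Simplified TCP teardown transitions (after RFC 793), ignoring
# simultaneous close and RST.  Event names are illustrative labels.
TRANSITIONS = {
    # Active closer: sends the first FIN.
    ("ESTABLISHED", "send_fin"): "FIN_WAIT_1",
    ("FIN_WAIT_1", "recv_ack"): "FIN_WAIT_2",
    ("FIN_WAIT_2", "recv_fin"): "TIME_WAIT",
    ("TIME_WAIT", "timeout"): "CLOSED",
    # Passive closer: receives the first FIN.
    ("ESTABLISHED", "recv_fin"): "CLOSE_WAIT",
    ("CLOSE_WAIT", "send_fin"): "LAST_ACK",
    ("LAST_ACK", "recv_ack"): "CLOSED",
}


def run(events, state="ESTABLISHED"):
    """Return the list of states visited while applying the events in order."""
    path = [state]
    for event in events:
        state = TRANSITIONS[(state, event)]
        path.append(state)
    return path


ACTIVE_CLOSE = ["send_fin", "recv_ack", "recv_fin"]
PASSIVE_CLOSE = ["recv_fin", "send_fin", "recv_ack"]

if __name__ == "__main__":
    print("active: ", " -> ".join(run(ACTIVE_CLOSE)))
    print("passive:", " -> ".join(run(PASSIVE_CLOSE)))
```

If this model is right, a server-initiated fin-ack would match the 
time_wait entries being on the server side, which fits what I see in 
the captures.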
thanks,
-james