Hi!
We've got a proxy OpenLDAP setup using the "ldap" proxy database (the newer "meta" backend had some issues, so I couldn't use that one). Most of the time, the proxy bind goes OK, but sometimes, the connection goes like this (proxy log, using ldaps):
--clip-- Jan 7 16:01:12 ldap-r-1 slapd[16851]: conn=7716 fd=96 ACCEPT from IP=128.214.54.152:36460 (IP=0.0.0.0:636) Jan 7 16:01:12 ldap-r-1 slapd[16851]: conn=7716 fd=96 TLS established tls_ssf=256 ssf=256 Jan 7 16:01:12 ldap-r-1 slapd[16851]: conn=7716 op=0 BIND dn="<removed>" method=128 Jan 7 16:01:12 ldap-r-1 slapd[16851]: conn=7716 op=0 RESULT tag=97 err=52 text=Start TLS failed Jan 7 16:01:12 ldap-r-1 slapd[16851]: conn=7716 fd=96 closed (connection lost) --clip--
On the backend, this appears to be the case that a new connection kind of overruns the old one; see what happens with connection 6983:
--clip-- Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6983 fd=23 ACCEPT from IP=128.214.222.155:46144 (IP=0.0.0.0:389) Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6984 fd=41 ACCEPT from IP=128.214.222.155:46146 (IP=0.0.0.0:389) Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6983 op=0 EXT oid=1.3.6.1.4.1.1466.20037 Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6983 op=0 STARTTLS Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6983 op=1 UNBIND Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6983 fd=23 closed Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6984 op=0 EXT oid=1.3.6.1.4.1.1466.20037 Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6984 op=0 STARTTLS Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6984 op=0 RESULT oid= err=0 text= Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6984 fd=41 TLS established tls_ssf=256 ssf=256 Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6984 op=1 BIND dn="<removed>" method=128 Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6984 op=1 BIND dn="<removed>" mech=SIMPLE ssf=0 Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6984 op=1 RESULT tag=97 err=0 text= [SRCH and result lines] Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6984 op=3 UNBIND Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6984 fd=41 closed --clip--
This doesn't seem to have any correlation to the amount of binds of server load, at least that I'm aware - and anyway, the servers are nowhere near at capacity - we only get ~2 binds per second.
Normal connection would look like this:
--clip-- Jan 7 16:00:55 ldap-r-1 slapd[16851]: conn=7650 fd=159 ACCEPT from IP=128.214.54.152:36362 (IP=0.0.0.0:636) Jan 7 16:00:55 ldap-r-1 slapd[16851]: conn=7650 fd=159 TLS established tls_ssf=256 ssf=256 Jan 7 16:00:55 ldap-r-1 slapd[16851]: conn=7650 op=0 BIND dn="<removed>" method=128 Jan 7 16:00:55 ldap-r-1 slapd[16851]: conn=7650 op=0 BIND dn="<removed>" mech=SIMPLE ssf=0 Jan 7 16:00:55 ldap-r-1 slapd[16851]: conn=7650 op=0 RESULT tag=97 err=0 text= [SRCH and SEARCH RESULT lines] Jan 7 16:01:12 ldap-r-1 slapd[16851]: conn=7650 op=219 UNBIND Jan 7 16:01:12 ldap-r-1 slapd[16851]: conn=7650 fd=159 closed --clip--
So this far, it looks as if the problem was that if we accept a second connection for the same client before the proxy has had time to do a bind (or indeed, a STARTTLS for the bind) on the backend for the second connection.
But we seem to be getting spurious Start TLS failed messages also without any competing connections. Here's one using ldap+STARTTLS but no other ACCEPTs anywhere near:
--clip-- Jan 17 17:46:20 ldap-r-1 slapd[28738]: conn=529684 fd=62 ACCEPT from IP=128.214.173.135:43148 (IP=0.0.0.0:389) Jan 17 17:46:20 ldap-r-1 slapd[28738]: conn=529684 op=0 EXT oid=1.3.6.1.4.1.1466.20037 Jan 17 17:46:20 ldap-r-1 slapd[28738]: conn=529684 op=0 STARTTLS Jan 17 17:46:20 ldap-r-1 slapd[28738]: conn=529684 op=0 RESULT oid= err=0 text= Jan 17 17:46:20 ldap-r-1 slapd[28738]: conn=529684 fd=62 TLS established tls_ssf=256 ssf=256 Jan 17 17:46:20 ldap-r-1 slapd[28738]: conn=529684 op=1 BIND dn="ou=shibboleth,ou=login,dc=helsinki,dc=fi" method=128 Jan 17 17:46:20 ldap-r-1 slapd[28738]: conn=529684 op=1 RESULT tag=97 err=52 text=Start TLS failed Jan 17 17:46:22 ldap-r-1 slapd[28738]: conn=529684 fd=62 closed (connection lost) --clip--
on the proxy, and backend:
--clip-- Jan 17 17:46:20 ldap-v-2 slapd[40089]: conn=406149 fd=92 ACCEPT from IP=128.214.222.155:48996 (IP=0.0.0.0:389) Jan 17 17:46:20 ldap-v-2 slapd[40089]: conn=406149 op=0 EXT oid=1.3.6.1.4.1.1466.20037 Jan 17 17:46:20 ldap-v-2 slapd[40089]: conn=406149 op=0 STARTTLS Jan 17 17:46:20 ldap-v-2 slapd[40089]: conn=406149 op=0 RESULT oid= err=0 text= Jan 17 17:46:20 ldap-v-2 slapd[40089]: conn=406149 fd=92 closed (TLS negotiation failure) --clip--
Succesful ldap+starttls connection on proxy:
--clip-- Jan 17 18:11:01 ldap-r-1 slapd[28738]: conn=531181 fd=54 ACCEPT from IP=128.214.173.137:48700 (IP=0.0.0.0:389) Jan 17 18:11:01 ldap-r-1 slapd[28738]: conn=531181 op=0 EXT oid=1.3.6.1.4.1.1466.20037 Jan 17 18:11:01 ldap-r-1 slapd[28738]: conn=531181 op=0 STARTTLS Jan 17 18:11:01 ldap-r-1 slapd[28738]: conn=531181 op=0 RESULT oid= err=0 text= Jan 17 18:11:01 ldap-r-1 slapd[28738]: conn=531181 fd=54 TLS established tls_ssf=256 ssf=256 Jan 17 18:11:01 ldap-r-1 slapd[28738]: conn=531181 op=1 BIND dn="<removed>" method=128 Jan 17 18:11:01 ldap-r-1 slapd[28738]: conn=531181 op=1 BIND dn="<removed>" mech=SIMPLE ssf=0 Jan 17 18:11:01 ldap-r-1 slapd[28738]: conn=531181 op=1 RESULT tag=97 err=0 text= [SRCH and SEARCH RESULT lines] Jan 17 18:11:01 ldap-r-1 slapd[28738]: conn=531181 op=3 UNBIND Jan 17 18:11:01 ldap-r-1 slapd[28738]: conn=531181 fd=54 closed --clip--
Any ideas what could be causing these? It's a bit hard to pinpoint the problem, since it happens so rarely - we seem to get a couple dozen of these every hour to around 16 000 succesful binds, not really depending on system load. This is RHEL 7 and OpenLDAP 2.4.40.
It may be the case that the same DN binding very often increases the probability of a Start TLS failed error, which I guess would be the first case (a new ACCEPT + BIND for a DN too soon after the previous one). But as I said, these seem to also sometimes appear even without any other connections being opened at the same time. (Although there might be some old connections open for the same DN.)
--Janne Peltonen University of Helsinki