Janne Peltonen wrote:
Hi!
We've got a proxy OpenLDAP setup using the "ldap" proxy database (the newer "meta" backend had some issues, so I couldn't use that one). Most of the time, the proxy bind goes OK, but sometimes, the connection goes like this (proxy log, using ldaps):
--clip--
Jan 7 16:01:12 ldap-r-1 slapd[16851]: conn=7716 fd=96 ACCEPT from IP=128.214.54.152:36460 (IP=0.0.0.0:636)
Jan 7 16:01:12 ldap-r-1 slapd[16851]: conn=7716 fd=96 TLS established tls_ssf=256 ssf=256
Jan 7 16:01:12 ldap-r-1 slapd[16851]: conn=7716 op=0 BIND dn="<removed>" method=128
Jan 7 16:01:12 ldap-r-1 slapd[16851]: conn=7716 op=0 RESULT tag=97 err=52 text=Start TLS failed
Jan 7 16:01:12 ldap-r-1 slapd[16851]: conn=7716 fd=96 closed (connection lost)
--clip--
On the backend, it looks as if a new connection overruns the old one, so to speak; see what happens with connection 6983:
Looks odd indeed, but doesn't seem related to the other errors. Would need much finer resolution timestamps to correlate what's going on, unless you know there are no other active connections on the proxy when this occurred on the backend.
--clip--
Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6983 fd=23 ACCEPT from IP=128.214.222.155:46144 (IP=0.0.0.0:389)
Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6984 fd=41 ACCEPT from IP=128.214.222.155:46146 (IP=0.0.0.0:389)
Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6983 op=0 EXT oid=1.3.6.1.4.1.1466.20037
Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6983 op=0 STARTTLS
Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6983 op=1 UNBIND
Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6983 fd=23 closed
Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6984 op=0 EXT oid=1.3.6.1.4.1.1466.20037
Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6984 op=0 STARTTLS
Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6984 op=0 RESULT oid= err=0 text=
Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6984 fd=41 TLS established tls_ssf=256 ssf=256
Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6984 op=1 BIND dn="<removed>" method=128
Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6984 op=1 BIND dn="<removed>" mech=SIMPLE ssf=0
Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6984 op=1 RESULT tag=97 err=0 text=
[SRCH and result lines]
Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6984 op=3 UNBIND
Jan 7 16:01:12 ldap-v-2 slapd[61739]: conn=6984 fd=41 closed
--clip--
This doesn't seem to have any correlation with the number of binds or with server load, at least as far as I'm aware - and anyway, the servers are nowhere near capacity; we only get ~2 binds per second.
A normal connection looks like this:
--clip--
Jan 7 16:00:55 ldap-r-1 slapd[16851]: conn=7650 fd=159 ACCEPT from IP=128.214.54.152:36362 (IP=0.0.0.0:636)
Jan 7 16:00:55 ldap-r-1 slapd[16851]: conn=7650 fd=159 TLS established tls_ssf=256 ssf=256
Jan 7 16:00:55 ldap-r-1 slapd[16851]: conn=7650 op=0 BIND dn="<removed>" method=128
Jan 7 16:00:55 ldap-r-1 slapd[16851]: conn=7650 op=0 BIND dn="<removed>" mech=SIMPLE ssf=0
Jan 7 16:00:55 ldap-r-1 slapd[16851]: conn=7650 op=0 RESULT tag=97 err=0 text=
[SRCH and SEARCH RESULT lines]
Jan 7 16:01:12 ldap-r-1 slapd[16851]: conn=7650 op=219 UNBIND
Jan 7 16:01:12 ldap-r-1 slapd[16851]: conn=7650 fd=159 closed
--clip--
So far, it looks as if the problem is that we accept a second connection for the same client before the proxy has had time to do a bind (or indeed, a STARTTLS for the bind) on the backend for the second connection.
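If overlapping cached backend connections are indeed the trigger, the connection-handling directives in slapd-ldap(5) might be worth experimenting with. A hypothetical slapd.conf fragment - the directive names come from the man page, but the URI, suffix, and timeout values here are only illustrative:

```
database        ldap
suffix          "dc=helsinki,dc=fi"
uri             "ldap://ldap-v-2.example.org"
# issue STARTTLS on outbound connections to the backend
tls             start
# expire idle cached connections, and cap their total lifetime, so a
# stale half-initialized connection is less likely to be reused
idle-timeout    30
conn-ttl        300
# use one cached connection per identity instead of a pool
single-conn     yes
```

Whether these actually help depends on where the race is, but they at least narrow the window in which a half-set-up backend connection can be picked up by a second client connection.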
But we also seem to be getting spurious "Start TLS failed" messages without any competing connections. Here's one using ldap+STARTTLS, with no other ACCEPTs anywhere near:
These aren't spurious - your TLS library has genuinely failed to start a session. Which TLS library are you using? What OS are you running on? The most common cause for periodic failures is running out of entropy for the PRNG.
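A quick way to check the entropy hypothesis on a Linux box such as RHEL 7 is the kernel's standard proc interface (the path is the stock Linux one; the threshold mentioned in the comment is a rule of thumb, not an exact limit):

```shell
# Show the kernel's currently available entropy; values that stay in
# the low hundreds, or that dip toward zero under load, suggest the
# TLS library's PRNG could be starved of entropy.
cat /proc/sys/kernel/random/entropy_avail
```

If it does sit low, installing and running rngd from the rng-tools package is the usual remedy on RHEL.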
--clip--
Jan 17 17:46:20 ldap-r-1 slapd[28738]: conn=529684 fd=62 ACCEPT from IP=128.214.173.135:43148 (IP=0.0.0.0:389)
Jan 17 17:46:20 ldap-r-1 slapd[28738]: conn=529684 op=0 EXT oid=1.3.6.1.4.1.1466.20037
Jan 17 17:46:20 ldap-r-1 slapd[28738]: conn=529684 op=0 STARTTLS
Jan 17 17:46:20 ldap-r-1 slapd[28738]: conn=529684 op=0 RESULT oid= err=0 text=
Jan 17 17:46:20 ldap-r-1 slapd[28738]: conn=529684 fd=62 TLS established tls_ssf=256 ssf=256
Jan 17 17:46:20 ldap-r-1 slapd[28738]: conn=529684 op=1 BIND dn="ou=shibboleth,ou=login,dc=helsinki,dc=fi" method=128
Jan 17 17:46:20 ldap-r-1 slapd[28738]: conn=529684 op=1 RESULT tag=97 err=52 text=Start TLS failed
Jan 17 17:46:22 ldap-r-1 slapd[28738]: conn=529684 fd=62 closed (connection lost)
--clip--
on the proxy, and on the backend:
--clip--
Jan 17 17:46:20 ldap-v-2 slapd[40089]: conn=406149 fd=92 ACCEPT from IP=128.214.222.155:48996 (IP=0.0.0.0:389)
Jan 17 17:46:20 ldap-v-2 slapd[40089]: conn=406149 op=0 EXT oid=1.3.6.1.4.1.1466.20037
Jan 17 17:46:20 ldap-v-2 slapd[40089]: conn=406149 op=0 STARTTLS
Jan 17 17:46:20 ldap-v-2 slapd[40089]: conn=406149 op=0 RESULT oid= err=0 text=
Jan 17 17:46:20 ldap-v-2 slapd[40089]: conn=406149 fd=92 closed (TLS negotiation failure)
--clip--
Any ideas what could be causing these? It's a bit hard to pinpoint the problem, since it happens so rarely - we seem to get a couple dozen of these every hour against around 16 000 successful binds, not really depending on system load. This is RHEL 7 and OpenLDAP 2.4.40.
It may be that the same DN binding very often increases the probability of a "Start TLS failed" error, which I guess would be the first case (a new ACCEPT + BIND for a DN too soon after the previous one). But as I said, these sometimes seem to appear even without any other connections being opened at the same time. (Although there might be some old connections open for the same DN.)
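For tracking the failure rate over time, counting RESULT lines in the slapd syslog output is enough. A sketch - the sample log lines are inlined here purely for illustration; in practice you would point grep at your real syslog file:

```shell
# Build a tiny sample log (stand-in for e.g. /var/log/messages).
cat > /tmp/slapd-sample.log <<'EOF'
Jan 17 17:46:20 ldap-r-1 slapd[28738]: conn=529684 op=1 RESULT tag=97 err=52 text=Start TLS failed
Jan 17 17:46:21 ldap-r-1 slapd[28738]: conn=529685 op=1 RESULT tag=97 err=0 text=
Jan 17 17:46:22 ldap-r-1 slapd[28738]: conn=529686 op=1 RESULT tag=97 err=0 text=
EOF

# err=52 with "Start TLS failed" marks the failing binds; tag=97 err=0
# marks successful bind results.
failures=$(grep -c 'err=52 text=Start TLS failed' /tmp/slapd-sample.log)
successes=$(grep -c 'tag=97 err=0' /tmp/slapd-sample.log)
echo "failures=$failures successes=$successes"
```

Run hourly (e.g. from cron), this would show whether the couple-dozen-per-hour rate correlates with anything else in the logs.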