Hello
I am experiencing a weird problem with OpenLDAP and TLS connections. slapd randomly rejects connections with a "TLS negotiation failure" error message.
This happens with various clients (MacOS, NetBSD, Linux), and on multiple machines running slapd. The current setup is:
OpenLDAP 2.4.16
OpenSSL 0.9.9-dev 09 May 2008
NetBSD 5.0.1
The problem also existed before the upgrade, with NetBSD 4.0 and OpenLDAP 2.4.14, and it seems to get worse over time.
Here is a trace obtained with a breakpoint set on the error message in slapd.
conn=0 fd=19 ACCEPT from IP=193.54.82.248:59782 (IP=193.54.82.23:636)
TLS: can't accept: (null).

Breakpoint 1, connection_read (s=19, cri=0xa63ff8ac) at connection.c:1326
1326 connection_closing( c, "TLS negotiation failure" );
(gdb) bt
#0 connection_read (s=19, cri=0xa63ff8ac) at connection.c:1326
#1 0x08078bf9 in connection_read_thread (ctx=0xa63ff900, argv=0x13) at connection.c:1216
#2 0xbbbaad3a in ldap_int_thread_pool_wrapper (xpool=0xbb540080) at tpool.c:663
#3 0xbb85e9df in pthread_create () from /usr/lib/libpthread.so.0
#4 0xbb7aa640 in swapcontext () from /usr/lib/libc.so.12
(gdb) c
Continuing.
conn=1 fd=20 ACCEPT from IP=193.54.82.248:59783 (IP=193.54.82.23:636)
conn=0 fd=19 closed (TLS negotiation failure)
So connection_read() reports an error from ldap_pvt_tls_accept(), which is caused by tls_imp->ti_session_accept(). For OpenSSL, that is tlso_session_accept(), which just calls SSL_accept().
Does that ring a bell for anyone? Any suggestions for a workaround?
On Thu, Sep 10, 2009 at 07:06:59AM +0200, Emmanuel Dreyfus wrote:
> So connection_read() reports an error from ldap_pvt_tls_accept(), which is caused by tls_imp->ti_session_accept(). For OpenSSL, that is tlso_session_accept(), which just calls SSL_accept()
I tried looping on SSL_accept() in tlso_session_accept() until it succeeds. It often takes between 400 and 800 attempts before it succeeds.
I suspect a locking issue, or something related to non-blocking I/O.
On Thu, Sep 10, 2009 at 02:51:34PM +0000, Emmanuel Dreyfus wrote:
> I tried looping on SSL_accept() until it succeed, in tlso_session_accept(). It often has to try between 400 and 800 times before getting a success.
The statement above should be discarded, as I overlooked a few things. Here is my latest analysis of the problem:
- Here is the code path leading to the error:
  ldap_pvt_tls_accept -> tlso_session_accept -> SSL_accept
- During SSL_accept(), the tlso_info_cb() callback is invoked only once, as reported by the LDAP_DEBUG_TRACE output:
  TLS trace: SSL_accept:before/accept initialization
  There is no "TLS trace: SSL_accept:SSLv3 read client hello A", as we have in normal sessions.
- When SSL_accept() returns, we have:
  SSL_accept() return value = 0
  SSL_get_error() returns SSL_ERROR_SYSCALL
  ERR_get_error() returns 0
  errno is set to 0
- Reading SSL_get_error(3), I would be in the "EOF was observed that violates the protocol" situation:
  SSL_ERROR_SYSCALL
    Some I/O error occurred. The OpenSSL error queue may contain more information on the error. If the error queue is empty (i.e. ERR_get_error() returns 0), ret can be used to find out more about the error: If ret == 0, an EOF was observed that violates the protocol. If ret == -1, the underlying BIO reported an I/O error (for socket I/O on Unix systems, consult errno for details).
But I find it hard to claim that the client is the culprit, since it happens with a wide mix of clients: NetBSD, Linux, MacOS X.
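To make the case analysis from SSL_get_error(3) concrete, here is a minimal sketch in C of how the four observed values could be turned into a diagnosis. The helper name classify_accept_failure() and the SK_ constants are hypothetical: the constants are local stand-ins mirroring the numeric values in <openssl/ssl.h> so that the sketch compiles without OpenSSL, and none of this is code from slapd.

```c
#include <stddef.h>

/* Local stand-ins for the OpenSSL constants (assumption: real code would
 * include <openssl/ssl.h> and use SSL_ERROR_* directly). */
enum {
    SK_SSL_ERROR_NONE       = 0,
    SK_SSL_ERROR_SSL        = 1,
    SK_SSL_ERROR_WANT_READ  = 2,
    SK_SSL_ERROR_WANT_WRITE = 3,
    SK_SSL_ERROR_SYSCALL    = 5
};

/* Hypothetical helper: map the values seen after SSL_accept() to a short
 * diagnosis, following the SSL_get_error(3) text quoted above.
 *   ret         - return value of SSL_accept()
 *   ssl_err     - value of SSL_get_error(ssl, ret)
 *   queued_err  - value of ERR_get_error()
 *   saved_errno - errno captured right after SSL_accept() */
const char *classify_accept_failure(int ret, int ssl_err,
                                    unsigned long queued_err, int saved_errno)
{
    if (ret > 0)
        return "handshake completed";

    /* On a non-blocking socket these two mean "call SSL_accept() again
     * once the socket is readable/writable", not a failure. */
    if (ssl_err == SK_SSL_ERROR_WANT_READ || ssl_err == SK_SSL_ERROR_WANT_WRITE)
        return "retry later";

    if (ssl_err == SK_SSL_ERROR_SYSCALL) {
        if (queued_err != 0)
            return "see OpenSSL error queue";
        if (ret == 0)
            return "unexpected EOF from peer";
        if (ret == -1)
            return saved_errno != 0 ? "I/O error, see errno"
                                    : "I/O error, errno not set";
    }

    if (ssl_err == SK_SSL_ERROR_SSL)
        return "protocol error, see OpenSSL error queue";

    return "unclassified";
}
```

With the values observed here, classify_accept_failure(0, SK_SSL_ERROR_SYSCALL, 0, 0) takes the "unexpected EOF from peer" branch.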
Emmanuel Dreyfus <manu@netbsd.org> wrote:
> - Reading SSL_get_error(3), I would be in the "EOF was observed that violates the protocol" situation:
>   SSL_ERROR_SYSCALL
>     Some I/O error occurred. The OpenSSL error queue may contain more information on the error. If the error queue is empty (i.e. ERR_get_error() returns 0), ret can be used to find out more about the error: If ret == 0, an EOF was observed that violates the protocol. If ret == -1, the underlying BIO reported an I/O error (for socket I/O on Unix systems, consult errno for details).
ssldump tells me that the connection is immediately terminated by the client:
A connection, as reported by ssldump, that will exhibit "TLS negotiation failure":
New TCP connection #3: client (51203) <-> server (636)
3 0.0007 (0.0007) C>S TCP FIN
3 0.0014 (0.0007) S>C TCP FIN

A sane connection:
New TCP connection #4: client (51204) <-> server (636)
4 1 0.0007 (0.0007) C>S SSLv2 compatible client hello
  Version 3.1
  cipher suites
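For anyone who wants to confirm the same pattern from the server side, here is a minimal sketch in C of a check that could be run on the accepted socket before the TLS handshake. It peeks at the socket without consuming data, so a client that sent a FIN before any hello shows up as a zero-length read. The function name peer_closed_before_hello() is hypothetical, it is not part of slapd, and I am assuming MSG_DONTWAIT is available (it is on NetBSD, Linux and MacOS X, but it is not in every POSIX system).

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical diagnostic helper, not slapd code.
 * Returns  1 if the peer has closed its side without sending any data,
 *          0 if at least one byte is waiting to be read,
 *         -1 if nothing has arrived yet or recv() failed (check errno). */
int peer_closed_before_hello(int fd)
{
    char byte;

    /* MSG_PEEK leaves any pending data in the socket buffer, so a later
     * SSL_accept() still sees the full client hello. MSG_DONTWAIT keeps
     * the call from blocking when nothing has arrived. */
    ssize_t n = recv(fd, &byte, 1, MSG_PEEK | MSG_DONTWAIT);

    if (n == 0)
        return 1;   /* orderly shutdown: FIN received, no data */
    if (n > 0)
        return 0;   /* data pending, presumably the client hello */
    return -1;      /* EAGAIN/EWOULDBLOCK or a real error */
}
```

This is only a diagnostic aid: a result of -1 with EAGAIN says nothing about what the client will do next, so it cannot replace handling the SSL_accept() result itself.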
Any idea of what could cause that?
openldap-software@openldap.org