Hello,
On Thu, 17 Jan 2019, Howard Chu wrote:
These aren't spurious - your TLS library has genuinely failed to start a session. Which TLS library are you using? What OS are you running on? The most common cause for periodic failures is running out of entropy for the PRNG.
Could it be because of too short timeout period? Please see the following code analysis.
OpenLDAP 2.4.46 on Fedora 29 back-ldap/bind.c has:
rs->sr_err = ldap_back_start_tls( ld, op->o_protocol, &is_tls, li->li_uri, flags, li->li_timeout[ SLAP_OP_EXTENDED ], &rs->sr_text ); li->li_uri_mutex_do_not_lock = 0; ldap_pvt_thread_mutex_unlock( &li->li_uri_mutex ); if ( rs->sr_err != LDAP_SUCCESS ) { ldap_unbind_ext( ld, NULL, NULL ); rs->sr_text = "Start TLS failed"; goto error_return;
So the call to:
ldap_back_start_tls( ld, op->o_protocol, &is_tls, li->li_uri, flags, li->li_timeout[ SLAP_OP_EXTENDED ], &rs->sr_text )
is the one that fails.
By the way, I am wondering here about the correctness of parameter
li->li_timeout[ SLAP_OP_EXTENDED ]
slap.h has:
/* * Operation indices */ typedef enum { SLAP_OP_BIND = 0, SLAP_OP_UNBIND, SLAP_OP_SEARCH, SLAP_OP_COMPARE, SLAP_OP_MODIFY, SLAP_OP_MODRDN, SLAP_OP_ADD, SLAP_OP_DELETE, SLAP_OP_ABANDON, SLAP_OP_EXTENDED, SLAP_OP_LAST } slap_op_t;
So SLAP_OP_EXTENDED has value 9. But back-ldap/config.c has:
/* see enum in slap.h */ static slap_cf_aux_table timeout_table[] = { { BER_BVC("bind="), SLAP_OP_BIND * sizeof( time_t ), 'u', 0, NULL }, /* unbind makes no sense */ { BER_BVC("add="), SLAP_OP_ADD * sizeof( time_t ), 'u', 0, NULL }, { BER_BVC("delete="), SLAP_OP_DELETE * sizeof( time_t ), 'u', 0, NULL }, { BER_BVC("modrdn="), SLAP_OP_MODRDN * sizeof( time_t ), 'u', 0, NULL }, { BER_BVC("modify="), SLAP_OP_MODIFY * sizeof( time_t ), 'u', 0, NULL }, { BER_BVC("compare="), SLAP_OP_COMPARE * sizeof( time_t ), 'u', 0, NULL }, { BER_BVC("search="), SLAP_OP_SEARCH * sizeof( time_t ), 'u', 0, NULL }, /* abandon makes little sense */ #if 0 /* not implemented yet */ { BER_BVC("extended="), SLAP_OP_EXTENDED * sizeof( time_t ), 'u', 0, NULL }, #endif { BER_BVNULL, 0, 0, 0, NULL } };
Hmmm, SLAP_OP_EXTENDED is not defined, so does li->li_timeout[ SLAP_OP_EXTENDED ] end up pointing outside of this timeout_table[]? I am not sure.
In any case, back-ldap/bind.c has:
static int ldap_back_start_tls( LDAP *ld, int protocol, int *is_tls, const char *url, unsigned flags, int timeout, const char **text ) {
...
The async code is compiled in because of a #define in back-ldap.h:
#ifdef SLAP_STARTTLS_ASYNCHRONOUS
...
if ( timeout ) { tv.tv_sec = timeout; tv.tv_usec = 0; } else { LDAP_BACK_TV_SET( &tv ); }
Now I assume (timeout == 0) and we go to the else-branch.
LDAP_BACK_TV_SET macro is:
#define LDAP_BACK_RESULT_TIMEOUT (0) #define LDAP_BACK_RESULT_UTIMEOUT (100000) #define LDAP_BACK_TV_SET(tv) \ do { \ (tv)->tv_sec = LDAP_BACK_RESULT_TIMEOUT; \ (tv)->tv_usec = LDAP_BACK_RESULT_UTIMEOUT; \ } while ( 0 )
My current guess is that the 0.1 second default timeout is set, and for some reason, the slapd backend TLS initialization sometimes takes longer than that, causing the TLS failure.
Do you think this is a possible scenario?
If so, as a temporary fix, would it be safe to hardcode, say 5 seconds for the timeout just for the TLS timeout else-branch?
Best regards, Unto Sten