Re: (ITS#8650) EAGAIN from gnutls_handshake not respected - openldap-bugs

18 Sep 2018


Made some good progress on this one this evening.
The original issue this ITS is about is that gnutls_handshake() can, in 
some versions of GnuTLS, return GNUTLS_E_AGAIN even when the socket is 
blocking. Specifically, this happens in the case I described with a 
large CA list sent by the server.
For slapd, the patch I committed is unfortunately completely wrong. 
slapd has always used non-blocking sockets, so EAGAIN is expected and 
handled robustly -- or it was, until I introduced the busy-loop.
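
To illustrate the difference for the non-blocking case, here is a minimal sketch (not slapd's actual code) of a single handshake step that reports "try again later" to its caller instead of spinning. The names handshake_step, handshake_fn, HS_E_AGAIN and the toy_session helper are all hypothetical; HS_E_AGAIN is a placeholder value standing in for GNUTLS_E_AGAIN.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder for GNUTLS_E_AGAIN; the value is an assumption for this sketch. */
#define HS_E_AGAIN (-1000)

enum hs_status { HS_FAILED = -1, HS_DONE = 0, HS_WANT_IO = 1 };

/* Hypothetical stand-in for the TLS handshake call:
 * 0 on success, HS_E_AGAIN if it would block, other negatives are fatal. */
typedef int (*handshake_fn)(void *session);

/* One handshake attempt on a non-blocking socket. On "again" it returns
 * to the caller, which is expected to wait for socket readiness in its
 * event loop before calling this function again. It never loops itself. */
static enum hs_status handshake_step(handshake_fn hs, void *session)
{
    int ret;

    assert(hs != NULL);

    ret = hs(session);
    if (ret == 0)
        return HS_DONE;
    if (ret == HS_E_AGAIN)
        return HS_WANT_IO;
    return HS_FAILED;
}

/* Toy session for illustration: would block `pending` times, then succeeds. */
struct toy_session { int pending; int calls; };

static int toy_handshake(void *session)
{
    struct toy_session *s = session;

    s->calls++;
    if (s->pending > 0) {
        s->pending--;
        return HS_E_AGAIN;
    }
    return 0;
}
```

The point of the sketch is only that the "again" result goes back to the event loop, so no CPU is burned while the peer's data is still in flight.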
For clients, I'm still working on figuring out the right path forward. 
There is some EAGAIN handling conditional on LDAP_USE_NON_BLOCKING_TLS, 
which itself is behind LDAP_DEVEL. However, this code is meant for 
non-blocking sockets, and in my case it ends up stuck in poll() waiting 
for a notification that never arrives. In 2.4, a return value of 1 from 
ldap_int_tls_connect (ret == 1) simply falls into the success case, and 
the client proceeds to send data without completing the handshake first.
It's possible that what I actually want here is a (ret > 0) case in 
ldap_int_tls_start for when LDAP_USE_NON_BLOCKING_TLS is absent and 
ldap_int_tls_connect returns 1. (I'd also need to adapt the non-blocking 
path to be able to handle a blocking socket as well.)
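
A rough sketch of what such a (ret > 0) case might look like on a blocking socket follows. It is not the libldap code: tls_start_blocking, tls_connect_fn, the max_attempts cap and the fake_handshake helper are hypothetical names for illustration. The only thing taken from the above is the return convention of the connect step (0 when done, 1 when the handshake is incomplete, negative on error).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the connect step: returns 0 when the handshake
 * is complete, 1 when it has to be called again, negative on a fatal error. */
typedef int (*tls_connect_fn)(void *ctx);

/* Drive the handshake to completion on a blocking socket.
 * A positive return from the connect step is treated as "not finished yet"
 * rather than as success. max_attempts is an assumed safety cap so that a
 * misbehaving TLS library cannot make the caller spin forever. */
static int tls_start_blocking(tls_connect_fn connect_step, void *ctx,
                              int max_attempts)
{
    int ret = 1;
    int attempts = 0;

    assert(connect_step != NULL);

    while (ret > 0 && attempts < max_attempts) {
        ret = connect_step(ctx);
        attempts++;
    }

    /* Still incomplete after the cap: report failure instead of
     * letting a positive value fall through as success. */
    return ret > 0 ? -1 : ret;
}

/* Toy connect step for illustration: incomplete `remaining` times, then done. */
struct fake_handshake { int remaining; int calls; };

static int fake_connect(void *ctx)
{
    struct fake_handshake *h = ctx;

    h->calls++;
    if (h->remaining > 0) {
        h->remaining--;
        return 1;
    }
    return 0;
}
```

With a blocking socket each retry blocks inside the TLS library again, so this is not the same as the busy-loop on a non-blocking socket; the cap is only a guard against the pathological case.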
But it's also possible that gnutls_handshake() returning GNUTLS_E_AGAIN 
with a blocking socket is simply a GnuTLS bug that was introduced at 
some point. I still need to determine exactly when and why its behaviour 
changed. (It is still happening with 3.5.19.)
In any case, my patch has to be reverted, as its impact (making slapd 
busy-loop) is obviously worse than the status quo (misbehaving clients 
in a specific case). I have pushed that revert now and will continue 
digging as time permits.