Made some good progress on this one this evening.
The original issue in this ITS is that, in some versions of GnuTLS, gnutls_handshake() can return GNUTLS_E_AGAIN even when the socket is blocking. Specifically, this happens in the case I described, where the server sends a large CA list.
For slapd, the patch I committed is unfortunately completely wrong. slapd has always used non-blocking sockets, so EAGAIN is expected there and was handled robustly, at least until I introduced the busy-loop.
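To illustrate why the same retry behaves so differently in slapd and in a blocking client, here is a minimal POSIX sketch. It assumes the TLS layer reads from the descriptor with ordinary recv()/read() calls: on a blocking descriptor the next attempt sleeps until data arrives, while on a non-blocking one it returns immediately and a tight retry loop just spins. fd_is_nonblocking(), set_nonblocking() and may_retry_inline() are hypothetical helpers for illustration, not OpenLDAP code; only fcntl(), F_GETFL, F_SETFL and O_NONBLOCK are real POSIX interfaces.

```c
#include <fcntl.h>
#include <stdbool.h>

/* Hypothetical helper: report whether fd has O_NONBLOCK set.
 * Returns 1 if non-blocking, 0 if blocking, -1 if fcntl() fails. */
static int fd_is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return (flags & O_NONBLOCK) != 0;
}

/* Hypothetical helper: turn O_NONBLOCK on or off, keeping other flags.
 * Returns 0 on success, -1 on failure. */
static int set_nonblocking(int fd, bool on)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    flags = on ? (flags | O_NONBLOCK) : (flags & ~O_NONBLOCK);
    return fcntl(fd, F_SETFL, flags) == -1 ? -1 : 0;
}

/* Retrying a handshake step inline after an "again" result is only
 * reasonable on a known-blocking descriptor, where the next read sleeps.
 * On a non-blocking descriptor (as in slapd) the caller should go back to
 * its event loop instead of looping here. */
static bool may_retry_inline(int fd)
{
    return fd_is_nonblocking(fd) == 0;
}
```

The check fails closed: if fcntl() errors out, may_retry_inline() returns false rather than risk spinning.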
For clients I'm still working out the right path forward. There is some EAGAIN handling conditional on LDAP_USE_NON_BLOCKING_TLS, which is itself only enabled under LDAP_DEVEL. However, this code is meant for non-blocking sockets, and in my case it ends up stuck in poll() waiting for a notification that never arrives. In 2.4, ret == 1 simply falls into the success case and proceeds to send data without completing the handshake first.
It's possible that what I actually want here is a (ret > 0) case in ldap_int_tls_start for when LDAP_USE_NON_BLOCKING_TLS is absent and ldap_int_tls_connect returns 1. (I'd also need to adapt the non-blocking path to be able to handle a blocking socket as well.)
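A rough sketch of what such a (ret > 0) case might look like, under stated assumptions: ldap_int_tls_connect() is modelled as returning 0 when the handshake is done, 1 when it needs another round, and a negative value on error (only the "returns 1" part comes from the text above). tls_start_sketch(), the scripted stub, the START_* codes and the round limit are all hypothetical, not OpenLDAP code. The point it shows is that ret > 0 must never fall through to the success path: a blocking socket can simply call again, while a non-blocking one has to hand control back to the caller to poll().

```c
#include <assert.h>

/* Hypothetical convention: 0 = handshake done, 1 = needs another round,
 * < 0 = error. */
typedef int (*tls_connect_fn)(void *ctx);

enum start_result { START_OK = 0, START_ERROR = -1, START_WOULD_POLL = 2 };

/* Hypothetical handling of ret > 0 when LDAP_USE_NON_BLOCKING_TLS is absent.
 * max_rounds is an arbitrary safety net against looping forever. */
static enum start_result tls_start_sketch(tls_connect_fn connect_step, void *ctx,
                                          int socket_is_blocking, int max_rounds)
{
    for (int round = 0; round < max_rounds; round++) {
        int ret = connect_step(ctx);
        if (ret == 0)
            return START_OK;         /* handshake complete: safe to send data */
        if (ret < 0)
            return START_ERROR;
        /* ret > 0: handshake not finished yet */
        if (!socket_is_blocking)
            return START_WOULD_POLL; /* caller must poll() before retrying */
        /* blocking socket: the next call will block in the transport */
    }
    return START_ERROR;              /* gave up; never treat this as success */
}

/* Hypothetical test stub: replays a fixed sequence of return values. */
struct scripted {
    const int *rets;
    int n;
    int calls;
};

static int scripted_connect(void *ctx)
{
    struct scripted *s = ctx;
    assert(s->calls < s->n);         /* stub must not run past its script */
    return s->rets[s->calls++];
}
```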
But it's also possible that gnutls_handshake() returning GNUTLS_E_AGAIN with a blocking socket is simply a GnuTLS bug that was introduced at some point. I still need to determine exactly when and why its behaviour changed. (It is still happening with 3.5.19.)
In any case, my patch has to be reverted, as its impact (making slapd busy-loop) is obviously worse than the status quo (misbehaving clients in a specific case). I have pushed that revert now and will continue digging as time permits.