https://bugs.openldap.org/show_bug.cgi?id=10023
Issue ID: 10023 Summary: Asynchronous connects are broken Product: OpenLDAP Version: 2.5.14 Hardware: All OS: All Status: UNCONFIRMED Keywords: needs_review Severity: normal Priority: --- Component: libraries Assignee: bugs@openldap.org Reporter: ipuleston@sonicwall.com Target Milestone: ---
We have a port of OpenLDAP client running in an embedded system, which is using asynchronous connects to the LDAP server. We have been using OpenLDAP 2.4.40 for a long time, and I just upgraded it to use 2.5.14 (as the current LTS release). After doing this, async connects to the LDAP server no longer work. You can see this in the following debug output:
A successful async connect with 2.4.40: ldap_send_initial_request ldap_new_connection 1 1 0 ldap_int_open_connection ldap_connect_to_host: TCP Ian-DC1.sd80.com:389 ldap_pvt_gethostbyname_a: host=Ian-DC1.sd80.com, r=0 ldap_new_socket: 251 ldap_prepare_socket: 251 ldap_connect_to_host: Trying 192.168.168.3:389 ldap_pvt_connect: fd: 251 tm: 10 async: -1 ldap_ndelay_on: 251 attempting to connect: connect errno: 115 ldap_int_poll: fd: -1 tm: 0
A failed async connect with 2.5.14: ldap_send_initial_request ldap_new_connection 1 1 0 ldap_int_open_connection ldap_connect_to_host: TCP Ian-DC1.sd80.com:389 ldap_pvt_gethostbyname_a: host=Ian-DC1.sd80.com, r=0 ldap_new_socket: 247 ldap_prepare_socket: 247 ldap_connect_to_host: Trying 10.21.61.3:389 ldap_pvt_connect: fd: 247 tm: 10 async: -1 ldap_ndelay_on: 247 attempting to connect: connect errno: 115 ldap_open_defconn: successful ldap_send_server_request Sending Bind Request, len=0x6ca10c1f ldap_write: want=63 error=Resource temporarily unavailable
Note that in both cases the connect attempt returns errno 115, EINPROGRESS, meaning that it has not completed. But after that:
● 2.4.40 calls ldap_int_poll (via ldap_send_initial_request -> ldap_int_check_async_open) to begin the wait for async completion.
● 2.5.14 instead reports a successful connect, and tries to send the bind which fails since thre socket is not yet connected.
I tracked the problem down to a change made for ITS #8022 "an async connect may still succeed immediately" in this commit: https://git.openldap.org/openldap/openldap/-/commit/ae6347bac12bbf843678a838...
That change in ldap_new_connection makes it set lconn_status for an async connect to LDAP_CONNST_CONNECTED rather than LDAP_CONNST_CONNECTING if ldap_int_open_connection returns 0. The problem is that ldap_int_open_connection returns 0 after getting the EINPROGRESS. ldap_connect_to_host returns -2 for the latter, but ldap_int_open_connection doesn't check for that, returning 0 for any return code other than -1.
I think that the bug is actually in ldap_int_open_connection rather than in the above commit. It should probably return -2 when ldap_connect_to_host returns that.
https://bugs.openldap.org/show_bug.cgi?id=10023
--- Comment #1 from ipuleston@sonicwall.com ipuleston@sonicwall.com --- Additionally I spotted an 2nd related issue which came in with this subsequent commit for ITS #8957:
https://git.openldap.org/openldap/openldap/-/commit/09ff530036a04a01ad4250ee...
Note that ticket reports the same "connect errno: 115" followed by "ldap_open_defconn: successful" that I show above, and it probably also a result of the same commit that I referenced above.
What this commit does is to make ldap_int_open_connection call ldap_int_tls_start if LDAP_OPT_X_TLS_HARD is set and ldap_connect_to_host returned -2 (EWOULDBLOCK/EINPROGRESS). There are two problems with doing that:
1. It can't start TLS before the connect has completed. If ldap_int_tls_start does not wait for that then it will fail.
2. If ldap_int_tls_start does wait for the connect to complete then it will have to wait synchronously for that, which will break the asynchronicity of the connect.
To make it properly asynchronous, what should be happening on a -2 return code is that control is returned to the caller of ldap_sasl_bind() etc. with return code LDAP_X_CONNECTING. That caller should then use poll or select to wait for the connect to complete, and then in the case of TLS it would need to make the call to start that (via ldap_int_tls_start).
https://bugs.openldap.org/show_bug.cgi?id=10023
ipuleston@sonicwall.com ipuleston@sonicwall.com changed:
What |Removed |Added ---------------------------------------------------------------------------- CC| |vsmith@interlinknetworks.co | |m
--- Comment #2 from ipuleston@sonicwall.com ipuleston@sonicwall.com --- There are two ways to fix this, and I've tested that both work:
1. Add the following in ldap_int_open_connection before the LDAP_OPT_X_TLS_HARD check that was updated for ITS #8957:
if ( async && rc == -2) { /* Need to let the connect complete asynchronously before we continue */ return -2; }
2. Change the return( 0 ) statement at the end of ldap_int_open_connection (after the LDAP_OPT_X_TLS_HARD check) to this (or maybe just return rc):
return ( (rc == -2) ? -2 : 0 );
I prefer #1 because it makes the connect properly function asynchronously, but it will affect the fix for ITS #8957 and the reporter of that may have to do some extra coding
https://bugs.openldap.org/show_bug.cgi?id=10023
--- Comment #3 from ipuleston@sonicwall.com ipuleston@sonicwall.com --- Sorry, saved the above comment before I had completed it. To continue...
I prefer option #1 because it makes the connect properly function asynchronously when using TLS, but it will affect the fix for ITS #8957 and the reporter of that may have to do some extra coding (I have added them as a cc).
Option #2 fixes my reported issue without affecting that fix, but leaving it doing a synchronous connect when LDAP_OPT_X_TLS_HARD is set along with LDAP_OPT_CONNECT_ASYNC.
I don't actually use LDAP_OPT_X_TLS_HARD in my code (I start TLS from outside of the OpenLDAP code after waiting for the connect to complete) so either one will work for me.
https://bugs.openldap.org/show_bug.cgi?id=10023
--- Comment #4 from Howard Chu hyc@openldap.org --- (In reply to ipuleston@sonicwall.com from comment #1)
To make it properly asynchronous, what should be happening on a -2 return code is that control is returned to the caller of ldap_sasl_bind() etc. with return code LDAP_X_CONNECTING. That caller should then use poll or select to wait for the connect to complete, and then in the case of TLS it would need to make the call to start that (via ldap_int_tls_start).
Note that ldap_int_* functions are for internal use only, not for users to invoke.
https://bugs.openldap.org/show_bug.cgi?id=10023
--- Comment #5 from Howard Chu hyc@openldap.org --- (In reply to ipuleston@sonicwall.com from comment #2)
There are two ways to fix this, and I've tested that both work:
- Add the following in ldap_int_open_connection before the
LDAP_OPT_X_TLS_HARD check that was updated for ITS #8957:
if ( async && rc == -2) { /* Need to let the connect complete asynchronously before we continue */ return -2; }
Please supply an actual diff with your proposed fix, thanks.
https://bugs.openldap.org/show_bug.cgi?id=10023
--- Comment #6 from ipuleston@sonicwall.com ipuleston@sonicwall.com --- (In reply to Howard Chu from comment #4)
(In reply to ipuleston@sonicwall.com from comment #1)
Note that ldap_int_* functions are for internal use only, not for users to invoke.
Yes, as things stand OpenLDAP doesn't really support asynchronous connects with TLS. Without TLS, LDAP_OPT_CONNECT_ASYNC does give asynchronous connects, but when TLS is used the "connect and start TLS" operation that it currently provides becomes synchronous.
To address this I had to patch my port to add a ldap_tls_start_async() API for starting TLS after completing an async connect. It also provides for doing the TLS handshake step-by-step asynchronously.
I would be happy to provide this as a patch, should you want to pick it up.
https://bugs.openldap.org/show_bug.cgi?id=10023
--- Comment #7 from ipuleston@sonicwall.com ipuleston@sonicwall.com --- Created attachment 952 --> https://bugs.openldap.org/attachment.cgi?id=952&action=edit Patch for option #1, makes connect fully async, but will affect the fix for ITS #8957
https://bugs.openldap.org/show_bug.cgi?id=10023
--- Comment #8 from ipuleston@sonicwall.com ipuleston@sonicwall.com --- Created attachment 953 --> https://bugs.openldap.org/attachment.cgi?id=953&action=edit Patch for option #2, doesn't affect the fix for ITS #8957, but leaves connect with TLS being synchronous
https://bugs.openldap.org/show_bug.cgi?id=10023
--- Comment #9 from ipuleston@sonicwall.com ipuleston@sonicwall.com --- (In reply to Howard Chu from comment #5)
Please supply an actual diff with your proposed fix, thanks.
I've attached patches for both fix options.
https://bugs.openldap.org/show_bug.cgi?id=10023
--- Comment #10 from ipuleston@sonicwall.com ipuleston@sonicwall.com --- Note: see also ITS # 9244 which also reports similar issues stemming from the same problem with commit ae6347bac and ldap_int_open_connection not returning -2 (thanks to Randy for making me aware of that).
https://bugs.openldap.org/show_bug.cgi?id=10023
ipuleston@sonicwall.com ipuleston@sonicwall.com changed:
What |Removed |Added ---------------------------------------------------------------------------- See Also| |https://bugs.openldap.org/s | |how_bug.cgi?id=9244
https://bugs.openldap.org/show_bug.cgi?id=10023
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Target Milestone|--- |2.5.15 Keywords|needs_review | Assignee|bugs@openldap.org |hyc@openldap.org
https://bugs.openldap.org/show_bug.cgi?id=10023
Howard Chu hyc@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Blocks| |9244
Referenced Issues:
https://bugs.openldap.org/show_bug.cgi?id=9244 [Issue 9244] async conns marked as connected too early
https://bugs.openldap.org/show_bug.cgi?id=10023
--- Comment #11 from Howard Chu hyc@openldap.org --- (In reply to ipuleston@sonicwall.com from comment #6)
(In reply to Howard Chu from comment #4)
(In reply to ipuleston@sonicwall.com from comment #1)
Note that ldap_int_* functions are for internal use only, not for users to invoke.
Yes, as things stand OpenLDAP doesn't really support asynchronous connects with TLS. Without TLS, LDAP_OPT_CONNECT_ASYNC does give asynchronous connects, but when TLS is used the "connect and start TLS" operation that it currently provides becomes synchronous.
To address this I had to patch my port to add a ldap_tls_start_async() API for starting TLS after completing an async connect. It also provides for doing the TLS handshake step-by-step asynchronously.
I would be happy to provide this as a patch, should you want to pick it up.
Please do. I considered moving the TLS setup from ldap_int_open_conn() to ldap_new_connection() but not sure that really makes things any easier.
https://bugs.openldap.org/show_bug.cgi?id=10023
--- Comment #12 from Howard Chu hyc@openldap.org --- To be clear: I'm going with option #1 for the fix.
Historically, OpenLDAP has always done TLS initialization synchronously. I think changing that is out of scope.
https://bugs.openldap.org/show_bug.cgi?id=10023
Howard Chu hyc@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Ever confirmed|0 |1 Status|UNCONFIRMED |IN_PROGRESS
--- Comment #13 from Howard Chu hyc@openldap.org --- https://git.openldap.org/openldap/openldap/-/merge_requests/623
https://bugs.openldap.org/show_bug.cgi?id=10023
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Resolution|--- |FIXED Status|IN_PROGRESS |RESOLVED
--- Comment #14 from Quanah Gibson-Mount quanah@openldap.org --- head:
• 12d2382b by Ian Puleston at 2023-05-25T16:56:00+00:00 ITS#10023 libldap: fix asynch connects
RE26:
• dcbfb330 by Ian Puleston at 2023-05-25T19:02:36+00:00 ITS#10023 libldap: fix asynch connects
RE25:
• cfb43597 by Ian Puleston at 2023-05-25T19:05:03+00:00 ITS#10023 libldap: fix asynch connects
https://bugs.openldap.org/show_bug.cgi?id=10023
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |VERIFIED