https://bugs.openldap.org/show_bug.cgi?id=9244
Bug ID: 9244 Summary: API calls blocking after async connect Product: OpenLDAP Version: 2.4.49 Hardware: All OS: All Status: UNCONFIRMED Severity: normal Priority: --- Component: libraries Assignee: bugs@openldap.org Reporter: ryan@openldap.org Target Milestone: ---
Created attachment 721 --> https://bugs.openldap.org/attachment.cgi?id=721&action=edit async connect test without TLS
My understanding of LDAP_OPT_CONNECT_ASYNC is that the attached program should not block. If the connection does not establish fast enough, the bind call is supposed to return LDAP_X_CONNECTING.
(At least that's how I understand it, based on the original behaviour (circa 2.4.23 up to 2.4.40) as well as the bind loop in back-meta. On the other hand, the man page does "Subsequent calls to library routines will poll for completion of the connect before performing further operations" which might be interpreted as meaning they would block...)
In current releases it does block, as demonstrated by strace on Linux (latency added using 'tc qdisc'):
[...] connect(3, {sa_family=AF_INET, sin_port=htons(389), sin_addr=inet_addr("192.168.1.204")}, 16) = -1 EINPROGRESS (Operation now in progress) write(3, "0\f\2\1\1`\7\2\1\3\4\0\200\0", 14) = -1 EAGAIN (Resource temporarily unavailable) poll([{fd=3, events=POLLOUT|POLLERR|POLLHUP}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}]) write(3, "0\f\2\1\1`\7\2\1\3\4\0\200\0", 14) = 14 poll([{fd=3, events=POLLIN|POLLPRI}], 1, -1) = 1 ([{fd=3, revents=POLLIN}]) read(3, "0\f\2\1\1a\7\n", 8) = 8 read(3, "\1\0\4\0\4\0", 6) = 6 write(2, "OK: ldap_simple_bind_returned 0 "..., 42OK: ldap_simple_bind_returned 0 (Success) ) = 42 [...]
As discussed in IRC, I believe I bisected this down to commit ae6347bac, from bug 8022. The reasoning is sound, but ldap_int_open_connection does not actually return -2, only -1 or 0.
The patch is simple enough, but I'm also looking at some later commits that were probably done to work around this, and might not be needed now (bug 8957, bug 8968, bug 8980). Also need to test all setups thoroughly (ldap, ldaps, STARTTLS, not to mention back-meta/asyncmeta).
I also notice that LDAP_OPT_CONNECT_ASYNC is not effective unless LDAP_OPT_NETWORK_TIMEOUT is also set. It might be intentional, but the man page doesn't mention this specifically, and I don't see why it would be necessary...
https://bugs.openldap.org/show_bug.cgi?id=9244
Ryan Tandy ryan@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Assignee|bugs@openldap.org |ryan@openldap.org Ever confirmed|0 |1 Status|UNCONFIRMED |IN_PROGRESS
https://bugs.openldap.org/show_bug.cgi?id=9244
--- Comment #1 from Howard Chu hyc@openldap.org --- (In reply to Ryan Tandy from comment #0)
Created attachment 721 [details] async connect test without TLS
My understanding of LDAP_OPT_CONNECT_ASYNC is that the attached program should not block. If the connection does not establish fast enough, the bind call is supposed to return LDAP_X_CONNECTING.
The Bind should not block for longer than the connect timeout.
I also notice that LDAP_OPT_CONNECT_ASYNC is not effective unless LDAP_OPT_NETWORK_TIMEOUT is also set. It might be intentional, but the man page doesn't mention this specifically, and I don't see why it would be necessary...
That is the design. If you don't specify a connect timeout, then it doesn't know how long to wait for in that polling loop.
https://bugs.openldap.org/show_bug.cgi?id=9244
--- Comment #2 from Ryan Tandy ryan@openldap.org --- (In reply to Howard Chu from comment #1)
The Bind should not block for longer than the connect timeout.
Just to confirm: I'm talking about an initial request with implicit connect. Not an explicit ldap_connect() or ldap_open_defconn() followed by a real request. Are you saying the implicit connect inside ldap_simple_bind_s() should block for up to the network timeout?
I don't think it has ever worked that way. The calls to ldap_int_poll() in ldap_send_initial_request() (inside ldap_int_check_async_open()) and ldap_send_server_request() both use a timeout of 0.
In ldap_send_server_request(), if the network timeout is exceeded, then it falls through and returns LDAP_SERVER_DOWN. If it's meant to block during the connect timeout, when should I actually expect to see LDAP_X_CONNECTING?
In any case, right now it never reaches that code at all, so the network timeout isn't even respected with async. (With a timeout, but not async, it works fine: blocks for the network timeout and then returns LDAP_SERVER_DOWN.)
https://bugs.openldap.org/show_bug.cgi?id=9244
Ryan Tandy ryan@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Summary|API calls blocking after |async conns marked as |async connect |connecting too early
https://bugs.openldap.org/show_bug.cgi?id=9244
Ryan Tandy ryan@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Summary|async conns marked as |async conns marked as |connecting too early |connected too early
https://bugs.openldap.org/show_bug.cgi?id=9244
--- Comment #3 from Ryan Tandy ryan@openldap.org --- A few notes from investigation so far.
50b33cc6 is wrong and masked some issues. ENOTCONN is telling us that we're trying to send on the socket before it finished connecting - meaning we actually have a bug somewhere earlier. Solaris 10 is actually a better platform for testing this, because Linux just returns EAGAIN in this case. (Also noticed by others, e.g. https://lists.freebsd.org/pipermail/freebsd-bugs/2008-November/033159.html.)
Other patches by the same author are similar workarounds to issues that shouldn't happen - need to identify and fix the root cause. The changes to avoid sockets being set to blocking should all be unnecessary, because none of them should be reached until after the connection is established. (For example, I think ldap_int_start_tls shouldn't need to consider the async setting at all, only the timeout.)
ldap_int_check_async_open doesn't check the network timeout. The code that _does_ check it can't be reached before the connection is established, so as far as I can tell, the connect timeout is currently just ignored when async.
ldap_send_initial_request calls ldap_int_check_async_open(ld_defconn). But when chasing a referral, that's a different connection! Again, Linux masks this bug by returning EAGAIN, but on Solaris 10 I can see that the referral connection sends its Bind too early and gets ENOTCONN. That one should be an easy fix.
https://bugs.openldap.org/show_bug.cgi?id=9244
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Target Milestone|--- |2.5.1 Keywords| |OL_2_5_REQ
https://bugs.openldap.org/show_bug.cgi?id=9244
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Target Milestone|2.5.1 |2.6.0 Keywords|OL_2_5_REQ |reviewed
https://bugs.openldap.org/show_bug.cgi?id=9244
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Assignee|ryan@openldap.org |bugs@openldap.org
https://bugs.openldap.org/show_bug.cgi?id=9244
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|IN_PROGRESS |CONFIRMED Target Milestone|2.6.0 |2.7.0 Keywords|reviewed |
https://bugs.openldap.org/show_bug.cgi?id=9244
Howard Chu hyc@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Depends on| |10023
Referenced Issues:
https://bugs.openldap.org/show_bug.cgi?id=10023 [Issue 10023] Asynchronous connects are broken
https://bugs.openldap.org/show_bug.cgi?id=9244 Issue 9244 depends on issue 10023, which changed state.
Issue 10023 Summary: Asynchronous connects are broken https://bugs.openldap.org/show_bug.cgi?id=10023
What |Removed |Added ---------------------------------------------------------------------------- Status|IN_PROGRESS |RESOLVED Resolution|--- |FIXED
https://bugs.openldap.org/show_bug.cgi?id=9244
Ondřej Kuzník ondra@mistotebe.net changed:
What |Removed |Added ---------------------------------------------------------------------------- See Also| |https://bugs.openldap.org/s | |how_bug.cgi?id=8957, | |https://bugs.openldap.org/s | |how_bug.cgi?id=8968, | |https://bugs.openldap.org/s | |how_bug.cgi?id=8980