https://bugs.openldap.org/show_bug.cgi?id=9338
Issue ID: 9338
Summary: slapd write waiter doesn't resume pending ops
Product: OpenLDAP
Version: 2.5
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: normal
Priority: ---
Component: slapd
Assignee: bugs@openldap.org
Reporter: hyc@openldap.org
Target Milestone: ---
If a socket output buffer fills up (e.g. because the client is not reading responses fast enough), slapd will queue up any newly received operations on that connection and defer their execution until later. In the new write waiter code in master/2.5, after the socket becomes writable again, the pending ops are not rescheduled for execution because of a missing call to connection_write(). As a result, a client waiting for these ops on that connection to finish will hang forever.
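The failure mode described above can be illustrated with a toy model. This is only a sketch in Python, not slapd's code: the class, its fields, and the `resume_on_writable` flag are hypothetical names invented for illustration, and the flag simply stands in for whether the rescheduling step happens once the socket is writable again.

```python
from collections import deque


class BackloggedConnection:
    """Toy model of one client connection (hypothetical, not slapd internals)."""

    def __init__(self, resume_on_writable):
        self.write_blocked = False   # True while the socket output buffer is full
        self.pending = deque()       # ops received while writes were blocked
        self.executed = []           # ops that actually ran
        self.resume_on_writable = resume_on_writable

    def receive(self, op):
        # While the socket is backlogged, defer new ops instead of running them.
        if self.write_blocked:
            self.pending.append(op)
        else:
            self.executed.append(op)

    def socket_writable(self):
        # The write waiter wakes up because the client drained its side.
        self.write_blocked = False
        if self.resume_on_writable:
            # The step the report says was missing: reschedule the deferred ops.
            while self.pending:
                self.executed.append(self.pending.popleft())


def run(resume_on_writable):
    conn = BackloggedConnection(resume_on_writable)
    conn.receive("search-1")
    conn.write_blocked = True        # client stops reading responses
    conn.receive("abandon-1")
    conn.receive("search-2")
    conn.socket_writable()           # client catches up again
    return conn.executed, list(conn.pending)
```

With `run(False)` the two deferred ops stay in the pending queue after the socket clears, which corresponds to the client hanging; with `run(True)` they are drained in arrival order.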
This bug impacts the syncrepl consumer in delta-sync mode if it loses sync and has to fall back to Refresh mode while its connection is backlogged on the provider side. In the fallback case the consumer sends an Abandon for the current search and issues a new Refresh search, but if the socket was blocked on the provider side the new search won't execute.
A fix for the write waiter is ready. The consumer will also be patched to simply close the connection and open a new one when it falls back, to avoid running into this problem.
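The difference between the two consumer fallback strategies can be sketched roughly as follows. Everything here is hypothetical (the `FakeConnection` and `ToyConsumer` classes and the request strings are placeholders, not syncrepl code); the sketch only shows that the patched fallback sends its Refresh search on a fresh connection rather than behind an Abandon on the possibly backlogged one.

```python
class FakeConnection:
    """Stand-in for an LDAP connection (hypothetical, for illustration only)."""

    def __init__(self):
        self.closed = False
        self.sent = []               # requests written to this connection

    def send(self, request):
        if self.closed:
            raise RuntimeError("connection is closed")
        self.sent.append(request)

    def close(self):
        self.closed = True


class ToyConsumer:
    """Toy consumer that starts out with a delta-sync search in progress."""

    def __init__(self):
        self.conn = FakeConnection()
        self.conn.send("search(delta-sync)")

    def fallback_reuse(self):
        # Behaviour described in the report: Abandon the current search, then
        # issue the Refresh search on the same (possibly backlogged) connection.
        self.conn.send("abandon")
        self.conn.send("search(refresh)")
        return self.conn

    def fallback_reconnect(self):
        # Patched behaviour as described: drop the old connection and start
        # the Refresh search on a new one.
        self.conn.close()
        self.conn = FakeConnection()
        self.conn.send("search(refresh)")
        return self.conn
```

In the reconnect variant the Refresh search is the first request on its connection, so it cannot be stuck behind deferred operations from the old one.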
Quanah Gibson-Mount quanah@openldap.org changed:
What            |Removed     |Added
----------------------------------------------------------------------------
Resolution      |---         |TEST
Status          |UNCONFIRMED |RESOLVED
Target Milestone|---         |2.5.0
--- Comment #1 from Quanah Gibson-Mount quanah@openldap.org ---
Commits:
• 0b20b92e by Howard Chu at 2020-09-04T18:22:32+01:00 ITS#9338 syncrepl: Don't reuse existing connection on Refresh fallback
• 95c5a169 by Howard Chu at 2020-09-04T18:22:40+01:00 ITS#9338 Make sure connection gets rescheduled after write blockage clears up
• 5e8a78fa by Howard Chu at 2020-09-04T20:23:44+01:00 ITS#9338 Add backlog control for testing slapd write waits
• c714acb7 by Howard Chu at 2020-09-04T20:48:45+01:00 ITS#9338 add regression test
Quanah Gibson-Mount quanah@openldap.org changed:
What            |Removed     |Added
----------------------------------------------------------------------------
Target Milestone|2.5.0       |2.4.53
Resolution      |TEST        |FIXED
--- Comment #2 from Quanah Gibson-Mount quanah@openldap.org ---
Relevant parts for RE24 applied:
Commits:
• 30778bda by Howard Chu at 2020-09-04T20:52:26+00:00 ITS#9338 syncrepl: Don't reuse existing connection on Refresh fallback
• ceb632b0 by Howard Chu at 2020-09-04T20:52:34+00:00 ITS#9338 Add backlog control for testing slapd write waits
• f6637f27 by Howard Chu at 2020-09-04T20:52:38+00:00 ITS#9338 add regression test
--- Comment #3 from Quanah Gibson-Mount quanah@openldap.org ---
Note on RE24: It does not have the write waiter code, so that portion was not relevant. However, it still needed the fix to use a new connection on Refresh fallback.
Quanah Gibson-Mount quanah@openldap.org changed:
What            |Removed     |Added
----------------------------------------------------------------------------
Status          |RESOLVED    |VERIFIED
--- Comment #4 from Quanah Gibson-Mount quanah@openldap.org ---
trunk only (does not apply to RE24):
Commits:
• 9a3e63ba by Howard Chu at 2020-09-13T08:05:31+00:00 ITS#9338 alternate fix
Don't resume pending ops unless there are no other threads waiting to write
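The condition in that commit message can be read as a simple counter check. The sketch below is an interpretation of the one-line description only, not the committed code: the class, the `writers_waiting` counter, and the method names are hypothetical.

```python
from collections import deque


class WriteWaitModel:
    """Toy model: resume deferred ops only when no writer is still waiting."""

    def __init__(self):
        self.writers_waiting = 0     # threads currently blocked on this socket
        self.pending = deque()       # ops deferred while writes were blocked
        self.executed = []           # ops that have been resumed

    def writer_blocked(self):
        # A thread finds the socket full and starts waiting to write.
        self.writers_waiting += 1

    def writer_done(self):
        # A waiting thread finished its write. Resume deferred ops only if
        # it was the last waiter; otherwise the backlog has not cleared yet.
        self.writers_waiting -= 1
        if self.writers_waiting == 0:
            while self.pending:
                self.executed.append(self.pending.popleft())
```

For example, with two blocked writers and two deferred ops, the first `writer_done()` leaves the ops pending and the second one resumes them in order.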