Full_Name: Howard Chu
Version: 2.4
OS:
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (70.87.222.79)
Submitted by: hyc
We found a case where a site was using regular syncrepl and replicating changes to very large groups (hundreds of thousands of members, tens of megabytes total entry size). The socket send queues on the provider were filling up because the consumers weren't reading or processing the data fast enough. At some point the writetimeout on the provider kicked in and these syncrepl connections were closed, forcing the consumers to reconnect and start the whole process over again. In effect the consumers never made any progress, because the refresh on the next connect attempt had to resend the same large volume of data from the beginning. The syslogs showed consumers having their connections closed multiple times within short (5 minute) spans of time.
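For context, the provider-side knobs involved here are the global slapd.conf timeout directives. The sketch below is purely illustrative; the values are arbitrary examples, not the affected site's actual configuration:

    # provider slapd.conf (illustrative values only)
    idletimeout  900     # drop connections idle this long; syncrepl consumer
                         # connections are already exempt from this check
    writetimeout 30      # drop connections with a write stalled this long;
                         # currently applied to consumer connections as well,
                         # which is what kept disconnecting the slow consumers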
The idletimeout already ignores consumer connections; writetimeout should as well. Sites can use keepalive to handle truly hung consumers, as opposed to just overloaded consumers.
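On the keepalive point: how a site enables it is not spelled out above, but one possible arrangement on a Linux provider host, assuming slapd's accepted sockets have SO_KEEPALIVE set (an assumption here, not something stated in this report), is to tune the kernel's TCP keepalive parameters so that a consumer whose host is truly dead is eventually detected and its connection torn down. The sysctl values below are illustrative only:

    # Linux TCP keepalive tunables (illustrative values only; they affect
    # every socket with SO_KEEPALIVE enabled, not just slapd's)
    net.ipv4.tcp_keepalive_time   = 600   # idle seconds before the first probe
    net.ipv4.tcp_keepalive_probes = 5     # unanswered probes before giving up
    net.ipv4.tcp_keepalive_intvl  = 30    # seconds between probes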