Full_Name: Aaron Richton
Version: HEAD/RE23
OS: CentOS 4.4
URL: ftp://ftp.openldap.org/incoming/richton-20061011.patch
Submission from: (NULL) (128.6.31.135)
If the connection to the syncrepl server is lost, slapd(8) in its role as syncrepl client is expected to retry according to any "retry" configuration clause. However, after a network failure, a refreshAndPersist client in the connected state will simply wait (forever) for the lost connection to provide data. Neither the application layer nor the network layer currently has any awareness of the connection failure apart from "it's been quiet for a long time," which doesn't make a good algorithm. From a *ix network stack standpoint, an idle connection remains ESTABLISHED even in the face of network failure, so slapd(8) has no indication that it should be retrying.
The linked patch turns on SO_KEEPALIVE on the syncrepl client connection if the option is available, which gives the network layer a way to detect the connection failure. When combined with appropriate IP stack tuning (out of the scope of OpenLDAP), very quick retry times can be achieved. Without this patch, I have found that no retry ever takes place.
To reproduce, set up OpenLDAP with a refreshAndPersist consumer, and do something brutal to the consumer's network: firewall off communication with your master server, pull the network cable, etc. Wait for the connection to die off on the master server (slapd/daemon.c already sets SO_KEEPALIVE on the server side), then restore proper network state to the consumer. netstat on the syncrepl client will still show ESTABLISHED; the client is unaware of the network disruption and will never retry, because it still believes the connection is healthy.