Summary: An OpenLDAP 2.4.4 (CentOS 7 stock RPM) replication consumer (slapd) stops responding to requests for periods of up to fifteen minutes.
Environment:
Two CentOS 7 LDAP servers in mirror mode, acting as providers to four OpenLDAP syncrepl consumers. The systems are 2 CPU, 12 core Intel Xeon E5-2420 machines with 48GB of RAM.
The four consumers are load-balanced through a FreeBSD "relayd" redirector, facing approximately six thousand clients.
Problem:
Periodically, one or more (or all) of the consumers will stop responding. This includes cn=Monitor queries from localhost as well as anything over the network. Note that only slapd stops responding: outbound email, logging in, etc. all remain unaffected. Analysis after the event starts doesn't show anything unusual in CPU or memory usage. Analysis of the LDAP logs doesn't show anything unusual in the number of requests, connections, etc. until the system stops responding, at which point they drop to zero.
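To pin down exactly when each stall begins and ends, a small probe along these lines could be run from localhost and from a client network. This is only a sketch under assumptions: the function name probe, the port 389 default, and the 5 second timeout are placeholders of mine, and it uses a hand-encoded LDAPv3 anonymous bind rather than a real LDAP library. It counts slapd as alive only if a reply actually comes back, since a bare TCP connect can succeed against a hung daemon while the kernel is still queueing connections.

```python
import socket
import time

# BER-encoded LDAPv3 anonymous simple bind request (messageID 1).
ANON_BIND = bytes.fromhex("300c020101600702010304008000")


def probe(host, port=389, timeout=5.0):
    """Send an anonymous bind and wait for any LDAP reply.

    Returns (responded, seconds). responded is True only if the server
    sent back data starting with an LDAP message header (0x30) before
    the timeout; connect failures and timeouts both count as False.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            sock.sendall(ANON_BIND)
            reply = sock.recv(64)
    except OSError:  # includes socket timeouts and refused connections
        return False, time.monotonic() - start
    return reply[:1] == b"\x30", time.monotonic() - start
```

Calling probe("127.0.0.1") from cron or a loop and logging the result with a timestamp would give start and end times for each outage that can be lined up against the slapd and syncrepl logs.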
I'm stumped as to a) what's causing it, and b) how to address it on the slapd side so my servers stop dozing off.
Any suggestions?
-- John Jasen (jjasen@gmail.com)