On 07/20/2018 04:41 AM, Ulrich.Windl@rz.uni-regensburg.de wrote:
Hi!
Stupid question: could it be your load-balancer that had a problem? What does the netstat output look like (sockets opened, queued data, etc.)?
I do not believe it to be the load-balancer. The load balancers log loss of contact with the LDAP servers and drop them from the relay group shortly after one of these events starts; when the event clears up, the servers are added back in. I also do not suspect the network between the load balancers and the LDAP servers.
During such an event, ps -efT will usually show slapd running at full thread capacity. Comparing that to the thread counts in cn=monitor is not possible, because LDAP searches against cn=monitor fail while the event is in progress.
The number of open sockets does not change substantially until after the event subsides. The servers will show 1200-2000 open sockets before an event, drop lower when it clears up, and then quickly climb back to pre-event levels.
The queues will show data being held until the socket(s) time out.
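
A minimal sketch of how the socket counts and queued data described above could be tallied from saved `netstat -ant` output. It assumes the Linux column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State) and the standard LDAP port 389; the port slapd actually listens on is not stated here. The function name and the sample data are hypothetical, for illustration only.

```python
from collections import Counter


def summarize_netstat(text, port=389):
    """Count TCP sockets on a local port and sum their queued bytes.

    `text` is the captured output of `netstat -ant` (Linux layout assumed).
    `port` defaults to 389, the standard LDAP port (an assumption).
    """
    states = Counter()
    recv_q = 0
    send_q = 0
    for line in text.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP socket line.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        # fields[3] is the local address, e.g. 10.0.0.5:389 or :::389.
        if not fields[3].endswith(f":{port}"):
            continue
        states[fields[5]] += 1
        recv_q += int(fields[1])
        send_q += int(fields[2])
    return {
        "total": sum(states.values()),
        "states": dict(states),
        "recv_q": recv_q,
        "send_q": send_q,
    }


# Hypothetical sample, not taken from the affected servers.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:389             0.0.0.0:*               LISTEN
tcp      512      0 10.0.0.5:389            10.0.0.9:51234          ESTABLISHED
tcp        0   2048 10.0.0.5:389            10.0.0.9:51240          ESTABLISHED
tcp        0      0 10.0.0.5:22             10.0.0.7:40022          ESTABLISHED
"""

if __name__ == "__main__":
    print(summarize_netstat(SAMPLE))
```

Running this against snapshots taken before, during, and after an event would make it easier to compare the socket totals and to see how much data sits in the queues until the sockets time out.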
Thanks for the feedback!