On Fri, 23 Jun 2023 at 12:04, Quanah Gibson-Mount <quanah@fast-mail.org> wrote:
In our 2.6.4 deployment, we had a significant spike in CPU usage one day last week that lasted approximately 2 hours (8 AM UTC to 10 AM UTC). During this time, some clients started timing out when talking to the LDAP service, and search response times spiked as well, up to 9.5 seconds on searches that normally take < 3 seconds (they do have large result sets). This happened on all 6 of the read nodes in our load-balanced pool, so whatever the issue was, it hit all of them at the same time. It did not happen on the 2 specialized read nodes that serve only one specific service, so it was something about the traffic going to those 6 nodes. The number of ops/second during that time frame was actually lower than usual across the cluster, with a peak of 200 ops/second. We often have higher peaks than that without this kind of CPU spike.
I've noticed similar behavior during large accesslog purges: high CPU, poor response times, and sometimes slapd even becomes unresponsive. Do these systems have an accesslog that gets purged?
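
If it helps to check, here is a rough Python sketch for pulling the purge settings out of a config dump. It assumes the accesslog overlay is configured with `logpurge <age> <interval>` (slapd.conf) or `olcAccessLogPurge: <age> <interval>` (cn=config), with both values in the `[ddd+]hh:mm[:ss]` time span format from slapo-accesslog(5). The helper names `span_to_seconds` and `find_logpurge` are made up for illustration and are not part of OpenLDAP.

```python
import re

# Time span format documented for slapo-accesslog: [ddd+]hh:mm[:ss]
# (days and seconds are optional, hours and minutes are required).
_SPAN = re.compile(r"^(?:(\d+)\+)?(\d+):(\d+)(?::(\d+))?$")


def span_to_seconds(span: str) -> int:
    """Convert a [ddd+]hh:mm[:ss] time span to a number of seconds."""
    m = _SPAN.match(span)
    if m is None:
        raise ValueError(f"not a [ddd+]hh:mm[:ss] time span: {span!r}")
    # Missing optional groups (days, seconds) come back as None; treat as 0.
    days, hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def find_logpurge(config_text: str):
    """Return (age_seconds, interval_seconds) for each purge setting found.

    Hypothetical helper: it only looks for lines of the form
    "logpurge <age> <interval>" or "olcAccessLogPurge: <age> <interval>".
    """
    results = []
    for line in config_text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue
        keyword = parts[0].rstrip(":").lower()
        if keyword in ("logpurge", "olcaccesslogpurge"):
            results.append((span_to_seconds(parts[1]), span_to_seconds(parts[2])))
    return results


if __name__ == "__main__":
    sample = "overlay accesslog\nlogdb cn=accesslog\nlogpurge 07+00:00 01+00:00\n"
    for age, interval in find_logpurge(sample):
        print(f"entries older than {age}s are purged every {interval}s")
```

A long interval combined with a busy accesslog means each purge run has a lot of entries to delete at once, which is the case where I have seen the load spikes. Comparing the interval against the time of the CPU spike would be a quick way to rule this in or out.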