I've run into another problem with the memberOf implementation on my
OpenLDAP 2.5 servers. After I sorted out the proper configuration,
queries requesting memberOf performed very well:
Feb 4 13:26:44 ldap-01 slapd[1207]: conn=23393 op=1 SRCH
base="ou=user,dc=cpp,dc=edu" scope=2 deref=3
filter="(&(objectClass=person)(calstateEduPersonEmplID=013522522))"
Feb 4 13:26:44 ldap-01 slapd[1207]: conn=23393 op=1 SRCH attr=memberOf
Feb 4 13:26:44 ldap-01 slapd[1207]: conn=23393 op=1 SEARCH RESULT
tag=101 err=0 qtime=0.000015 etime=0.191860 nentries=1 text=
However, intermittently the server gets into a state where the exact
same query takes over 30 seconds:
Feb 4 08:05:11 ldap-01 slapd[1425456]: conn=40797 op=1 SRCH
base="ou=user,dc=cpp,dc=edu" scope=2 deref=3
filter="(&(objectClass=person)(calstateEduPersonEmplID=015559557))"
Feb 4 08:05:11 ldap-01 slapd[1425456]: conn=40797 op=1 SRCH attr=memberOf
Feb 4 08:05:50 ldap-01 slapd[1425456]: conn=40797 op=1 SEARCH RESULT
tag=101 err=0 qtime=0.000019 etime=39.435523 nentries=1 text=
When this occurs, the only way I have found to resolve the issue is to
reboot the server. Simply restarting slapd results in the same degraded
performance on these queries.
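For anyone who wants to check their own logs for the same pattern, here
is a rough Python sketch that pulls the etime out of SEARCH RESULT
entries like the ones above and lists the slow ones. The function name
and the 5 second threshold are my own arbitrary choices, and it assumes
the stats log format shown in these excerpts.

```python
import re

# Matches the SEARCH RESULT part of a slapd stats log entry and captures
# conn, op and etime. Assumes the format shown in the excerpts above.
RESULT_RE = re.compile(
    r"conn=(?P<conn>\d+) op=(?P<op>\d+) SEARCH RESULT "
    r".*?etime=(?P<etime>\d+\.\d+)"
)

def slow_searches(log_text, threshold=5.0):
    """Return (conn, op, etime) for searches with etime >= threshold seconds."""
    # Entries may be wrapped across lines, so collapse all whitespace first.
    flat = " ".join(log_text.split())
    return [
        (int(m["conn"]), int(m["op"]), float(m["etime"]))
        for m in RESULT_RE.finditer(flat)
        if float(m["etime"]) >= threshold
    ]

# The two SEARCH RESULT entries quoted above, wrapped as in this message.
SAMPLE = """\
Feb 4 13:26:44 ldap-01 slapd[1207]: conn=23393 op=1 SEARCH RESULT
tag=101 err=0 qtime=0.000015 etime=0.191860 nentries=1 text=
Feb 4 08:05:50 ldap-01 slapd[1425456]: conn=40797 op=1 SEARCH RESULT
tag=101 err=0 qtime=0.000019 etime=39.435523 nentries=1 text=
"""

if __name__ == "__main__":
    for conn, op, etime in slow_searches(SAMPLE):
        print(f"conn={conn} op={op} etime={etime:.3f}s")
```

Run against the two results quoted above, this reports only the
conn=40797 search.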
Normally there is very low read I/O load on the servers during
operation, probably averaging less than 1 MB/s, with occasional
momentary peaks of maybe 20-30 MB/s. When the memberOf query
performance is degraded, there is a very high read I/O load on the
server, continuously about 200-300 MB/s.
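In case it helps anyone reproduce the measurement, this is a minimal
Python sketch of one way to estimate read throughput on Linux from two
samples of /proc/diskstats. It is not how the figures above were
gathered; the device name "sda" and the 5 second interval are
placeholders, and the helper names are my own.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units

def sectors_read(diskstats_text, device):
    """Return the cumulative 'sectors read' counter for one device."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Fields: major, minor, name, reads completed, reads merged, sectors read, ...
        if len(fields) >= 6 and fields[2] == device:
            return int(fields[5])
    raise ValueError(f"device {device!r} not found in diskstats")

def read_rate_mb_s(before, after, seconds, device):
    """Read throughput in MB/s between two diskstats snapshots."""
    delta = sectors_read(after, device) - sectors_read(before, device)
    return delta * SECTOR_BYTES / seconds / 1e6

def sample(device="sda", interval=5.0):
    """Take two snapshots 'interval' seconds apart (device name is a placeholder)."""
    with open("/proc/diskstats") as f:
        before = f.read()
    time.sleep(interval)
    with open("/proc/diskstats") as f:
        after = f.read()
    return read_rate_mb_s(before, after, interval, device)

if __name__ == "__main__":
    print(f"{sample():.1f} MB/s read")
```

Logging this periodically alongside the slapd etime values would show
whether the slow searches line up with the sustained read load.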
Any thoughts on this? It seems that for some reason the server gets
into a state where it is not using the cache or memory map for the
search required to construct the memberOf results, and is instead
reading the entire database from disk.
It's also weird that restarting the service does not resolve this, but
rebooting the server does. I'm not intimately familiar with the
internals of LMDB. Is there some state associated with the environment
or memory map that persists between service runs and is only cleared by
a reboot?
I initially thought I had a theory, involving a separate bug in
RHEL 8.5 that broke the "needs-rebooting" command, resulting in servers
not being rebooted after kernel/library updates. The most recent
occurrence of this issue started after such an update without the
required reboot, but on reviewing historical occurrences, it has also
happened at times that don't meet that criterion, so I find myself
clueless again as to what's going on.
Any advice on how to fix this issue or debug it further would be much
appreciated, thanks.