I've run into another problem with the memberOf implementation on my OpenLDAP 2.5 servers. After I sorted out the proper configuration, queries requesting memberOf were very performant:
Feb 4 13:26:44 ldap-01 slapd[1207]: conn=23393 op=1 SRCH base="ou=user,dc=cpp,dc=edu" scope=2 deref=3 filter="(&(objectClass=person)(calstateEduPersonEmplID=013522522))"
Feb 4 13:26:44 ldap-01 slapd[1207]: conn=23393 op=1 SRCH attr=memberOf
Feb 4 13:26:44 ldap-01 slapd[1207]: conn=23393 op=1 SEARCH RESULT tag=101 err=0 qtime=0.000015 etime=0.191860 nentries=1 text=
However, intermittently the server gets into a state where the exact same query takes over 30 seconds:
Feb 4 08:05:11 ldap-01 slapd[1425456]: conn=40797 op=1 SRCH base="ou=user,dc=cpp,dc=edu" scope=2 deref=3 filter="(&(objectClass=person)(calstateEduPersonEmplID=015559557))"
Feb 4 08:05:11 ldap-01 slapd[1425456]: conn=40797 op=1 SRCH attr=memberOf
Feb 4 08:05:50 ldap-01 slapd[1425456]: conn=40797 op=1 SEARCH RESULT tag=101 err=0 qtime=0.000019 etime=39.435523 nentries=1 text=
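A rough way to pull the slow cases out of the log, assuming the SEARCH RESULT line format shown in the excerpts above. The function name `slow_searches` and the 1-second threshold are illustrative choices, not part of slapd:

```python
import re

# Matches the SEARCH RESULT line of a search, in the log format quoted above.
RESULT_RE = re.compile(
    r"conn=(?P<conn>\d+) op=(?P<op>\d+) SEARCH RESULT .*? "
    r"etime=(?P<etime>\d+\.\d+) nentries=(?P<nentries>\d+)"
)

def slow_searches(lines, threshold=1.0):
    """Yield (conn, op, etime) for searches whose etime exceeds threshold seconds.

    `threshold` is an arbitrary cutoff chosen for illustration.
    """
    for line in lines:
        m = RESULT_RE.search(line)
        if m and float(m.group("etime")) > threshold:
            yield int(m.group("conn")), int(m.group("op")), float(m.group("etime"))
```

It accepts any iterable of log lines, such as an open file handle on wherever syslog writes the slapd output. The conn and op values it returns can then be matched back to the corresponding SRCH lines to see which filter and attributes were requested.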
When this occurs, the only fix I have found is to reboot the server. Simply restarting slapd leaves these queries just as slow.
Normally there is very low read I/O load on the servers during operation, probably averaging less than 1M/s, peaking up to maybe 20-30M/s for just an instant occasionally. When the memberOf query performance is degraded, there is a very high read I/O load on the server, continuously about 200-300M/s.
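For comparing the normal and degraded states with actual numbers, here is a minimal sketch that samples read throughput from /proc/diskstats on Linux. It relies on the sixth field of each line being the count of sectors read, in 512-byte units. The device name "sda" and the helper names are placeholders, not anything specific to these servers:

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units

def sectors_read(diskstats_text, device):
    """Return the cumulative sectors-read counter for `device`."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # Fields: major, minor, name, reads completed, reads merged, sectors read, ...
        if len(fields) >= 6 and fields[2] == device:
            return int(fields[5])
    raise KeyError(device)

def read_rate_mib(before, after, device, seconds):
    """Read throughput in MiB/s between two /proc/diskstats snapshots."""
    delta = sectors_read(after, device) - sectors_read(before, device)
    return delta * SECTOR_BYTES / seconds / (1024 * 1024)

def sample(device="sda", interval=5.0):
    """Take two snapshots `interval` seconds apart (device name is a placeholder)."""
    with open("/proc/diskstats") as f:
        before = f.read()
    time.sleep(interval)
    with open("/proc/diskstats") as f:
        after = f.read()
    return read_rate_mib(before, after, device, interval)
```

Calling `sample()` periodically with the device that holds the database and logging the result alongside the slapd etime values would show whether the read load starts before or after the searches slow down.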
Any thoughts on this? It seems that for some reason the server gets into a state where it is not using the cache or memory map for the search required to construct the memberOf results, and is instead doing a full disk read of the entire database?
It's also weird that restarting the service does not resolve this, but rebooting the server does. I'm not intimately familiar with the internals of LMDB; is there some state that persists with the environment or memory map between service runs and is only cleared by a reboot?
I initially thought I had a theory, involving a separate bug in RHEL 8.5 that broke the "needs-rebooting" command, resulting in servers not being rebooted after kernel/library updates. The most recent occurrence of this issue started after such an update without the required reboot, but on reviewing past occurrences, it has also happened at times that don't meet that criterion, so I find myself clueless again as to what's going on.
Any advice on how to fix this or debug it further would be much appreciated, thanks…
openldap-technical@openldap.org