Hi list,
I'm currently monitoring the cache efficiency of a BerkeleyDB backend, as reported by db_stat -m. About 24 hours after starting slapd, I noticed a significant drop in the graphed values for several indexes, although general activity did not change much at that time. The systems using the service did not experience any reduction in availability or response time either. The cache efficiency, expressed as a percentage, remains at 99%.
Its MirrorMode peer experienced a similar drop at the exact same time, though for fewer indexes.
The affected nodes are in a refreshAndPersist, retry="60 +" MirrorMode setup. Backend database is configured as hdb. The binaries in use are the latest RPMs from Buchan Milne (openldap2.4-servers-2.4.17-3.rhel5, which comes packed with its own BerkeleyDB 4.7 libraries), running on a RHEL5.3 system.
The graphs are available at http://www.ruberg.no/tmp/slapd.html. The drops occurred just before 15:00.
The nodes are in the same IP network without any routers or firewalls in-between. Replication between the nodes works flawlessly.
Is this kind of drop normal behaviour? Has slapd stopped asking its backend (or peer) for data and started serving everything from its own buffer? Since the observations correlate between the active and the passive node I currently suspect this involves the replication mechanism(s).
(The reason some indexes are not currently graphed is that the numbers read from db_stat are suffixed with SI units when above 10 million, and the monitoring tool doesn't account for that yet.)
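For anyone hitting the same limitation, here is a minimal sketch of how a monitoring script might expand such suffixed counters. It is not what my tool does today. The function name parse_count and the suffix table (K, M, G, T as powers of ten) are assumptions for illustration; I have only described suffixed values above 10 million, so check the table against the actual db_stat output before relying on it.

```python
import re

# Assumed multipliers: powers of ten for K/M/G/T. Only the existence of a
# suffix above 10 million is known here; the exact set is an assumption.
_SI_MULTIPLIERS = {"": 1, "K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}

# Leading integer, optionally followed by a single suffix letter, then a
# word boundary so that e.g. "12Mx" is rejected instead of read as 12.
_COUNT_RE = re.compile(r"^\s*(\d+)([KMGT]?)\b")


def parse_count(field):
    """Return the leading counter of a text field as an integer.

    parse_count is a hypothetical helper. A suffixed value has already
    been rounded by the tool that printed it, so the result is only an
    approximation of the real counter.
    """
    match = _COUNT_RE.match(field)
    if match is None:
        raise ValueError("no leading counter in %r" % (field,))
    digits, suffix = match.groups()
    return int(digits) * _SI_MULTIPLIERS[suffix]
```

Since the suffixed values are rounded, graphs built from them will move in coarse steps, which matters when plotting rates from the difference between two samples.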
Thanks for any pointers and ideas,
openldap-technical@openldap.org