Hi
I have a master/replica setup that has been working for years. Recently, I discovered that a replica sometimes fails to return existing entries. There is no error; it just returns nentries=0 for a few queries and then goes back to normal.
I suspected database corruption, so I killed slapd, removed the databases, and restarted slapd so that it would resync from the master. It pulled a few hundred records and then got an up-to-date contextCSN, even though a lot of entries are still missing. Restarting slapd now shows an outright syncrepl failure:
Sep 28 06:34:29 motul slapd[2901]: do_syncrepl: rid=217 rc -2 retrying
How do I debug this? I had the problem with OpenLDAP 2.4.21 / db-4.7.25.3. I tried upgrading the replica to OpenLDAP 2.4.32 / db-4.8.30, but it did not change anything. Is there a chance that upgrading the master (which also runs 2.4.21) will help?
During the short time syncrepl runs, the logs fill up with messages like the following. I wonder whether this is a related problem:
Sep 28 03:30:16 motul slapd[269]: conn=-1 op=0 => bdb_dn2id_add dn="uid=user,ou=foo,dc=example,dc=net" ID=0x7c: put failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock -30994