syncrepl failure - openldap-technical

27 Sep 2012


Hi,
I have a master/replicas setup that has been working for years. Recently,
I discovered that a replica sometimes fails to return existing entries:
there is no error, it just returns nentries=0 for a few queries, then
goes back to normal.
I suspected some database corruption: I killed slapd, removed the
databases, and restarted slapd so that it would resync from the master.
It pulled a few hundred records and then got an up-to-date contextCSN,
while a lot of entries are still missing. Restarting slapd exhibits a
straight failure in syncrepl:
Sep 28 06:34:29 motul slapd[2901]: do_syncrepl: rid=217 rc -2 retrying
How do I debug that? I had the problem with OpenLDAP-2.4.21 /
db-4.7.25.3. I tried upgrading the replica to OpenLDAP-2.4.32 /
db-4.8.30 but it did not change anything. Is there a chance that
upgrading the master (which runs 2.4.21 too) will help?
During the short time syncrepl runs, the logs are filled with messages
like this one. I wonder whether it is a related problem or not:
Sep 28 03:30:16 motul slapd[269]: conn=-1 op=0 => bdb_dn2id_add
dn="uid=user,ou=foo,dc=example,dc=net" ID=0x7c: put failed:
DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock -30994
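
To get a rough idea of how the two symptoms relate, the log could be scanned for both kinds of messages. Below is a minimal sketch, assuming the syslog lines look exactly like the two quoted above; the regular expressions and the helper name `summarize` are illustrative assumptions, not part of OpenLDAP.

```python
import re
from collections import Counter

# Patterns derived only from the two log lines quoted in this message.
SYNCREPL_RETRY = re.compile(r"do_syncrepl: rid=(\d+) rc (-?\d+) retrying")
DEADLOCK = re.compile(r"DB_LOCK_DEADLOCK")


def summarize(lines):
    """Count syncrepl retries per (rid, rc) pair and deadlock messages."""
    retries = Counter()
    deadlocks = 0
    for line in lines:
        match = SYNCREPL_RETRY.search(line)
        if match:
            key = (int(match.group(1)), int(match.group(2)))
            retries[key] += 1
        if DEADLOCK.search(line):
            deadlocks += 1
    return retries, deadlocks


# The two sample lines from this message, unwrapped onto one line each.
sample = [
    "Sep 28 06:34:29 motul slapd[2901]: do_syncrepl: rid=217 rc -2 retrying",
    'Sep 28 03:30:16 motul slapd[269]: conn=-1 op=0 => bdb_dn2id_add '
    'dn="uid=user,ou=foo,dc=example,dc=net" ID=0x7c: put failed: '
    "DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock -30994",
]
retries, deadlocks = summarize(sample)
```

On a real log, `summarize(open(path))` (with `path` pointing at wherever syslog writes slapd messages on the replica) would show whether the deadlock messages cluster around the syncrepl retries.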
-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
manu@netbsd.org