openldap-2.3.41 db-4.2.52.NC-PLUS_5_PATCHES Solaris 10 x86
Layout:
ldapmaster <-syncrepl-> ldapslave01/02/03/04 <-syncrepl-> data-clusters.
It has come to light that we have some sync inconsistencies. At the moment, one customer domain shows correct entries on ldapmaster, ldapslave02 and ldapslave04 (and all servers syncing from them), but has incorrect, or rather missing, entries on ldapslave01 and ldapslave03. There are no differences between these hosts (they are in fact HDD clones), and the config files are pushed from git with only the RID changed.
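To pin down exactly which entries are missing on a given slave, one option is to dump the DNs from the master and from the slave as LDIF (with slapcat or ldapsearch, whichever is convenient) and diff the two sets. The following is only a rough sketch: the function names are made up for illustration, and lowercasing the DN is a crude stand-in for proper DN normalization.

```python
import base64


def read_dns(ldif_text):
    """Collect DNs from LDIF text, handling folded lines and base64 DNs."""
    # Unfold: in LDIF, a line starting with a single space continues the previous line.
    unfolded = []
    for raw in ldif_text.splitlines():
        if raw.startswith(" ") and unfolded:
            unfolded[-1] += raw[1:]
        else:
            unfolded.append(raw)

    dns = set()
    for line in unfolded:
        if line.startswith("dn:: "):
            # "dn:: " marks a base64-encoded DN.
            dn = base64.b64decode(line[5:]).decode("utf-8")
        elif line.startswith("dn: "):
            dn = line[4:]
        else:
            continue
        # Assumption: case-insensitive comparison is good enough for this check.
        dns.add(dn.strip().lower())
    return dns


def missing_on_replica(master_ldif, replica_ldif):
    """Return the sorted DNs that are on the master but absent from the replica."""
    return sorted(read_dns(master_ldif) - read_dns(replica_ldif))
```

Running `missing_on_replica` on a dump from ldapmaster and a dump from ldapslave03 should list the entries that need to be re-synced, without relying on spot checks of individual domains.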
Here are the logs on ldapslave03 for one of the broken entries (in this case, ou=DNS). I have loglevel=sync on all servers:
slaplog.20100407.gz:Mar 3 12:27:12 ldapslave03.unix slapd[27355]: [ID 561622 local4.debug] syncrepl_del_nonpresent: rid 329 be_delete DNSHostName=@,DNSZoneName=example.com,ou=dns,$DC (66)
slaplog.20100407.gz:Mar 3 12:27:12 ldapslave03.unix slapd[27355]: [ID 561622 local4.debug] syncrepl_del_nonpresent: rid 329 be_delete DNSZoneName=example.com,ou=dns,$DC (66)
What is "syncrepl_del_nonpresent"? Is it something I should be worried about? If I count the number of entries with said error:
ldapslave02: 42
ldapslave03: 7240
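For reference, counts like these can be reproduced with a small script along the following lines. It is a sketch that assumes the log lines have exactly the shape quoted above, and it treats the trailing number in parentheses as some kind of result code (an assumption on my part). The function names are hypothetical.

```python
import gzip
import re
import sys
from collections import Counter

# Matches lines such as:
#   ... syncrepl_del_nonpresent: rid 329 be_delete <DN> (66)
PATTERN = re.compile(r"syncrepl_del_nonpresent: rid (\d+) be_delete (.+?) \((\d+)\)")


def count_del_nonpresent(lines):
    """Count be_delete events, keyed by (rid, trailing code)."""
    counts = Counter()
    for line in lines:
        m = PATTERN.search(line)
        if m:
            counts[(m.group(1), m.group(3))] += 1
    return counts


def count_in_file(path):
    """Count events in a plain or gzip-compressed log file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", errors="replace") as fh:
        return count_del_nonpresent(fh)


if __name__ == "__main__":
    # Usage: python count_nonpresent.py slaplog.20100407.gz [more logs...]
    for path in sys.argv[1:]:
        print(path, dict(count_in_file(path)))
```

Grouping by rid and code rather than taking one grand total makes it easier to see whether the messages on ldapslave03 all come from the same consumer and carry the same code.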
This makes me wonder whether it is a global problem for us that is simply much worse on some servers.
I notice that the provisioning log for that customer's domain has about 24 "+dns" and "-dns" entries in a row. Not entirely sure why the customer was changing their DNS back and forth so much, but perhaps it is related.
Could a "delete/create/delete" sequence on the same DN, sent to the master but not yet pushed out to all slaves, trigger this situation? I would have expected all replication to happen in strict time sequence, though.
Is there anything I can do presently?
Any advice is most appreciated.
Lund