delta-syncrepl contextCSN update timing, schema checking - openldap-software

1 May 2008


Recently, a fluke in our configuration distribution system caused one of our
consumers (running 2.3.41) to have stale schema information. slapd at
debuglevel 16384 emitted:

  syncrepl_message_to_op: rid 001 mods check (objectClass: value #0 invalid per syntax)

We had *not* specified schemachecking in this syncrepl stanza, and
slapd.conf(5) says:

  The schema checking can be enforced at the LDAP Sync consumer site by
  turning on the schemachecking parameter. The default is off.

Should this error have been raised in this case? I tried explicitly
disabling schemachecking ("schemachecking=off" in the syncrepl stanza), but
this error was still raised.

Once the schema was updated appropriately, I started slapd (again at
debuglevel 16384) and saw syncrepl operations being successfully executed:

  syncrepl_message_to_op: rid 001 be_modify uid=example,[...],o=org (0)

Thinking all was well, I ^C'd slapd, and slapd shut itself down
successfully. I restarted slapd using an init script, but the backend's
contextCSN didn't start incrementing. Once again at debuglevel 16384:

  null_callback: error code 0x10
  syncrepl_message_to_op: rid 001 be_modify uid=bad-objectclass,[...],o=org (16)
  do_syncrepl: rid 001 retrying (29 retries left)

uid=bad-objectclass is the same entry that triggered the schemachecking
error in the first place. Error 0x10 is LDAP_NO_SUCH_ATTRIBUTE, and this
seems a lot like the symptoms described in this thread:

  http://www.openldap.org/lists/openldap-software/200801/msg00126.html

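For anyone triaging similar output, here is a minimal sketch of pulling the
result code out of these syncrepl_message_to_op lines and naming it. The
regular expression is an assumption based only on the log lines quoted above,
the function name parse_sync_line is hypothetical, and the code table is a
small subset of the result codes defined in RFC 4511.

```python
import re

# Small subset of the LDAP result codes defined in RFC 4511.
RESULT_NAMES = {
    0: "success",
    16: "noSuchAttribute",        # 0x10, LDAP_NO_SUCH_ATTRIBUTE
    17: "undefinedAttributeType",
    20: "attributeOrValueExists",
    21: "invalidAttributeSyntax",
    32: "noSuchObject",
    65: "objectClassViolation",
    68: "entryAlreadyExists",
}

# Pattern inferred from the log lines quoted in this message (an assumption,
# not a documented log format).
LINE_RE = re.compile(
    r"syncrepl_message_to_op: rid (?P<rid>\d+) "
    r"(?P<op>be_\w+) (?P<dn>.+) \((?P<code>\d+)\)$"
)


def parse_sync_line(line):
    """Return rid, operation, DN and result code for a be_* log line.

    Returns None for lines that do not match (for example the
    "mods check" line, which carries no numeric result code).
    """
    match = LINE_RE.search(line.strip())
    if match is None:
        return None
    code = int(match.group("code"))
    return {
        "rid": match.group("rid"),
        "op": match.group("op"),
        "dn": match.group("dn"),
        "code": code,
        "name": RESULT_NAMES.get(code, "unknown"),
    }
```

Feeding it the failing be_modify line above yields code 16, which is the same
value as the 0x10 reported by null_callback.
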
To make a long story short, it seems that syncrepl doesn't update the
backend's contextCSN until it has processed its entire backlog. To check, I
stopped another consumer and let a backlog build, then started it at
debuglevel 16384 and watched the backend's contextCSN with ldapsearch(1).
contextCSN didn't increment until the backlog was completely processed, even
though each change was visible via ldapsearch(1) as soon as it had been
applied.

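A minimal sketch of comparing two contextCSN values, for example one read
from the provider and one from the consumer with ldapsearch(1). It assumes
the CSN layout is "timestamp#count#sid#mod" with the last three fields in
hexadecimal, and that both values use the same timestamp format. The sample
values in the comments are made up, and the function names are hypothetical.

```python
def parse_csn(csn):
    """Split a CSN string into a tuple that sorts in CSN order.

    Assumed layout: timestamp#count#sid#mod, for example
    "20080501120000Z#000003#00#000000" (a made-up value).
    The timestamp is kept as a string; fixed-width timestamps of the
    same format compare correctly as strings.
    """
    timestamp, count, sid, mod = csn.strip().split("#")
    return (timestamp, int(count, 16), int(sid, 16), int(mod, 16))


def consumer_is_behind(provider_csn, consumer_csn):
    """True if the consumer's contextCSN is older than the provider's."""
    return parse_csn(consumer_csn) < parse_csn(provider_csn)
```

Polling the consumer's value while a backlog drains and comparing it against
the provider's would show whether contextCSN advances per change or only once
at the end.
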
If a consumer applies replicated changes without updating the backend's
contextCSN and is then stopped, it will try to re-apply those same changes
when it starts up again, which will generally fail. This seems to leave one
in a bind: either work out the correct value for contextCSN and set it by
hand, or remove the backend's data files and let syncrepl rebuild them from
scratch. If this assessment is correct, this behavior doesn't seem desirable.

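To make the concern concrete, here is a toy model of the restart scenario.
It is not slapd's implementation: ToyConsumer, its integer "CSNs", and the
delete-a-value changes are all hypothetical, chosen only to show one
plausible way a replayed change could fail with 0x10 when the sync state is
committed only after the whole backlog has been applied.

```python
class NoSuchAttribute(Exception):
    """Stands in for LDAP_NO_SUCH_ATTRIBUTE (0x10)."""
    code = 0x10


class ToyConsumer:
    """Toy replica: one entry plus a stored sync position."""

    def __init__(self, entry, context_csn=0):
        self.entry = {attr: set(vals) for attr, vals in entry.items()}
        self.context_csn = context_csn  # integer stand-in for contextCSN

    def apply(self, change):
        """Apply a 'delete this attribute value' change."""
        _csn, attr, value = change
        values = self.entry.get(attr, set())
        if value not in values:
            raise NoSuchAttribute(f"{attr}: {value}")
        values.remove(value)

    def sync(self, changelog, commit_each=False, stop_after=None):
        """Apply changes newer than context_csn.

        commit_each: store the position after every change.
        stop_after:  simulate a shutdown after that many changes.
        """
        pending = [c for c in changelog if c[0] > self.context_csn]
        for n, change in enumerate(pending, start=1):
            self.apply(change)
            if commit_each:
                self.context_csn = change[0]
            if stop_after is not None and n == stop_after:
                return  # stopped before the backlog was finished
        if pending:
            self.context_csn = pending[-1][0]


def restart_scenario(commit_each):
    """Stop mid-backlog, restart, and return the resulting error code."""
    changelog = [(1, "mail", "a@example.org"), (2, "mail", "b@example.org")]
    consumer = ToyConsumer({"mail": ["a@example.org", "b@example.org"]})
    consumer.sync(changelog, commit_each=commit_each, stop_after=1)
    try:
        consumer.sync(changelog, commit_each=commit_each)
    except NoSuchAttribute as exc:
        return exc.code
    return 0
```

In this model restart_scenario(False) returns 16, because the first change is
replayed against an entry that no longer has the value, while
restart_scenario(True) returns 0 because the restart resumes from the last
committed position.
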
john
-- 
John Morrissey          _o            /\         ----  __o
jwm@horde.net        _-< _          /  \       ----  <  ,
www.horde.net/    __(_)/_(_)________/    _______(_) /_(_)__