Recently, a fluke in our configuration distribution system caused one of our consumers (running 2.3.41) to have stale schema information. slapd at debuglevel 16384 emitted:
syncrepl_message_to_op: rid 001 mods check (objectClass: value #0 invalid per syntax)
We had *not* specified schemachecking in this syncrepl stanza, and slapd.conf(5) says:
The schema checking can be enforced at the LDAP Sync consumer site by turning on the schemachecking parameter. The default is off.
Should this error have been raised in this case? I tried explicitly disabling schemachecking ("schemachecking=off" in the syncrepl stanza), but this error was still raised.
Once the schema was updated appropriately, I started slapd (again at debuglevel 16384) and saw syncrepl operations being successfully executed:
syncrepl_message_to_op: rid 001 be_modify uid=example,[...],o=org (0)
Thinking all was well, I ^C'd slapd, and slapd shut itself down successfully. I restarted slapd using an init script, but the backend's contextCSN didn't start incrementing. Once again at debuglevel 16384:
null_callback: error code 0x10
syncrepl_message_to_op: rid 001 be_modify uid=bad-objectclass,[...],o=org (16)
do_syncrepl: rid 001 retrying (29 retries left)
uid=bad-objectclass is the same entry that triggered the schemachecking error in the first place. Error 0x10 is LDAP_NO_SUCH_ATTRIBUTE, and this seems a lot like the symptoms described in this thread:
http://www.openldap.org/lists/openldap-software/200801/msg00126.html
To make a long story short, it seems that syncrepl doesn't update the backend's contextCSN until it's processed its backlog? To check, I stopped another consumer and let a backlog build, then started it at debuglevel 16384 and watched the backend's contextCSN with ldapsearch(1). contextCSN didn't increment until the backlog was completely processed, even though I could see the changes it was processing with ldapsearch(1) as soon as they were processed.
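For comparing the contextCSN values that ldapsearch(1) returns from a consumer and from its provider, a minimal Python sketch follows. It assumes the CSN layout timestamp#count#sid#mod with hexadecimal counters, and that both values use the same timestamp format. The function names and sample values are hypothetical and not part of OpenLDAP.

```python
from typing import NamedTuple


class CSN(NamedTuple):
    """Fields of a contextCSN value, in the order they are compared."""
    timestamp: str  # generalized time, e.g. "20080501153925Z"
    count: int      # change counter within the same timestamp
    sid: int        # server ID
    mod: int        # modification counter within one operation


def parse_csn(value: str) -> CSN:
    """Split 'time#count#sid#mod' into fields; counters are read as hex."""
    timestamp, count, sid, mod = value.strip().split("#")
    return CSN(timestamp, int(count, 16), int(sid, 16), int(mod, 16))


def consumer_is_behind(consumer_csn: str, provider_csn: str) -> bool:
    """True if the consumer's contextCSN is older than the provider's."""
    return parse_csn(consumer_csn) < parse_csn(provider_csn)


if __name__ == "__main__":
    # Hypothetical values, as they might be copied from ldapsearch output.
    consumer = "20080501153925Z#000000#00#000000"
    provider = "20080501154500Z#000000#00#000000"
    print(consumer_is_behind(consumer, provider))
```

Polling the consumer and feeding each value to consumer_is_behind() would show the same thing observed by hand: the result stays true until the whole backlog has been applied.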
If a consumer processes replication without updating the backend's contextCSN, it will try to re-process the same replication entries when it starts up again, which will generally fail. This seems to leave one in a bind, either having to manually determine the correct value for contextCSN and update it manually, or remove the backend's data files and let syncrepl rebuild them from scratch. If this assessment is correct, this behavior doesn't seem desirable.
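Why a replay "will generally fail" can be sketched with a toy model in Python. It assumes the replayed change is a deletion of one attribute value, which is only one way to arrive at result code 16; the entry layout and all names here are hypothetical and this is not slapd's code.

```python
from __future__ import annotations

LDAP_NO_SUCH_ATTRIBUTE = 0x10  # result code 16, as seen in the log above


class LDAPError(Exception):
    def __init__(self, code: int, message: str) -> None:
        super().__init__(message)
        self.code = code


def apply_delete_value(entry: dict[str, set[str]], attr: str, value: str) -> None:
    """Remove one value from an attribute, failing if it is not present."""
    values = entry.get(attr, set())
    if value not in values:
        raise LDAPError(LDAP_NO_SUCH_ATTRIBUTE, f"{attr}: no such value")
    values.remove(value)
    if not values:
        del entry[attr]  # an attribute with no values left disappears


def replay(entry: dict[str, set[str]], log: list[tuple[str, str]]) -> int:
    """Apply each logged (attr, value) deletion; return the first error code, or 0."""
    for attr, value in log:
        try:
            apply_delete_value(entry, attr, value)
        except LDAPError as exc:
            return exc.code
    return 0


if __name__ == "__main__":
    entry = {"mail": {"a@example.org", "b@example.org"}}
    log = [("mail", "a@example.org")]
    print(replay(entry, log))  # first pass applies the change
    print(replay(entry, log))  # replaying the same change fails
```

The first pass succeeds and the second returns 0x10, because the change is not idempotent: once applied, the value it wants to delete is already gone.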
john
--On Thursday, May 01, 2008 11:39 AM -0400 John Morrissey jwm@horde.net wrote:
> Recently, a fluke in our configuration distribution system caused one of our consumers (running 2.3.41) to have stale schema information. slapd at debuglevel 16384 emitted:
> Should this error have been raised in this case? I tried explicitly disabling schemachecking ("schemachecking=off" in the syncrepl stanza), but this error was still raised.
The error is correct. schemachecking off makes it so that entries do not have to comply to *known* schema. Your schema was not known.
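The distinction can be illustrated with a toy two-step check in Python. This is only a sketch of the behavior described here, not slapd's implementation: the schema table, entry layout and function name are hypothetical, and the error string simply mirrors the log message quoted earlier in the thread.

```python
from __future__ import annotations

# Hypothetical stand-in for the schema a consumer has loaded:
# objectClass name -> attributes that class requires.
KNOWN_SCHEMA = {
    "person": {"cn", "sn"},
}


def accept_entry(entry: dict[str, list[str]], schemachecking: bool) -> tuple[bool, str]:
    # Step 1 runs regardless of schemachecking: every objectClass value
    # must name a class the consumer knows about.
    for index, name in enumerate(entry.get("objectClass", [])):
        if name not in KNOWN_SCHEMA:
            return False, f"objectClass: value #{index} invalid per syntax"

    # Step 2 runs only with schemachecking on: the required attributes
    # of every listed class must be present.
    if schemachecking:
        for name in entry.get("objectClass", []):
            missing = KNOWN_SCHEMA[name] - set(entry)
            if missing:
                return False, f"{name}: missing {sorted(missing)}"
    return True, "ok"


if __name__ == "__main__":
    partial = {"objectClass": ["person"], "cn": ["x"]}  # no sn replicated
    unknown = {"objectClass": ["customPerson"], "cn": ["x"], "sn": ["y"]}
    print(accept_entry(partial, schemachecking=False))
    print(accept_entry(partial, schemachecking=True))
    print(accept_entry(unknown, schemachecking=False))
```

In this model a partially replicated entry passes with schemachecking off, while an entry naming an unknown objectClass is rejected either way.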
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
On Thu, May 01, 2008 at 09:57:25AM -0700, Quanah Gibson-Mount wrote:
> --On Thursday, May 01, 2008 11:39 AM -0400 John Morrissey jwm@horde.net wrote:
>> Recently, a fluke in our configuration distribution system caused one of our consumers (running 2.3.41) to have stale schema information. slapd at debuglevel 16384 emitted:
>> Should this error have been raised in this case? I tried explicitly disabling schemachecking ("schemachecking=off" in the syncrepl stanza), but this error was still raised.
> The error is correct. schemachecking off makes it so that entries do not have to comply to *known* schema. Your schema was not known.
I guess I was confused by slapd.conf(5):
The schema checking can be enforced at the LDAP Sync consumer site by turning on the schemachecking parameter. The default is off. Schema checking on means that replicated entries must have a structural objectClass, must obey to objectClass requirements in terms of required/allowed attributes, and that naming attributes and distinguished values must be present. As a consequence, schema checking should be off when partial replication is used.
The reason it works this way (and how it serves the partial replication use case) makes total sense now, but I might not be the only one to draw the wrong conclusion from something like "schemachecking=off".
I'm not sure how I would reword this part of the man page, but FWIW it was what confused me about this option's behavior.
john
John Morrissey wrote:
> To make a long story short, it seems that syncrepl doesn't update the backend's contextCSN until it's processed its backlog? To check, I stopped another consumer and let a backlog build, then started it at debuglevel 16384 and watched the backend's contextCSN with ldapsearch(1). contextCSN didn't increment until the backlog was completely processed, even though I could see the changes it was processing with ldapsearch(1) as soon as they were processed.
> If a consumer processes replication without updating the backend's contextCSN, it will try to re-process the same replication entries when it starts up again, which will generally fail. This seems to leave one in a bind, either having to manually determine the correct value for contextCSN and update it manually, or remove the backend's data files and let syncrepl rebuild them from scratch. If this assessment is correct, this behavior doesn't seem desirable.
Your description of the behavior is correct. It's required to work that way with regular syncrepl; it probably should work differently for delta-sync but nobody pointed it out before.
In regular syncrepl the refresh phase does a regular LDAP search, whose results are returned in whatever arbitrary order the database normally retrieves things. Since that's pretty much guaranteed not to be the same as the order in which changes were made, we can't update the contextCSN until all of the changes have been received.
In delta-syncrepl the refresh is coming from the log, and unless somebody has been explicitly mucking around in the log DB, the entries will always be returned in order, so it's possible to update the contextCSN after each entry has been received. But it's up to the provider to send the cookie with each entry in this case, and the syncprov overlay doesn't really know the difference between delta-sync and regular sync, so it doesn't do it.
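The ordering argument in the two paragraphs above can be checked with a small simulation in Python. It is a sketch under simplifying assumptions: CSNs are modeled as plain integers, the consumer checkpoints after every entry, and the function name is hypothetical.

```python
from __future__ import annotations


def resume_misses_changes(arrival_order: list[int], interrupt_after: int) -> bool:
    """
    Simulate a consumer that advances its contextCSN to the newest CSN seen
    after each entry, and is interrupted after `interrupt_after` entries.
    On restart it asks only for changes newer than its saved contextCSN.
    Returns True if some change would then never be applied.
    """
    received = arrival_order[:interrupt_after]
    saved_csn = max(received, default=0)
    pending = arrival_order[interrupt_after:]
    # Anything still pending with a CSN at or below the saved value is
    # skipped by the restarted consumer.
    return any(csn <= saved_csn for csn in pending)


if __name__ == "__main__":
    # Regular refresh: entries arrive in arbitrary database order.
    print(resume_misses_changes([3, 1, 4, 2], interrupt_after=2))
    # Log-driven refresh: entries arrive in the order the changes were made.
    print(resume_misses_changes([1, 2, 3, 4], interrupt_after=2))
```

With arbitrary arrival order, an interrupted consumer that had already advanced its contextCSN would skip change 2 on restart. With in-order arrival nothing pending is older than the checkpoint, which is why per-entry updates are only safe when the stream is ordered.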
You could file an ITS for this, but I don't think we'll be changing this in 2.3.
--On Thursday, May 01, 2008 1:46 PM -0700 Howard Chu hyc@symas.com wrote:
> In delta-syncrepl the refresh is coming from the log, and unless somebody has been explicitly mucking around in the log DB, the entries will always be returned in order, so it's possible to update the contextCSN after each entry has been received. But it's up to the provider to send the cookie with each entry in this case, and the syncprov overlay doesn't really know the difference between delta-sync and regular sync, so it doesn't do it.
> You could file an ITS for this, but I don't think we'll be changing this in 2.3.
Unless the log doesn't exist yet, or we've run into one of those fun cases where the replica needs to request a re-sync (like the hotslapcat scenario recently addressed). Then delta-sync goes into the refresh mode, so I'm assuming we need to still differentiate that in syncprov someway.
--Quanah
Quanah Gibson-Mount wrote:
> --On Thursday, May 01, 2008 1:46 PM -0700 Howard Chu hyc@symas.com wrote:
>> In delta-syncrepl the refresh is coming from the log, and unless somebody has been explicitly mucking around in the log DB, the entries will always be returned in order, so it's possible to update the contextCSN after each entry has been received. But it's up to the provider to send the cookie with each entry in this case, and the syncprov overlay doesn't really know the difference between delta-sync and regular sync, so it doesn't do it.
>> You could file an ITS for this, but I don't think we'll be changing this in 2.3.
> Unless the log doesn't exist yet, or we've run into one of those fun cases where the replica needs to request a re-sync (like the hotslapcat scenario recently addressed). Then delta-sync goes into the refresh mode, so I'm assuming we need to still differentiate that in syncprov someway.
We already do. The log DB has the syncprov-nopresent and syncprov-reloadhint flags set, the main DB won't. So we could cue off of those flags for this purpose as well.
--On Thursday, May 01, 2008 2:09 PM -0700 Howard Chu hyc@symas.com wrote:
>> Unless the log doesn't exist yet, or we've run into one of those fun cases where the replica needs to request a re-sync (like the hotslapcat scenario recently addressed). Then delta-sync goes into the refresh mode, so I'm assuming we need to still differentiate that in syncprov someway.
> We already do. The log DB has the syncprov-nopresent and syncprov-reloadhint flags set, the main DB won't. So we could cue off of those flags for this purpose as well.
Cool.
John, will you be filing an ITS on this?
--Quanah
openldap-software@openldap.org