If I've understood the documentation correctly, syncrepl utilises a synchronisation cookie which is passed from provider to consumer every so often, and this cookie is based on something called contextCSN which I do not yet understand (but I don't think that matters right now).
Which loglevel keyword (if any) will allow me to see this synchronisation cookie being passed?
Which other loglevels are useful for debugging replication issues? (Apart from sync, of course, which I'm already using.)
Background: I'm trying to debug an issue in which, sometimes, a record that is deleted on the provider does not get deleted on some of the consumers.
Version: 2.3.31 Patch: the consumers that exihibit this behaviour are running a version that has been patched for ITS4790, but there are a handful patched consumers that are okay. Some consumers are still running the unpatched version, and all of them are okay (other than the ITS4790 bug).
Backend: bdb (4.2) at both ends
Provider config:
overlay syncprov syncprov-checkpoint 10 5 syncprov-sessionlog 100
Consumer config:
syncrepl rid=123 provider=ldap://wwsv04.opus.co.nz type=refreshAndPersist searchbase="dc=example,dc=co,dc=nz" scope=sub schemachecking=off bindmethod=simple binddn="cn=root,dc=example,dc=co,dc=nz" credentials=secret retry=5,5,30,5,60,5,300,+
Background: I'm trying to debug an issue in which, sometimes, a record that is deleted on the provider does not get deleted on some of the consumers.
Ooh, my favorite kind of syncrepl bug. And you're even using a slog! I haven't noticed any of these in a while...doesn't mean they're not there...
Which loglevel keyword (if any) will allow me to see this synchronisation cookie being passed?
None off the shelf, I don't think, except for packet trace (a bit too heavy). Try
http://cvs.rutgers.edu/cgi-bin/viewcvs.cgi/*checkout*/source-patches/openlda...
(that's Howard's code, not mine). Despite the name, it applies clean to 2.3.32.
Aaron Richton wrote:
Background: I'm trying to debug an issue in which,
sometimes, a record that
is deleted on the provider does not get deleted on some of
the consumers.
Ooh, my favorite kind of syncrepl bug. And you're even using a slog!
Session log? It seemed sensible, also I started from the example provided in the admin guide.
Which loglevel keyword (if any) will allow me to see this
synchronisation
cookie being passed?
None off the shelf, I don't think, except for packet trace (a bit too heavy). Try
http://cvs.rutgers.edu/cgi-bin/viewcvs.cgi/*checkout*/source-p atches/openldap-2.3.25-syncprovlog.patch
(that's Howard's code, not mine). Despite the name, it applies clean to 2.3.32.
Thanks, I'll give it a whirl. I'll probably be back. :-)
Aaron Richton wrote:
Background: I'm trying to debug an issue in which, sometimes, a record that is deleted on the provider does not get deleted on some of the consumers.
Ooh, my favorite kind of syncrepl bug. And you're even using a slog! I haven't noticed any of these in a while...doesn't mean they're not there...
Which loglevel keyword (if any) will allow me to see this synchronisation cookie being passed?
None off the shelf, I don't think, except for packet trace (a bit too heavy). Try
http://cvs.rutgers.edu/cgi-bin/viewcvs.cgi/*checkout*/source-patches/openlda...
(that's Howard's code, not mine). Despite the name, it applies clean to 2.3.32.
Hm I don't recall, did this extra debugging actually help pinpoint any problems? If so I can commit in the main source tree.
Hm I don't recall, did this extra debugging actually help pinpoint any problems? If so I can commit in the main source tree.
As I recall, the original point was to verify the if/else logic (which was and remains good), which turned out to be a herring. But it did point out a corruption in one of the csn values (became obvious when the Debug() started printing garbage). It is a bit of a chatty Debug(), but I've been using it in production since 2.3.25 (hence the filename) figuring that it's better to have the information for debugging than to just have a "wow, that didn't replicate right, wonder why."
Might be nice to have it committed, though. It'd save me one local patch at a minimum...
Aaron wrote:
http://cvs.rutgers.edu/cgi-bin/viewcvs.cgi/*checkout*/source-p atches/openldap-2.3.25-syncprovlog.patch
(that's Howard's code, not mine). Despite the name, it applies clean to 2.3.32.
Another couple of quick questions:
Does this patch enhance the "sync" loglevel, or do I need to enable something else? I currently have "loglevel config sync". Or do I need to put a special config switch in at build time?
Am I right in thinking that with "syncprov-checkpoint 10 5" I should see something happen at every checkpoint, in this case every 5 minutes?
I've just deployed the patched 2.3.32 to the provider and one of the consumers, have been watching the logs for at least 15 minutes and haven't seen anything that looks like a CSN, at either end.
There's very little updating going on at the moment (after-hours) so the 10 updates criterion won't have been met yet.
Does this patch enhance the "sync" loglevel, or do I need to enable something else? I currently have "loglevel config sync". Or do I need to put a special config switch in at build time?
Yes, the statements are debug_sync, for the provider.
Am I right in thinking that with "syncprov-checkpoint 10 5" I should see something happen at every checkpoint, in this case every 5 minutes?
(Dangerously talking off the top of my head without looking at the code...)
No, this happens when the slog is (or isn't) used, which happens during syncrepl client searches.
I've just deployed the patched 2.3.32 to the provider and one of the consumers, have been watching the logs for at least 15 minutes and haven't seen anything that looks like a CSN, at either end.
For those statements to hit, I believe it has to actually hit the slog, which isn't done in all cases. Let it run a bit longer...
Aaron wrote:
For those statements to hit, I believe it has to actually hit the slog, which isn't done in all cases. Let it run a bit longer...
Ok, I see it now, getting entries like this:
Jan 15 13:10:12 wwsv04 slapd[7879]: cmp -1, too old Jan 15 13:10:12 wwsv04 slapd[7879]: log csn 20070114221946Z#000001#00#000000 Jan 15 13:10:12 wwsv04 slapd[7879]: cmp -1, too old Jan 15 13:10:12 wwsv04 slapd[7879]: log csn 20070114222017Z#000000#00#000000 Jan 15 13:10:12 wwsv04 slapd[7879]: cmp -1, too old Jan 15 13:10:12 wwsv04 slapd[7879]: log csn 20070114222503Z#000000#00#000000 Jan 15 13:10:12 wwsv04 slapd[7879]: cmp -1, too old Jan 15 13:10:12 wwsv04 slapd[7879]: log csn 20070114223008Z#000000#00#000000 Jan 15 13:10:12 wwsv04 slapd[7879]: cmp 0, too old Jan 15 13:10:12 wwsv04 slapd[7879]: log csn 20070114224449Z#000000#00#000000 Jan 15 13:10:12 wwsv04 slapd[7879]: log csn 20070114225002Z#000000#00#000000 Jan 15 13:10:12 wwsv04 slapd[7879]: log csn 20070114225325Z#000000#00#000000 Jan 15 13:10:12 wwsv04 slapd[7879]: log csn 20070114230457Z#000000#00#000000 ... Jan 15 13:10:12 wwsv04 slapd[7879]: Entry uid=example,ou=People,dc=example,dc=co,dc=nz CSN 20070114223008Z#000000#00#000000 matches ctx 20070114223008Z#000000#00#000000
Now I just need to catch the critter in action.
I presume a function trace would be pretty much required if/when I get to the point of logging a bug report.
Has any conclusion ever been reached regarding ITS#4716? It may be related.
openldap-software@openldap.org