hyc@symas.com wrote:
Allan E. Johannesen wrote:
> "hyc" == Howard Chu hyc@symas.com writes:
hyc> Since you mention that this occurs more often in 2.3.33 than in "previous
hyc> releases" - what previous version are you comparing to?
Well, I should have said I've never seen it before. I've generally been running the new releases within a day of release, and I rebuild the data at each release, so everything starts clean. Therefore, it _may_ have existed previously, but never showed up in the days during which the given releases ran.
I guess I only mentioned that since someone saw it several releases ago in a different ITS. I never saw it before.
In 2.3.33, it happened right after loading the data. I thought I did it wrong, so I loaded things again and it was fine. After some days, though, it (meaning the change to "objectClass: glue") happened again.
The only change to syncrepl between 2.3.32 and .33 was one or two debug messages, no functional changes. In 2.3.32 there was no change to syncrepl at all (the bug in ITS#4790 was in connection.c, not syncrepl.c). The only change in 2.3.31 was also in debug messages, not functional changes. So as unlikely as it seems, at the moment this appears to be a coincidence and the bug must be older.
If you see this happening repeatedly, turn on the sync debug level and capture that output for a while. When you notice the problem, you should also see some number of "syncrepl_del_nonpresent" messages in the log. We'll probably need to see a large chunk of the log to be able to follow the sequence of events.
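For anyone wading through a large sync log, a small filter along these lines might help locate the interesting region. This is only a sketch: the message name "syncrepl_del_nonpresent" is the one mentioned above, but the function name, the `context` parameter, and the assumption that the log is plain text with one message per line are all mine.

```python
# The marker string is taken from the message above; everything else here
# (function name, context window) is a hypothetical helper, not part of OpenLDAP.
MARKER = "syncrepl_del_nonpresent"


def find_del_nonpresent(lines, context=2):
    """Return (line_number, window) pairs for each line mentioning MARKER.

    line_number is 1-based. window is the list of lines running from
    `context` lines before the hit to `context` lines after it, clipped
    to the bounds of the input.
    """
    lines = [line.rstrip("\n") for line in lines]
    hits = []
    for i, line in enumerate(lines):
        if MARKER in line:
            lo = max(0, i - context)
            hi = min(len(lines), i + context + 1)
            hits.append((i + 1, lines[lo:hi]))
    return hits
```

Passing an open file object as `lines` and a generous `context` would give the surrounding sequence of events for each hit, which is roughly the "large chunk of the log" needed here.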
Hm, in re-reading ITS#4626, I see a pertinent detail in followup #2. I think I understand part of the problem.
The particular entry was modified after the current refresh session began, so that entry is omitted from the current refresh results. Since the entry is actually missing from the refresh data, the consumer treats it as deleted. Since the entry has children, it cannot actually be deleted, so it gets turned into a glue entry.
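To make that sequence concrete, here is a toy model of the consumer side. It is not the slapd code; the `Entry` class, the function name, and the example DNs are invented for illustration, and it ignores details such as the order in which missing entries are processed.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    uuid: str
    dn: str
    object_class: str
    children: list = field(default_factory=list)  # UUIDs of child entries


def apply_refresh_present(consumer, present_uuids):
    """Toy model: handle entries the provider did not mention in a refresh.

    consumer: dict mapping UUID -> Entry, modified in place.
    present_uuids: set of UUIDs the provider reported in this refresh.
    """
    for uuid in list(consumer):
        if uuid in present_uuids:
            continue  # the provider vouched for this entry, leave it alone
        entry = consumer[uuid]
        # An entry that still has children on the consumer cannot be removed,
        # so it is turned into a glue entry instead.
        if any(child in consumer for child in entry.children):
            entry.object_class = "glue"
        else:
            del consumer[uuid]


# Hypothetical data: u1 is a parent that was modified after the refresh
# began, so its UUID is missing from the refresh results.
consumer = {
    "u1": Entry("u1", "ou=people,dc=example,dc=com", "organizationalUnit", ["u2"]),
    "u2": Entry("u2", "cn=a,ou=people,dc=example,dc=com", "person"),
}
apply_refresh_present(consumer, present_uuids={"u2"})
```

After the call, `consumer["u1"].object_class` is "glue" even though the entry was never deleted on the provider. Had "u1" been in `present_uuids`, it would have been left untouched.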
So there are two issues. First, the provider should still send the UUID of the entry, so that the consumer doesn't consider it deleted. Second, this problem ought to have self-corrected: once the replication transitioned from Refresh to Persist phase, the modified entry should have been sent to the consumer, and the glue entry should have been replaced by the correct data.
Looks like both problems are in the syncprov overlay.