hyc@symas.com wrote:
Hm, in re-reading ITS#4626, I see a pertinent detail in followup #2. I think I understand part of the problem.
The particular entry was modified after the current refresh session began, so that entry is omitted from the current refresh results. Since the entry is actually missing from the refresh data, the consumer treats it as deleted. Since the entry has children, it cannot actually be deleted, so it gets turned into a glue entry.
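To make the consumer-side sequence above concrete, here is a toy model of it in Python. It is only a sketch, not slapd code: the function name, the dict layout and the example DNs are all made up for illustration, and "has children" is simplified to "some local entry names it as parent".

```python
# Toy model of consumer reconciliation at the end of a refresh.
# All names are hypothetical; this does not mirror syncrepl.c.

def reconcile(local, present_uuids):
    """local maps uuid -> {"dn": str, "parent": uuid or None, "glue": bool}.

    Entries whose UUID was not reported by the provider are treated as
    deleted. An entry that still has children cannot be removed, so it is
    kept as a glue placeholder instead.
    """
    result = {}
    for uuid, entry in local.items():
        if uuid in present_uuids:
            result[uuid] = dict(entry)  # reported as present: keep as-is
            continue
        has_children = any(e["parent"] == uuid for e in local.values())
        if has_children:
            # Cannot delete a non-leaf entry: turn it into glue.
            result[uuid] = {"dn": entry["dn"], "parent": entry["parent"], "glue": True}
        # Otherwise the entry is simply dropped.
    return result


local = {
    "u1": {"dn": "ou=people,dc=example,dc=com", "parent": None, "glue": False},
    "u2": {"dn": "cn=a,ou=people,dc=example,dc=com", "parent": "u1", "glue": False},
}
# u1 was modified mid-refresh, so the provider left it out of the results.
after = reconcile(local, present_uuids={"u2"})
```

In this model `after["u1"]` ends up flagged as glue even though the entry still exists on the provider, which is the symptom reported.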
So there are two issues. First, the provider should still send the UUID of the entry, so that the consumer doesn't consider it deleted. Second, this problem ought to have self-corrected: once the replication transitioned from Refresh to Persist phase, the modified entry should have been sent to the consumer, and the glue entry should have been replaced by the correct data.
Looks like both problems are in the syncprov overlay.
syncprov.c is now patched in HEAD to always send the UUIDs of the present entries. That appears to prevent the problem from arising. It's a bit difficult to automate a test for this because it requires such exact timing; a modification must occur on the provider while the consumer is just beginning its refresh search. Currently I can only test this by inserting a Debug message and a sleep() in the provider to indicate that the consumer is connected, and to give time to issue a modification.
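For anyone trying to picture the change, a rough Python sketch of the provider-side decision follows. It is not the actual syncprov.c logic. It assumes, purely for illustration, that CSNs are plain integers, that the refresh is bounded by the consumer's cookie CSN and by a snapshot CSN taken when the refresh began, and that a flag named always_send_uuid switches between the old and patched behaviour. All of those names are hypothetical.

```python
# Toy model of what a provider returns during a refresh.
# Integer CSNs and all names here are assumptions made for illustration.

def refresh(entries, cookie_csn, snapshot_csn, always_send_uuid):
    """entries maps uuid -> entry CSN.

    Returns (full, present): UUIDs sent as full entries, and UUIDs sent
    only as 'present' markers.
    """
    full, present = [], set()
    for uuid, csn in entries.items():
        if csn > snapshot_csn:
            # Modified after the refresh began: outside this session's window.
            if always_send_uuid:
                present.add(uuid)  # patched behaviour: still report the UUID
            # old behaviour: the entry is omitted entirely
        elif csn > cookie_csn:
            full.append(uuid)      # changed since the consumer's cookie
        else:
            present.add(uuid)      # unchanged: UUID only
    return full, present


entries = {"u1": 30, "u2": 5, "u3": 15}
old = refresh(entries, cookie_csn=10, snapshot_csn=20, always_send_uuid=False)
new = refresh(entries, cookie_csn=10, snapshot_csn=20, always_send_uuid=True)
```

With the flag off, u1 appears in neither list, so a consumer would treat it as deleted. With the flag on, u1 is reported as present and its updated contents are left to be delivered later.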