https://bugs.openldap.org/show_bug.cgi?id=8125
--- Comment #19 from Ondřej Kuzník <ondra@mistotebe.net> ---
(In reply to Ondřej Kuzník from comment #12)
This is my understanding of the above discussion:
- deltasync consumer has just switched to full refresh (but is ahead of this provider in some ways)
- provider sends the present list
- consumer deletes extra entries, builds a new cookie
- the problem is that the new cookie is built as the union of the local and received cookies, even though we may have just undone some of the changes it covers, and we will then ignore those changes if they are sent to us again
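To make the sequence above concrete, here is a minimal sketch in Python. It is a toy model, not syncrepl code: the CSN set is modelled as a hypothetical `{sid: csn}` dict, `merge_cookies` is an illustrative name, and the DNs and CSN values are made up. It relies only on the fact that CSNs of the same format order correctly as plain strings.

```python
# Toy model (assumption): a cookie's CSN set is a dict {sid: csn}.

def merge_cookies(local, received):
    """Per-SID maximum of two CSN sets, i.e. the 'union' described above."""
    merged = dict(local)
    for sid, csn in received.items():
        if sid not in merged or csn > merged[sid]:
            merged[sid] = csn
    return merged

# The consumer is ahead of this provider on SID 2.
local = {1: "20200101120000.000000Z#000000#001#000000",
         2: "20200101120500.000000Z#000000#002#000000"}
received = {1: "20200101120200.000000Z#000000#001#000000",
            2: "20200101120000.000000Z#000000#002#000000"}

# Consumer entries as {dn: entryCSN}; cn=b is not yet known to the provider.
entries = {"cn=a": "20200101120100.000000Z#000000#001#000000",
           "cn=b": "20200101120500.000000Z#000000#002#000000"}
present = {"cn=a"}  # the present list sent by the provider

# Present phase: drop everything the provider did not list.
survivors = {dn: csn for dn, csn in entries.items() if dn in present}
new_cookie = merge_cookies(local, received)

# SIDs where the new cookie claims more than the provider actually gave us.
overclaimed = [sid for sid, csn in new_cookie.items()
               if csn > received.get(sid, "")]
```

In this model `cn=b` is deleted, yet the new cookie still advertises SID 2 at the consumer's old CSN, so a later replay of that change would be treated as already seen.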
If that's accurate, there are some approaches that could fix it:
- The simple one is to remember the actual cookie we got from the server and refuse to delete entries whose entryCSN is ahead of the provided CSN set. The problem is that this takes us even further from being able to replicate from a generic RFC4533 provider.
This has actually been done in ITS#9282.
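For illustration only, the check in the simple approach might look roughly like the sketch below. This is not the ITS#9282 code: `sid_of` and `may_delete` are hypothetical names, the CSN set is again modelled as a `{sid: csn}` dict, and treating a SID that is missing from the provider's set as "ahead" is my assumption.

```python
def sid_of(csn):
    # CSN layout is timestamp#count#sid#mod, with the SID in hex.
    return int(csn.split("#")[2], 16)

def may_delete(entry_csn, provider_csns):
    """Allow deleting an unlisted entry only if the provider's CSN set
    already covers its entryCSN (assumption: unknown SID means not covered)."""
    covered = provider_csns.get(sid_of(entry_csn))
    return covered is not None and entry_csn <= covered
```

An entry that fails this check is kept even though it is absent from the present list, on the reasoning that the provider may simply not have seen it yet.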
Instead, when a present phase is initiated, we might terminate all other sessions, adopt the complete CSN set, and restart them only once the new CSN set has been fully established.
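As a rough sketch of that ordering (stop the other sessions, adopt the provider's CSN set, then restart), assuming a hypothetical `Consumer` object that tracks its sessions by name. None of these names exist in slapd, and reading "adopt" as "replace rather than merge" is my interpretation.

```python
class Consumer:
    def __init__(self, csns, sessions):
        self.csns = dict(csns)                        # local CSN set {sid: csn}
        self.running = {name: True for name in sessions}
        self.refreshing_from = None

    def begin_present_refresh(self, source):
        """A present phase starts on `source`: stop every other session."""
        self.refreshing_from = source
        for name in self.running:
            if name != source:
                self.running[name] = False

    def end_present_refresh(self, provider_csns):
        """Refresh finished: adopt the provider's CSN set as-is (no merge),
        and only then let the other sessions resume."""
        self.csns = dict(provider_csns)
        for name in self.running:
            self.running[name] = True
        self.refreshing_from = None
```

The point of the ordering is that no other session can write to the database, or contribute to the cookie, while the CSN set is in flux.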
Also, whenever we fall back from deltasync to plain syncrepl, we should make sure that the accesslog entries we generate as a result are never used for further replication, though this might be considered a separate issue. The ITS#8486 work might be useful here if we have a way of signalling accesslog to reset minCSN according to the new CSN set.
The former is simpler, but the latter feels like the only one that actually addresses these problems in full.
I have some code that does this, terminating only persist sessions when we detect that we are entering a present refresh.
We need a way to reproduce this in current master, since many of the issues will have been fixed in ITS#9282 and behaviour might now only diverge in relayed deltasync, possibly when we are refreshing from two other providers at the same time.