Full_Name: Emily Backes
Version: 2.4.26
OS: any
URL:
Submission from: (NULL) (76.88.107.46)
Similar to the recent overlay fixes to prevent updating entryCSN/contextCSN on local changes, delete operations can cause inappropriate CSN setting on remote servers.
Given a multi-master setup (tested with normal syncrepl) in which each server has a serverID set and no overlays other than syncprov are loaded, run two or more concurrent threads of delete operations against server1. Three or more threads reproduce the problem most reliably on the systems I've tested.
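A rough sketch of one way to generate such a concurrent delete load, not necessarily the exact harness used here. It assumes the OpenLDAP ldapdelete client is on PATH and that simple bind is acceptable; the helper names (partition, delete_cmd, run_deletes) are hypothetical, and the URI, bind DN, password, and entry DNs are placeholders to be supplied by the caller.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def partition(dns, workers):
    """Split the DN list round-robin into `workers` chunks."""
    return [dns[i::workers] for i in range(workers)]


def delete_cmd(uri, bind_dn, password, dns):
    """Build one ldapdelete invocation (simple bind) for a chunk of DNs."""
    return ["ldapdelete", "-x", "-H", uri, "-D", bind_dn, "-w", password, *dns]


def run_deletes(uri, bind_dn, password, dns, workers=3):
    """Run up to `workers` ldapdelete processes at once; return exit codes."""
    # Skip empty chunks so no ldapdelete is started without DN arguments.
    chunks = [c for c in partition(dns, workers) if c]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(subprocess.run, delete_cmd(uri, bind_dn, password, c))
            for c in chunks
        ]
        return [f.result().returncode for f in futures]
```

Calling run_deletes with the URI of server1 and a list of existing leaf-entry DNs starts the deletes in parallel. The default of three workers follows the observation above that three or more threads hit the race most reliably.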
As the deletes are happening, server1 should of course show its contextCSN updating:
dn: dc=example,dc=com
contextCSN: 20110923044343.412634Z#000000#001#000000
This should of course be mirrored on server2, with its contextCSN exactly matching the set of CSNs on server1. Instead, after enough concurrent deletes to hit the race:
dn: dc=example,dc=com
contextCSN: 20110923044343.412634Z#000000#001#000000
contextCSN: 20110923044349.314803Z#000000#002#000000
This happens even though server2 has never received any local write operations (or indeed any connection other than the syncrepl search from server1 and my searches to retrieve contextCSN). Again, no overlays are loaded.
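The symptom can also be checked mechanically. The sketch below assumes the usual CSN layout of timestamp, change count, server ID, and modification count separated by '#', with the server ID written as three hex digits. It reports any server ID that appears in a set of contextCSN values but does not belong to a server that actually took writes. The function names are hypothetical.

```python
import re

# Assumed CSN layout: timestamp#change-count#server-ID#modification-count
CSN_RE = re.compile(
    r"^\d{14}\.\d{6}Z#[0-9a-fA-F]{6}#([0-9a-fA-F]{3})#[0-9a-fA-F]{6}$"
)


def csn_sid(csn):
    """Return the server ID embedded in a CSN as an integer."""
    match = CSN_RE.match(csn.strip())
    if match is None:
        raise ValueError("not a recognizable CSN: %r" % csn)
    return int(match.group(1), 16)


def unexpected_sids(context_csns, writer_sids):
    """List server IDs present in contextCSN that never took local writes."""
    return sorted({csn_sid(c) for c in context_csns} - set(writer_sids))
```

Fed the two contextCSN values that server2 reports above, with server ID 1 as the only writer, unexpected_sids returns a list containing server ID 2, which is the stray value described here.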
This breaks syncrepl's assumptions and can lead to further replication problems caused by CSN desync.
Working on tracing out exactly where it goes awry...