(ITS#7049) DEL/LDAP_SYNC_DELETE race touching entryCSN

22 Sep 2011


Full_Name: Emily Backes
Version: 2.4.26
OS: any
URL: 
Submission from: (NULL) (76.88.107.46)
Similar to the recent overlay fixes to prevent updating entryCSN/contextCSN on
local changes, delete operations can cause inappropriate CSN setting on remote
servers.
Given a multi-master setup (plain syncrepl was tested) in which each server has
a serverID set and no overlays other than syncprov are loaded, run two or more
threads of delete operations against server1; three or more threads seem to
reproduce the problem most reliably on the systems I've tested.
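The concurrent delete load described above could be driven by something like
the following sketch. It assumes the OpenLDAP ldapdelete client is on the PATH
and uses a simple bind; the URI, bind DN, password, entry DNs, and all helper
names are hypothetical placeholders, not details from the original test setup.

```python
import subprocess
import threading

def partition(dns, n_threads):
    """Split the DN list round-robin into n_threads disjoint slices."""
    return [dns[i::n_threads] for i in range(n_threads)]

def build_cmd(uri, bind_dn, password, dn):
    """Build one ldapdelete invocation (simple bind) for a single DN."""
    return ["ldapdelete", "-x", "-H", uri, "-D", bind_dn, "-w", password, dn]

def delete_worker(uri, bind_dn, password, dns):
    # Each worker deletes its own slice, one entry per ldapdelete call.
    for dn in dns:
        subprocess.run(build_cmd(uri, bind_dn, password, dn), check=False)

def run_concurrent_deletes(uri, bind_dn, password, dns, n_threads=3):
    """Issue deletes from n_threads threads at once against one server."""
    threads = [
        threading.Thread(target=delete_worker,
                         args=(uri, bind_dn, password, chunk))
        for chunk in partition(dns, n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

The sketch assumes the DNs are leaf entries, so the order in which the threads
delete them does not matter. Calling run_concurrent_deletes() with three or
more threads against server1 only would correspond to the case described here.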
As the deletes happen, the server1 side should of course show its contextCSN
updating:
dn: dc=example,dc=com
contextCSN: 20110923044343.412634Z#000000#001#000000
This should of course be mirrored on the server2 side, with its contextCSN
exactly matching the set of CSNs on the server1 side.  Instead, after enough
concurrent deletes to hit the race, server2 shows:
dn: dc=example,dc=com
contextCSN: 20110923044343.412634Z#000000#001#000000
contextCSN: 20110923044349.314803Z#000000#002#000000
This happens even though server2 has never received any local write operations
(or indeed any connection other than the syncrepl search from server1 and my
searches to retrieve contextCSN).  Again, no overlays are loaded.
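A small check for this symptom might look like the sketch below. It assumes the
CSN layout time#count#sid#mod with hexadecimal count, sid, and mod fields, as
the values above suggest; the function names are hypothetical, and the set of
servers that actually took writes has to be supplied by the tester.

```python
def parse_csn(csn):
    """Split a CSN of the assumed form time#count#sid#mod into its fields."""
    timestamp, count, sid, mod = csn.split("#")
    return {
        "time": timestamp,
        "count": int(count, 16),
        "sid": int(sid, 16),
        "mod": int(mod, 16),
    }

def unexpected_sids(context_csns, writer_sids):
    """Return SIDs found in contextCSN values that no known writer produced."""
    seen = {parse_csn(c)["sid"] for c in context_csns}
    return sorted(seen - set(writer_sids))

# contextCSN values read from server2; only server1 (SID 1) received writes.
server2_csns = [
    "20110923044343.412634Z#000000#001#000000",
    "20110923044349.314803Z#000000#002#000000",
]
print(unexpected_sids(server2_csns, writer_sids={1}))  # prints [2]
```

Any SID reported here belongs to a server that stamped a CSN without having
received a local write, which is the condition described above.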
This breaks syncrepl's assumptions, and the resulting CSN desync can cause
further replication problems.
Working on tracing out exactly where it goes awry...
