The bdb/hdb and ldif backends assign CSNs to delete operations that lack one, which causes problems in forwarding replication configurations. During the refresh phase there may be legitimate delete operations that should not have any CSN. When the forwarder adds its own CSN it may leave the forwarder and its consumers with a CSN set that includes a SID not present on the provider, and they will never be able to resync.
syncrepl_del_nonpresent() queues the minimum CSN received from the provider, which partly obscures this problem but in return introduces others :-( The CSN set received may include updates to more than one CSN, and only one of these can be added to the queue. Much worse, the first delete will commit the queued CSN. If more than one entry should be deleted, this leaves an open window where the forwarder (and its consumers) have an apparently up-to-date CSN set without actually being in sync with the provider. Running the new test061 with sync debugging shows traces of these problems in the logs.
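To make the window concrete, here is a minimal stand-alone C sketch of the sequence as I understand it; all names are hypothetical stand-ins, none of them are real slapd symbols:

#include <stdio.h>

#define NUM_NONPRESENT 3

static const char *committed_csn = "old-csn";        /* what the cookie claims          */
static int         entries_left  = NUM_NONPRESENT;   /* real divergence still remaining */
static const char *queued_csn;                       /* provider's minimum CSN, queued  */

static void delete_nonpresent_entry(void)
{
    /* The first delete commits the queued CSN, even though more
     * non-present entries still have to be removed. */
    if (queued_csn) {
        committed_csn = queued_csn;
        queued_csn = NULL;
    }
    entries_left--;
    printf("deleted one entry: cookie says %s, %d entr%s still out of sync\n",
           committed_csn, entries_left, entries_left == 1 ? "y" : "ies");
}

int main(void)
{
    queued_csn = "provider-min-csn";        /* queued before the deletes start         */
    for (int i = 0; i < NUM_NONPRESENT; i++)
        delete_nonpresent_entry();          /* window is open after the first call     */
    return 0;
}

The output shows the cookie advancing to the provider's CSN on the first delete while entries remain to be removed, which is exactly the inconsistency the logs hint at.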
In back-bdb/delete.c, the CSN of the delete operation appears to be added as a value in the entryCSN index, which really puzzles me. If that index is to be modified at all, I would expect it to delete the entryCSN value of the entry being deleted, not add anything. Why this is only done in non-shadowed databases I cannot tell either.
I would fix these problems by assigning the CSN of delete operations in the frontend, i.e. on the server where the ordinary delete operation was done. syncrepl_del_nonpresent() should not queue the CSN; updating it should be left to the syncrepl_updateCookie() call, which takes place when the refresh phase completes. But what to do about the index manipulation I cannot tell. Anyone?
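For the CSN part, roughly what I have in mind is something like the following; this is only a hypothetical sketch with made-up names, not the real slapd code paths:

#include <stdio.h>

struct delete_op {
    const char *csn;           /* CSN attached to the operation, NULL if none     */
    int         is_replicated; /* arrived via syncrepl refresh/persist            */
};

/* Hypothetical frontend hook: ordinary client deletes get a CSN stamped once,
 * replicated/refresh deletes keep whatever (if anything) the provider sent,
 * and the backends never invent one of their own. */
void frontend_assign_delete_csn(struct delete_op *op, const char *(*new_csn)(void))
{
    if (op->is_replicated)
        return;
    if (op->csn == NULL)
        op->csn = new_csn();
}

static const char *fake_csn(void) { return "locally-generated-csn"; }

int main(void)
{
    struct delete_op local   = { NULL, 0 };  /* ordinary delete from a client       */
    struct delete_op refresh = { NULL, 1 };  /* refresh-phase delete, no CSN wanted */

    frontend_assign_delete_csn(&local, fake_csn);
    frontend_assign_delete_csn(&refresh, fake_csn);

    printf("local delete CSN:   %s\n", local.csn ? local.csn : "(none)");
    printf("refresh delete CSN: %s\n", refresh.csn ? refresh.csn : "(none)");
    return 0;
}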
Rein
Rein Tollevik wrote:
The bdb/hdb and ldif backends assign CSNs to delete operations that lack one, which causes problems in forwarding replication configurations. During the refresh phase there may be legitimate delete operations that should not have any CSN. When the forwarder adds its own CSN it may leave the forwarder and its consumers with a CSN set that includes a SID not present on the provider, and they will never be able to resync.
OK, sounds like that should be fixed.
syncrepl_del_nonpresent() queues the minimum CSN received from the provider, which partly obscures this problem but in return introduces others :-( The CSN set received may include updates to more than one CSN, and only one of these can be added to the queue. Much worse, the first delete will commit the queued CSN. If more than one entry should be deleted, this leaves an open window where the forwarder (and its consumers) have an apparently up-to-date CSN set without actually being in sync with the provider. Running the new test061 with sync debugging shows traces of these problems in the logs.
In back-bdb/delete.c, the CSN of the delete operation appears to be added as a value in the entryCSN index, which really puzzles me. If that index is to be modified at all, I would expect it to delete the entryCSN value of the entry being deleted, not add anything. Why this is only done in non-shadowed databases I cannot tell either.
bdb_index_entry_del is already invoked to remove all appropriate index values. IIRC, this particular patch was done to accommodate the entryCSN>=foo search that syncprov performs. Probably this only matters if a Delete was the last operation on a DB just before shutdown. Hard to say if it's still relevant.
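For reference, the shape of that search is just an ordering filter on entryCSN. A hedged client-side illustration with libldap, using a placeholder URI, base DN and cookie CSN (syncprov does the equivalent internally, not through this API):

#include <ldap.h>
#include <stdio.h>

int main(void)
{
    LDAP *ld;
    LDAPMessage *res;
    char *attrs[] = { "entryCSN", NULL };
    /* Placeholder cookie CSN; syncprov substitutes the consumer's cookie value. */
    const char *filter = "(entryCSN>=20080101000000.000000Z#000000#000#000000)";

    if (ldap_initialize(&ld, "ldap://localhost:389") != LDAP_SUCCESS)
        return 1;

    if (ldap_search_ext_s(ld, "dc=example,dc=com", LDAP_SCOPE_SUBTREE,
                          filter, attrs, 0, NULL, NULL, NULL,
                          LDAP_NO_LIMIT, &res) == LDAP_SUCCESS) {
        printf("%d entries at or after the cookie CSN\n",
               ldap_count_entries(ld, res));
        ldap_msgfree(res);
    }
    ldap_unbind_ext_s(ld, NULL, NULL);
    return 0;
}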
I would fix these problems by assigning the CSN of delete operations in the frontend, i.e. on the server where the ordinary delete operation was done. syncrepl_del_nonpresent() should not queue the CSN; updating it should be left to the syncrepl_updateCookie() call, which takes place when the refresh phase completes. But what to do about the index manipulation I cannot tell. Anyone?