contextCSN interaction between syncrepl and syncprov - openldap-devel

9 Mar 2009


The remaining errors and race condition that test058 demonstrates cannot 
be solved unless syncrepl is changed to always store the contextCSN in 
the suffix of the database where it is configured, not the suffix of its 
glue database as it does today.
Assuming serverID 0 is reserved for the single master case, syncrepl and 
syncprov can in that case only be configured within the same database 
context if syncprov is a pure forwarding server, i.e. it will not update 
any CSN value and syncrepl has no need to fetch any values from it.
In the multi-master case it is only the contextCSN whose SID matches the 
current serverID that syncprov maintains; the others are all received by 
syncrepl.  So, the only time syncrepl should need an updated CSN from 
syncprov is when it is about to present it to its peer, i.e. when it 
initiates a refresh phase.  Actually, a race condition that would render 
the state of the database undetermined could occur if syncrepl fetches 
an updated CSN from syncprov during the initial refresh phase.  So, it 
should be sufficient to read the contextCSN values from the database 
before a new refresh phase is initiated, independent of whether syncprov 
is in use or not.
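
To make the "read once before the refresh phase" step concrete, here is a
rough sketch in Python (not slapd's C, and not existing code). The
function names are made up for illustration. It assumes the 2.4 CSN
layout `timestamp#count#sid#mod` with a hexadecimal SID field, and that
fixed-width CSNs carrying the same SID order correctly under plain
string comparison.

```python
from typing import Dict, Iterable


def csn_sid(csn: str) -> int:
    """Return the serverID (SID) field of a CSN string.

    Assumes the layout 'timestamp#count#sid#mod' with a hex SID.
    """
    fields = csn.split("#")
    if len(fields) != 4:
        raise ValueError(f"malformed CSN: {csn!r}")
    return int(fields[2], 16)


def load_context_csns(stored_values: Iterable[str]) -> Dict[int, str]:
    """Build the per-SID CSN set to present to the peer.

    Meant to be called once, before a refresh phase is initiated, with
    the contextCSN values read from the suffix entry of the database.
    If the entry holds several values for one SID, the newest is kept.
    """
    newest: Dict[int, str] = {}
    for csn in stored_values:
        sid = csn_sid(csn)
        # Assumption: same-SID CSNs compare correctly as strings.
        if sid not in newest or csn > newest[sid]:
            newest[sid] = csn
    return newest
```

The point of the sketch is only that the set is taken from the database
in one step and is not refreshed from syncprov while the refresh phase
is running.
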
Syncrepl will receive updates to the contextCSN value with its own SID 
from its peers, at least with ITS#5972 and ITS#5973 in place.  I.e., the 
normal ignoring of updates tagged with a too old contextCSN value will 
continue to work.  It should also be safe to ignore all updates tagged 
with a contextCSN or entryCSN value whose SID is the current server's 
non-zero serverID, provided a complete refresh cycle is known to have 
taken place, i.e. when a contextCSN value with the current non-zero 
serverID was read from the database before the refresh phase started, or 
after the persistent phase has been entered.
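
The two ignore rules above could be sketched like this (Python, purely
illustrative). `should_ignore` and the `refresh_complete` flag are
hypothetical names, not anything that exists in syncrepl today. The flag
stands for "a contextCSN value with our own SID was read before the
refresh started, or the persistent phase has been entered". The CSN
layout and string ordering are assumed as in the 2.4 format.

```python
from typing import Dict


def csn_sid(csn: str) -> int:
    """Return the SID of a CSN laid out as 'timestamp#count#sid#mod'."""
    fields = csn.split("#")
    if len(fields) != 4:
        raise ValueError(f"malformed CSN: {csn!r}")
    return int(fields[2], 16)


def should_ignore(update_csn: str,
                  context_csns: Dict[int, str],
                  own_sid: int,
                  refresh_complete: bool) -> bool:
    """Decide whether an incoming update can be skipped."""
    sid = csn_sid(update_csn)

    # Existing rule: the update is not newer than the contextCSN we
    # already hold for that SID.
    known = context_csns.get(sid)
    if known is not None and update_csn <= known:
        return True

    # Proposed rule: the change carries our own non-zero SID and a
    # complete refresh cycle is known to have taken place.
    if own_sid != 0 and sid == own_sid and refresh_complete:
        return True

    return False
```

With serverID 0 (the single master case) the second rule never fires,
so only the usual "too old" check applies there.
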
The state of the database will be undetermined unless an initial refresh 
(i.e. starting from an empty database or CSN set) has been run to 
completion.  I cannot see how this can be avoided, and as far as I know 
it is so now too.  It might be worth mentioning in the doc. though 
(unless it already is).
Syncprov must continue to monitor the contextCSN updates from syncrepl. 
When it receives updates destined for the suffix of the database in 
which it is itself configured, it must replace any CSN value whose SID 
matches its own non-zero serverID with the value it manages itself 
(which should be greater than or equal to the value syncrepl tried to 
store unless something is seriously wrong).  Updates to "foreign" 
contextCSN values (i.e. those with a SID not matching the current 
non-zero serverID) should be imported into the set of contextCSN values 
syncprov itself maintains.  Syncprov could also short-circuit the 
contextCSN update and delay it to its own checkpoint.  I'm not sure what 
effect the checkpoint feature has today when syncrepl constantly updates 
the contextCSN.
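
As a sketch of the interception described above (Python, illustrative
only; `merge_context_csn_update` is a made-up name and `managed` stands
for the per-SID set syncprov keeps in memory). Two things here are my
own assumptions and not part of the proposal: a foreign value is only
imported if it is newer than the one already held, and if syncprov holds
no value for its own SID yet, the incoming one is adopted.

```python
from typing import Dict, Iterable, List


def csn_sid(csn: str) -> int:
    """Return the SID of a CSN laid out as 'timestamp#count#sid#mod'."""
    fields = csn.split("#")
    if len(fields) != 4:
        raise ValueError(f"malformed CSN: {csn!r}")
    return int(fields[2], 16)


def merge_context_csn_update(managed: Dict[int, str],
                             incoming: Iterable[str],
                             own_sid: int) -> List[str]:
    """Merge a contextCSN update from syncrepl into syncprov's set.

    Updates 'managed' in place and returns the values that should end
    up in the suffix entry (possibly written later, at a checkpoint).
    """
    for csn in incoming:
        sid = csn_sid(csn)
        if own_sid != 0 and sid == own_sid:
            # Own SID: the value syncprov manages replaces whatever
            # syncrepl tried to store.
            # Assumption: adopt the incoming value if none is held yet.
            managed.setdefault(sid, csn)
            continue
        # Foreign SID: import it.
        # Assumption: never move an already known value backwards.
        if sid not in managed or csn > managed[sid]:
            managed[sid] = csn
    return [managed[sid] for sid in sorted(managed)]
```

Whether the returned values are written immediately or held back until
the next checkpoint is left open here, as it is in the text.
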
Syncprov must, when syncrepl updates the contextCSN in the suffix of a 
subordinate DB, update its own knowledge of the "foreign" CSNs to be the 
*lowest* CSN with any given SID stored in all the subordinate DBs (where 
syncrepl is configured).  And no update must take place unless a 
contextCSN value has been stored in *all* the syncrepl-enabled 
subordinate DBs.  Any values matching the current non-zero serverID 
should be updated in this case too, but a new value should probably not 
be inserted.
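
The "lowest CSN per SID, and only once every subordinate has stored
something" rule might look like this (Python sketch, hypothetical
names). It covers the foreign SIDs only and leaves the handling of the
server's own SID out. One reading I have assumed: for a given SID the
minimum is taken over those subordinate DBs that actually hold a value
for it, since in an asymmetric setup different subordinates will see
different SIDs.

```python
from typing import Dict, Optional


def aggregate_foreign_csns(subordinates: Dict[str, Dict[int, str]],
                           own_sid: int) -> Optional[Dict[int, str]]:
    """Compute syncprov's view of the foreign contextCSN values.

    'subordinates' maps each syncrepl-enabled subordinate suffix to the
    per-SID contextCSN set stored there (empty if nothing stored yet).
    Returns None when no update should take place.
    """
    # No update unless *all* subordinates have stored a contextCSN.
    if not subordinates or any(not csns for csns in subordinates.values()):
        return None

    lowest: Dict[int, str] = {}
    for csns in subordinates.values():
        for sid, csn in csns.items():
            if own_sid != 0 and sid == own_sid:
                continue  # own SID is handled separately, not here
            # Assumption: same-SID CSNs compare correctly as strings.
            if sid not in lowest or csn < lowest[sid]:
                lowest[sid] = csn
    return lowest
```

Taking the lowest value means the glue-level CSN for a SID never claims
more than the slowest subordinate holding that SID has received.
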
These changes should (unless I'm completely lost, that is) create a 
cleaner interface between syncrepl and syncprov without harming the 
current multi-master configurations, and make asymmetric multi-master 
configurations like the one in test058 work.  Comments please?
Rein