Sync replication failure during startup.

27 Sep 2007


OpenLDAP v. 2.3.32
Berkeley DB 4.6
gcc 4.1.0
Replication doesn't work if the master server is started after
the replica servers and a large number of simultaneous updates
are performed while the master is starting up.
The entries that didn't get replicated to the replicas will not
be replicated even after a restart of both master and replicas.
The contextCSN is set to a value larger than the entryCSN of the
"lost" entries.
This is what I think happens during a master server startup with
simultaneous updates ongoing (and replicas trying to sync in the
initial phase).
Suppose that two clients (Client1 and Client2) are adding the entries
a and b respectively. If both adds happen between t1 and t2 (one second
apart), they will get the same entryCSN (same timestamp). If entry a is
committed at tc1 and b at tc2, any replica search in between will only
get entry a. Entry b will be lost.
Client1       entry=a, csn=x
Client2          entry=b, csn=x
Timeline ------+----------+---------+----+------>
                          |         |
               t1         |         |     t2=t1+1
                          |         |
                     tc1=entry a  tc2=entry b
                     committed    committed
Replica search query between tc1 and tc2.
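
The scenario above can be reproduced in a toy model. This is only a
sketch of the suspected race, not OpenLDAP code: ToyMaster, simulate
and the strict "greater than cookie" search filter are assumptions made
for illustration, and the csn is reduced to a bare one-second timestamp.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMaster:
    """Hypothetical stand-in for the master: maps entry name -> csn."""
    entries: dict = field(default_factory=dict)

    def commit(self, name, csn):
        # An entry becomes visible to searches only once it is committed.
        self.entries[name] = csn

    def search_newer_than(self, cookie):
        # Assumption: the replica only asks for entries with csn > cookie.
        return {n: c for n, c in self.entries.items()
                if cookie is None or c > cookie}


def simulate():
    master = ToyMaster()
    replica = {}
    cookie = None
    csn_x = "20070927120000Z"  # both adds fall within the same second

    master.commit("a", csn_x)                 # tc1: entry a committed

    found = master.search_newer_than(cookie)  # replica search between tc1 and tc2
    replica.update(found)
    cookie = max(found.values())              # replica now remembers csn x

    master.commit("b", csn_x)                 # tc2: entry b committed, same csn

    found = master.search_newer_than(cookie)  # later search: b is not newer than x
    replica.update(found)
    return sorted(replica), sorted(master.entries)
```

In this model simulate() leaves the replica with only entry a while the
master holds both a and b, which matches the behaviour described above.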
I don't know whether a finer timestamp granularity would prevent this
or, even better, some kind of counter so that every modification gets a
unique csn.
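
A minimal sketch of the counter idea, under stated assumptions:
ToyCsnGenerator and its "timestamp#count" output format are hypothetical
names chosen for illustration and are not taken from the OpenLDAP
sources. The clock is passed in as a function so the behaviour within a
single second is easy to see.

```python
import threading


class ToyCsnGenerator:
    """Hypothetical generator: one-second timestamp plus a per-second counter."""

    def __init__(self, clock):
        self._clock = clock            # callable returning e.g. "20070927120000Z"
        self._lock = threading.Lock()  # serialise concurrent writers
        self._last_second = None
        self._count = 0

    def next_csn(self):
        with self._lock:
            second = self._clock()
            if self._last_second is not None and second <= self._last_second:
                # Same second (or clock stepped back): keep the timestamp, bump the counter.
                second = self._last_second
                self._count += 1
            else:
                # New second: restart the counter.
                self._last_second = second
                self._count = 0
            # Zero-padded so that string comparison follows issue order.
            return f"{second}#{self._count:06d}"
```

With a generator like this, two updates in the same second get distinct,
ordered values. Uniqueness alone would still not help if an entry with a
lower csn is committed after a replica has already seen a higher one.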
Can you please comment on our analysis and let us know whether it is
correct or whether we have missed something important?
Any help or hints on how to avoid or fix this problem would be greatly
appreciated.
If I receive useful information directly in private email, I will post a
summary.
Regards
Stelios Grigoriadis
