Burton, Kris - Acision wrote:
All,
I want to ask the list about this before I open an ITS, to make sure I am understanding everything correctly. We are running OpenLDAP 2.4.11. I selectively backported ITS#5709 to our source because we were losing replications; applying it seemed to help and reduced the number of lost replications. We are running in MirrorMode using refreshAndPersist, doing a high volume of adds to the master, on the order of 100/s. We have run numerous iterations of the same test with very aggressive NTP updates keeping the master and consumer within 50 microseconds of one another, which I saw recommended as a possible solution in a previous thread. That made little to no difference in the replication loss.
If you're actually using MirrorMode, with all writes going to only one server, then NTP doesn't really matter. The time synchronization is only important when reconciling concurrent updates that occurred on different servers. I.e., it only matters when you're running multimaster (as opposed to MirrorMode), and for reconciling any updates that occurred while a MirrorMode failover was in progress. From the sounds of it, your test doesn't meet either of these criteria.
From looking at the code, I suspected the lost replications were due to entries being queued on the master side in non-ascending CSN order, which I was seeing immediately before each replication that the consumer rejected. My theory was that the logic that traverses the queue to mark committed CSNs and update the contextCSN was getting out of sync because of this, orphaning replications that were still pending: the consumer treats them as too old, even though in reality it never received them.
Looking at your debug info, this sounds likely. Yes, please submit this info to the ITS.
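For anyone following along, the failure mode described above can be sketched roughly like this. This is a simplified model, not the actual slapd code; the function names and the integer "CSNs" are illustrative only:

```python
# Simplified model (NOT actual slapd code) of how out-of-order queuing of
# pending CSNs on the provider can orphan a replication on the consumer.

def advance_context_csn(queue, context_csn):
    """Walk the pending queue in *queue order*, advancing contextCSN over
    the leading run of committed entries. If the queue is not in ascending
    CSN order, contextCSN can jump past a smaller CSN still pending."""
    while queue and queue[0]["committed"]:
        entry = queue.pop(0)
        context_csn = max(context_csn, entry["csn"])
    return context_csn

def consumer_accepts(update_csn, consumer_cookie_csn):
    # A syncrepl consumer rejects any update whose CSN is not newer than
    # its stored cookie CSN -- the "CSN too old" message.
    return update_csn > consumer_cookie_csn

# Entries queued on the provider in NON-ascending CSN order:
queue = [
    {"csn": 102, "committed": True},   # queued first, but has the newer CSN
    {"csn": 101, "committed": False},  # older CSN, commit still pending
]

context_csn = advance_context_csn(queue, context_csn=100)
print(context_csn)  # contextCSN has advanced to 102

# The consumer syncs its cookie up to the provider's contextCSN...
cookie = context_csn

# ...so when CSN 101 finally commits and is replicated, it is rejected
# even though the consumer never actually applied it:
queue[0]["committed"] = True
print(consumer_accepts(queue[0]["csn"], cookie))  # False -> "CSN too old"
```

If entries were instead queued in ascending CSN order, the walk would stop at the first uncommitted entry and contextCSN could never overtake a pending write, which is the invariant the real queue-traversal logic appears to rely on.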
I just pulled the latest code from RE24 and reran the test. The latest code is better than 2.4.11 with just the backport of ITS#5709, but we are still losing a small percentage of the replications with the "CSN too old" message. With the latest code I am still seeing a correlation between the out-of-order queuing on the master and the replications that are rejected on the consumer.