Re: difference in multi-master replication since 2.4.57

3 Mar 2021


On 3/3/21 8:58 PM, Quanah Gibson-Mount wrote:
...
--On Wednesday, March 3, 2021 6:24 PM +0100 Emmanuel Seyman
emmanuel@seyman.fr wrote:
...
The problem is that I don't see any messages in the log that stand
out as being errors (granted, I'm not sure what I'm looking for).
In fact, the alert flaps every once in a while as the two nodes
come back in sync and drift away from each other again.

I find these values surprising considering I've never seen a syncrepl
error in the 2 years before the upgrade. Is there a known issue with
replication in 2.4.57 that would explain these sync differences?

The replication code in 2.4.44 was completely unreliable and could
report being in sync regardless of whether or not that was true.  It's
also unknown to me if the nagios plugin is accurate for the current
codebase.

Generally what you want to look at are the contextCSN values in the root
of the DIT of each server to see if they match.

My slapdcheck package [1] also implements exactly this check and
sometimes it shows a difference although the changes have been correctly
replicated (normal syncrepl).
You can look at the code to verify what it's doing:
https://gitlab.com/ae-dir/slapdcheck/-/blob/master/slapdcheck/__init__.py#L1...
(It reads the actual syncrepl providers from cn=config before comparing
the contextCSN values for each serverID.)
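
For readers who want to try the per-serverID comparison by hand, here is a minimal sketch in Python. It is not the slapdcheck code, just an illustration under some assumptions: the contextCSN values have already been read from the suffix entry of each server (for example with something like ldapsearch -s base -b "dc=example,dc=com" contextCSN, where the suffix is a placeholder), and they use the usual 2.4 layout of timestamp#count#sid#mod. The function names are made up for this example, and only the timestamp part is compared, which is a simplification.

```python
from datetime import datetime, timezone


def parse_csn(csn):
    """Split one contextCSN value into (timestamp, server ID).

    Assumes the layout YYYYmmddHHMMSS.uuuuuuZ#count#sid#mod with a
    hexadecimal sid field.
    """
    stamp, _count, sid, _mod = csn.split("#")
    when = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ")
    return when.replace(tzinfo=timezone.utc), int(sid, 16)


def csn_by_sid(values):
    """Map each server ID to the newest timestamp seen for it."""
    latest = {}
    for csn in values:
        when, sid = parse_csn(csn)
        if sid not in latest or when > latest[sid]:
            latest[sid] = when
    return latest


def csn_lag(local_values, remote_values):
    """Return {sid: seconds of difference} between two servers.

    A server ID that is missing on one side maps to None. Only the
    timestamps are compared; the change counter is ignored.
    """
    local = csn_by_sid(local_values)
    remote = csn_by_sid(remote_values)
    lag = {}
    for sid in sorted(set(local) | set(remote)):
        if sid in local and sid in remote:
            lag[sid] = abs((local[sid] - remote[sid]).total_seconds())
        else:
            lag[sid] = None
    return lag
```

A result of 0.0 for every serverID means the two servers agree. A small non-zero value that disappears on the next poll may just be a change still in flight, so a monitoring check would probably want a tolerance rather than a strict equality test.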
I discussed this several times with Howard and Ondrej but no idea came
up why that happens.
Ciao, Michael.
[1] https://www.stroeder.com/slapdcheck.html
