[Issue 10274] New: Replication issue on MMR configuration

22 Oct 2024


      https://bugs.openldap.org/show_bug.cgi?id=10274
Issue ID: 10274
           Summary: Replication issue on MMR configuration
           Product: OpenLDAP
           Version: 2.5.14
          Hardware: All
                OS: Linux
            Status: UNCONFIRMED
          Keywords: needs_review
          Severity: normal
          Priority: ---
         Component: slapd
          Assignee: bugs@openldap.org
          Reporter: falgon.comp@gmail.com
  Target Milestone: ---
Created attachment 1036
  --> https://bugs.openldap.org/attachment.cgi?id=1036&action=edit
In this attachment you will find 2 openldap configurations for 2 instances +
slamd conf exemple + 5 screenshots to show the issue and one text file to
explain what you see
Hello we are openning this issue further to the initial post in technical :
https://lists.openldap.org/hyperkitty/list/openldap-technical@openldap.org/t...
Issue :
We are working on a project and we've come across an issue with the replication
after performance testing :
*Configuration :*
RHEL 8.6
OpenLDAP 2.5.14
*MMR-delta *configuration on multiple servers attached
300,000 users configured and used for tests
*olcLastBind: TRUE*
Use of SLAMD (performance shooting)
*Problem description:*
We are currently running performance and resilience tests on our infrastructure
using the SLAMD tool configured to perform BINDs and MODs on a defined range of
accounts.
We use a load balancer (VIP) to poll all of our servers equally. (but it is
possible to do performance tests directly on each of the directories)
With our current infrastructure we're able to perform approximately 300
MOD/BIND/s. Beyond that, we start to generate delays and can randomly come
across one issue.
However, when we run performance tests that exceed our write capacity, our
replication between servers can randomly create an incident with directories
being unable to catch up with their replication delay.
The directories update their contextCSNs, but extremely slowly (like freezing).
From then on, it's impossible for the directories to catch again. (even with no
incoming traffic)
A restart of the instance is required to perform a full refresh and solve the
incident.
We have enabled synchronization logs and have no error or refresh logs to
indicate a problem ( we can provide you with logs if necessary).
We suspect a write collision or a replication conflict but this is never write
in our sync logs.
We've run a lot of tests.
For example, when we run a performance test on a single live server, we don't
reproduce the problem.
Anothers examples: if we define different accounts ranges for each server with
SALMD, we don't reproduce the problem either.
If we use only one account for the range, we don't reproduce the problem
either.
______________________________________________________________________
I have add some screenshots on attachement to show you the issue and all the
explanations.
______________________________________________________________________
*Symptoms :*
One or more directories can no longer be replicated normally after performance
testing ends.
No apparent error logs.
Need a restart of instances to solve the problem.
*How to reproduce the problem:*
Have at least two servers in MMR mode
Set LastBind to TRUE
Perform a SLAMD shot from a LoadBalancer in bandwidth mode OR start multiple
SLAMD test on same time for each server with the same account range.
Exceed the maximum write capacity of the servers.
*SLAMD configuration :*
authrate.sh --hostname ${HOSTNAME} --port ${PORTSSL} \
--useSSL --trustStorePath ${CACERTJKS} \
--trustStorePassword ${CACERTJKSPW} --bindDN "${BINDDN}" \
--bindPassword ${BINDPW} --baseDN "${BASEDN}" \
--filter "(uid=[${RANGE}])" --credentials ${USERPW} \
--warmUpIntervals ${WARMUP} \
--numThreads ${NTHREADS} ${ARGS}
-- 
You are receiving this mail because:
You are on the CC list for the issue.

2026

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

[Issue 10274] New: Replication issue on MMR configuration