Re: (ITS#5985) replication lockout with syncrepl

3 Mar 2009


      quanah@OpenLDAP.org wrote:
...
Full_Name: Quanah Gibson-Mount
Version: 2.3/2.4/HEAD
OS: Linux 2.6
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (75.111.29.239)
I noticed back in testing with OpenLDAP 2.3 that if a master gets a high rate of
changes, and you have 3+ replicas, usually 2 replicas will end up getting all of
the changes while the 3rd+ replicas have to wait until those 2 finish before
getting changes.  If the high rate of changes goes on for a long enough period
of time, this can cause the other replicas to get so far out of sync that it is
more efficient to reload them than to wait on them to re-sync.  I discussed this
with Howard, and in reviewing the code, he sees there's an underlying design
issue with updates that is causing this.  His comments:
Once a thread for a psearch wakes up, it sends all the changes that were queued,
so it may hog an entire thread for a long time before the next psearch comes off
the queue.
Fixing this issue would require a complete redesign of the psearch queue 
handling. Instead of queuing up a separate response per psearch, there should 
be a single queue of responses, and the qplayer should iterate thru to match a 
response to each of the active psearches. That would guarantee that all 
replicas receive a given change before any of them receives the next change. 
This would also help with the ordering issues discussed recently on -technical 
and -devel.
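
To make the difference in delivery order concrete, here is a rough sketch of the two queueing schemes. It is a toy single-threaded model, not slapd code: the function names, the replica labels and the change labels are all made up for illustration, and it ignores the thread pool, search filters and network I/O entirely.

```python
from collections import deque


def per_psearch_delivery(changes, replicas):
    """Model of the behaviour described above (hypothetical names).

    Every psearch has its own queue of responses. Whichever psearch
    wakes up drains its whole queue before the next one gets a turn.
    """
    queues = {r: deque(changes) for r in replicas}
    order = []
    for r in replicas:              # one psearch holds the thread at a time
        while queues[r]:            # ...and sends everything it has queued
            order.append((r, queues[r].popleft()))
    return order


def shared_queue_delivery(changes, replicas):
    """Model of the proposed redesign (hypothetical names).

    There is a single queue of responses. The player takes one response
    and hands it to every active psearch before taking the next one.
    """
    responses = deque(changes)
    order = []
    while responses:
        change = responses.popleft()
        for r in replicas:          # all replicas see this change first
            order.append((r, change))
    return order


if __name__ == "__main__":
    changes = ["c1", "c2", "c3"]
    replicas = ["A", "B", "C"]
    print(per_psearch_delivery(changes, replicas))
    print(shared_queue_delivery(changes, replicas))
```

In the first model replica C sees nothing until A and B have received every change; in the second, no replica receives c2 until all three have received c1.
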
I suspect this is too big a change to target the next (.16) release, since 
we're focusing on re-stabilizing the code right now.
-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/
