https://bugs.openldap.org/show_bug.cgi?id=9584
Ondřej Kuzník <ondra@mistotebe.net> changed:

           What           |Removed        |Added
----------------------------------------------------------------------------
           Status         |VERIFIED       |CONFIRMED
           Ever confirmed |0              |1
           Resolution     |FIXED          |---
--- Comment #12 from Ondřej Kuzník <ondra@mistotebe.net> ---
The current fix in 2.6 abuses the retry logic to handle SYNC_BUSY, and that is going to cause us no end of trouble if we allow it into 2.5. Given that it looks like we want to backport, we'll need to make some changes I had planned to do eventually.
My thinking is that we make each syncinfo track whether it is active or paused, with all of them starting out paused, and we keep the existing cs_refreshing management as-is. Ordering probably matters too, but that was just fixed in ITS#9761.
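In sketch form, that could look like the following; the si_state field and the SYNCINFO_* names are illustrative only, nothing in the tree:

    /* Per-syncinfo activity state; every syncinfo starts out paused. */
    typedef enum {
        SYNCINFO_PAUSED = 0,    /* waiting for its turn to refresh */
        SYNCINFO_ACTIVE         /* scheduled, or dead after retries */
    } syncinfo_state;

    typedef struct syncinfo_s {
        struct syncinfo_s *si_next;     /* next syncinfo on this backend */
        syncinfo_state si_state;
        /* ... existing fields ... */
    } syncinfo_t;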
On startup, we kick off a task that picks the first paused syncinfo, marks it active, and schedules it accordingly.
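Building on the sketch above, with schedule_refresh() standing in for however the refresh actually gets queued:

    static void schedule_refresh( syncinfo_t *si );    /* stand-in */

    /* Hypothetical kick-off task: activate and schedule the first
     * paused syncinfo on the backend's list, if there is one. */
    static void
    syncinfo_kickoff( syncinfo_t *list )
    {
        syncinfo_t *si;

        for ( si = list; si != NULL; si = si->si_next ) {
            if ( si->si_state == SYNCINFO_PAUSED ) {
                si->si_state = SYNCINFO_ACTIVE;
                schedule_refresh( si );    /* e.g. via the runqueue */
                break;
            }
        }
    }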
When a syncinfo gets a SYNC_BUSY (meaning another active syncinfo already holds cs_refreshing), it marks itself paused. When a syncinfo drops itself from cs_refreshing, it picks the first paused syncinfo from the start of the list and activates it as above.

With this scheme, syncinfos that run their retry counter to the end (going dead) stay marked "active", and that looks safe: we don't actually want to reschedule them.
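Again in sketch form, reusing the illustrative helpers above:

    /* On SYNC_BUSY: someone else holds the refresh, so step aside. */
    static void
    syncinfo_handle_busy( syncinfo_t *si )
    {
        si->si_state = SYNCINFO_PAUSED;
    }

    /* On dropping out of cs_refreshing: hand over to the next paused
     * syncinfo. Dead syncinfos stay SYNCINFO_ACTIVE, so the scan in
     * syncinfo_kickoff() never touches them again. */
    static void
    syncinfo_hand_over( syncinfo_t *si, syncinfo_t *list )
    {
        /* ... drop si from cs_refreshing here, elided in this sketch ... */
        (void)si;
        syncinfo_kickoff( list );
    }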
Any cn=config prodding has to take the above into account, so it might have to schedule the kick-off task when required, and some tracking to allow that will be needed. There is no active link from cn=monitor that can change these states/retries, so that's fine. The decision whether to schedule the initial task could be based on cs_refreshing == NULL. Even if we spawn duplicate kick-off tasks, one of them will be able to take over cs_refreshing and the others will get BUSY or find no more paused syncinfos.
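The check itself could be as simple as this, with the cookie state reduced to the one field that matters here (the struct below only mirrors the cs_refreshing idea, it is not the real cookie_state):

    /* Minimal stand-in for the cookie state shared by the syncinfos. */
    typedef struct cookie_state_sketch {
        syncinfo_t *cs_refreshing;    /* who currently owns the refresh */
    } cookie_state_sketch;

    /* From cn=config: only schedule the kick-off when nobody holds
     * cs_refreshing. A duplicate task is harmless: at most one takes
     * over cs_refreshing, the rest get BUSY or run out of paused
     * syncinfos. */
    static void
    config_maybe_kickoff( cookie_state_sketch *cs, syncinfo_t *list )
    {
        if ( cs->cs_refreshing == NULL )
            syncinfo_kickoff( list );
    }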