https://bugs.openldap.org/show_bug.cgi?id=9584
Ondřej Kuzník <ondra@mistotebe.net> changed:

           What           |Removed        |Added
----------------------------------------------------------------------------
           Status         |VERIFIED       |CONFIRMED
           Ever confirmed |0              |1
           Resolution     |FIXED          |---
--- Comment #12 from Ondřej Kuzník <ondra@mistotebe.net> ---
The current fix in 2.6 abuses the retry logic to handle SYNC_BUSY, and that is going to cause us no end of trouble if we allow it into 2.5. Given that it looks like we want to backport, we'll need to make some changes I had planned to do eventually.
My thinking is that we make each syncinfo track whether it is active or paused, with all of them starting out paused, and we keep the existing cs_refreshing management as-is. Ordering probably matters too, but that was just fixed in ITS#9761.
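In sketch form, that could look like the following; the si_state field and the SYNCINFO_* names are illustrative only, nothing in the tree:

    /* Per-syncinfo activity state; every syncinfo starts out paused. */
    typedef enum {
        SYNCINFO_PAUSED = 0,    /* waiting for its turn to refresh */
        SYNCINFO_ACTIVE         /* scheduled, or dead after retries */
    } syncinfo_state;

    typedef struct syncinfo_s {
        struct syncinfo_s *si_next;     /* next syncinfo on this backend */
        syncinfo_state si_state;
        /* ... existing fields ... */
    } syncinfo_t;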
On startup, we kick off a task that picks the first paused syncinfo, marks it active, and schedules it accordingly.
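Building on the sketch above, with schedule_refresh() standing in for however the refresh actually gets queued:

    static void schedule_refresh( syncinfo_t *si );    /* stand-in */

    /* Hypothetical kick-off task: activate and schedule the first
     * paused syncinfo on the backend's list, if there is one. */
    static void
    syncinfo_kickoff( syncinfo_t *list )
    {
        syncinfo_t *si;

        for ( si = list; si != NULL; si = si->si_next ) {
            if ( si->si_state == SYNCINFO_PAUSED ) {
                si->si_state = SYNCINFO_ACTIVE;
                schedule_refresh( si );    /* e.g. via the runqueue */
                break;
            }
        }
    }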
When a syncinfo gets a SYNC_BUSY (meaning another active syncinfo already holds cs_refreshing), it marks itself paused. When a syncinfo drops itself from cs_refreshing, it picks the first paused syncinfo from the start of the list and activates it as above.

With this scheme, syncinfos that run their retry counter to the end (going dead) stay marked "active", and that looks safe: we don't actually want to reschedule them.
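Again in sketch form, reusing the illustrative helpers above:

    /* On SYNC_BUSY: someone else holds the refresh, so step aside. */
    static void
    syncinfo_handle_busy( syncinfo_t *si )
    {
        si->si_state = SYNCINFO_PAUSED;
    }

    /* On dropping out of cs_refreshing: hand over to the next paused
     * syncinfo. Dead syncinfos stay SYNCINFO_ACTIVE, so the scan in
     * syncinfo_kickoff() never touches them again. */
    static void
    syncinfo_hand_over( syncinfo_t *si, syncinfo_t *list )
    {
        /* ... drop si from cs_refreshing here, elided in this sketch ... */
        (void)si;
        syncinfo_kickoff( list );
    }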
Any cn=config prodding has to take the above into account, so it might have to schedule the kick-off task when required, and some tracking to allow that will be needed. There is no active link from cn=monitor that can change these states/retries, so that's fine. The decision whether to schedule the initial task could be based on cs_refreshing == NULL. Even if we spawn duplicate kick-off tasks, one of them will be able to take over cs_refreshing and the others will get BUSY or find no more paused syncinfos.
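The check itself could be as simple as this, with the cookie state reduced to the one field that matters here (the struct below only mirrors the cs_refreshing idea, it is not the real cookie_state):

    /* Minimal stand-in for the cookie state shared by the syncinfos. */
    typedef struct cookie_state_sketch {
        syncinfo_t *cs_refreshing;    /* who currently owns the refresh */
    } cookie_state_sketch;

    /* From cn=config: only schedule the kick-off when nobody holds
     * cs_refreshing. A duplicate task is harmless: at most one takes
     * over cs_refreshing, the rest get BUSY or run out of paused
     * syncinfos. */
    static void
    config_maybe_kickoff( cookie_state_sketch *cs, syncinfo_t *list )
    {
        if ( cs->cs_refreshing == NULL )
            syncinfo_kickoff( list );
    }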