https://bugs.openldap.org/show_bug.cgi?id=9823
Issue ID: 9823
Summary: syncprov doesn't fallback when deltasync consumer's offline beyond accesslog depth
Product: OpenLDAP
Version: 2.6.1
Hardware: All
OS: All
Status: UNCONFIRMED
Keywords: needs_review
Severity: normal
Priority: ---
Component: slapd
Assignee: bugs@openldap.org
Reporter: smckinney@symas.com
Target Milestone: ---
Configured with deltasync. When a consumer goes offline for longer than the logpurge interval, it won't fall back to plain syncrepl, resulting in a desync.
--- Comment #1 from Shawn McKinney smckinney@symas.com ---
Created attachment 893 --> https://bugs.openldap.org/attachment.cgi?id=893&action=edit
provider slapd.conf
--- Comment #2 from Shawn McKinney smckinney@symas.com ---
Created attachment 894 --> https://bugs.openldap.org/attachment.cgi?id=894&action=edit
consumer slapd.conf
--- Comment #3 from Shawn McKinney smckinney@symas.com ---
# Instructions to reproduce

1. Use openldap_version: 'OPENLDAP_REL_ENG_2_6'
2. Set up one provider and one consumer with delta-sync replication (configs attached)
3. Set on the provider: logpurge 00+00:02 00+00:01
4. Add some records (batch #1)
5. Stop the consumer
6. Add some more records (batch #2)
7. Wait 3 minutes
8. Start the consumer
9. Measure the entry count. The consumer won't receive the 2nd batch of records.
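To see why a 3 minute wait is enough for the setting in step 3, here is a minimal sketch that parses the two logpurge arguments (age and interval) using the `[ddd+]hh:mm[:ss]` time-span format documented for slapo-accesslog. The helper names `parse_span` and `purge_certain` are hypothetical, and the sketch assumes the purge task runs on schedule: a record becomes eligible once it is older than the age, and the next purge run follows within one interval.

```python
import re
from datetime import timedelta

# Time-span format used by logpurge: [ddd+]hh:mm[:ss]
_SPAN = re.compile(r"^(?:(\d+)\+)?(\d+):(\d+)(?::(\d+))?$")


def parse_span(text: str) -> timedelta:
    """Convert a [ddd+]hh:mm[:ss] string into a timedelta."""
    m = _SPAN.match(text)
    if m is None:
        raise ValueError(f"not a [ddd+]hh:mm[:ss] time span: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in m.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def purge_certain(wait: timedelta, age: str, interval: str) -> bool:
    """True if a record written at the start of `wait` must have been purged.

    Assumption: the record is eligible after `age`, and a purge run
    happens at most `interval` later.
    """
    return wait >= parse_span(age) + parse_span(interval)


if __name__ == "__main__":
    # Values from the reproduction steps: logpurge 00+00:02 00+00:01, wait 3 minutes
    print(purge_certain(timedelta(minutes=3), "00+00:02", "00+00:01"))
```

With the values above, age plus interval is exactly 3 minutes, so the batch #2 log records should be gone by the time the consumer restarts.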
--- Comment #4 from Shawn McKinney smckinney@symas.com ---
Note: The same behavior applies when the consumer is also a provider (i.e. multi-provider). If a delta-sync consumer is offline for longer than the purge interval of its providers, it won't receive the updates corresponding to those purged records.
The question is: why doesn't it fall back to plain syncrepl? Or, failing that, give the consumer some indication (an error) that it can't be brought back in sync, i.e. that a desync has occurred?
--- Comment #5 from Ondřej Kuzník ondra@mistotebe.net ---
On Mon, Apr 18, 2022 at 06:49:36PM +0000, openldap-its@openldap.org wrote:
> --- Comment #4 from Shawn McKinney smckinney@symas.com ---
> The question, why doesn't it fallback into plain sync repl? Or, given some indication to the consumer (error) that it can't be brought back in sync, i.e. dsync has occurred.
Syncprov just lets the consumer replicate the current contents of the database (minus any deletions because syncprov-nopresent is set). It has no idea that deletes happened (there is no record of them) and how it all fits into the semantics of delta-syncrepl.
We could teach syncprov about minCSN (as maintained by slapo-accesslog) when nopresent is set but then we should really rename the parameter to something else, more in line with the intended usage.
Another thing to keep in mind if we go that route is that minCSN would now have two slightly different uses:
- as an indication of how useful the log is as a source of a refresh delete phase
- as an indication whether the accesslog is useful as a replication log for deltasync
A much tighter set of assumptions is associated with the latter. In general, whenever a main DB runs a plain refresh, this changes what part of the accesslog is usable as a deltasync source[0] while its usefulness to serve as a sessionlog source is unaffected.
[0] A plain refresh destroys ordering information, so anything before it has finished is suspect for deltasync. Currently we ignore that; see ITS#9580 for more background.
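As an illustration of the minCSN idea discussed in this comment, here is a rough sketch of a per-SID comparison. It is not the code from any OpenLDAP commit. It assumes, purely for illustration, that minCSN gives for each SID a lower bound below which log records may have been purged, and that CSNs in the usual fixed-width `timestamp#count#sid#mod` form can be compared as strings. The function name `log_usable_for` is hypothetical.

```python
def log_usable_for(cookie: dict, min_csn: dict) -> bool:
    """Decide whether delta-log replay looks safe for a consumer cookie.

    cookie:  SID -> newest CSN the consumer has seen from that SID
    min_csn: SID -> assumed lower bound; records at or below it may be purged

    Returns False (fall back to a plain refresh) if, for any SID, the
    consumer is behind the assumed lower bound or has no CSN at all.
    """
    for sid, floor in min_csn.items():
        seen = cookie.get(sid, "")  # "" sorts before every real CSN
        if seen < floor:
            return False
    return True


if __name__ == "__main__":
    floor = {"001": "20220418184936.000000Z#000000#001#000000"}
    stale = {"001": "20220418180000.000000Z#000000#001#000000"}
    fresh = {"001": "20220418190000.000000Z#000000#001#000000"}
    print(log_usable_for(stale, floor), log_usable_for(fresh, floor))
```

Under these assumptions a stale consumer is sent to a plain refresh instead of silently replaying an incomplete log.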
Ondřej Kuzník ondra@mistotebe.net changed:
What     |Removed |Added
----------------------------------------------------------------------------
See Also |        |https://bugs.openldap.org/show_bug.cgi?id=9580
--- Comment #6 from Howard Chu hyc@openldap.org ---
(In reply to Ondřej Kuzník from comment #5)
> On Mon, Apr 18, 2022 at 06:49:36PM +0000, openldap-its@openldap.org wrote:
> > --- Comment #4 from Shawn McKinney smckinney@symas.com ---
> > The question, why doesn't it fallback into plain sync repl? Or, given some indication to the consumer (error) that it can't be brought back in sync, i.e. dsync has occurred.
>
> Syncprov just lets the consumer replicate the current contents of the database (minus any deletions because syncprov-nopresent is set). It has no idea that deletes happened (there is no record of them) and how it all fits into the semantics of delta-syncrepl.
Deletes are irrelevant when polling the log. The log is a queue, appends at the tail and deletes from the head. The only check that's required is to see if the consumer's cookieCSNs are still present in the log. If not, then records that cover the cookie are gone, and a refresh from the mainDB is needed. That's why we use the nopresent config on the logDB, because a normal present phase is a bunch of work for no extra benefit.
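The check described above can be sketched as follows. This is a toy model only: the log is a `collections.deque`, the CSN labels are placeholders rather than real CSN values, and `needs_full_refresh` is a hypothetical name, not a slapd function.

```python
from collections import deque


def needs_full_refresh(log: deque, cookie_csn: str) -> bool:
    """Return True when the consumer's cookie CSN is no longer in the log.

    If the record covering the cookie has been purged from the head of
    the queue, the log cannot be replayed and a refresh from the main DB
    is needed.
    """
    return cookie_csn not in log


# Toy accesslog: appends at the tail, purges from the head.
log = deque(["csn-01", "csn-02", "csn-03", "csn-04"])
log.popleft()          # purge removes csn-01
log.popleft()          # purge removes csn-02
log.append("csn-05")   # a new write is logged

if __name__ == "__main__":
    print(needs_full_refresh(log, "csn-01"))  # cookie older than the log head
    print(needs_full_refresh(log, "csn-03"))  # cookie still covered by the log
```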
--- Comment #7 from Ondřej Kuzník ondra@mistotebe.net ---
On Thu, Apr 21, 2022 at 12:50:34PM +0000, openldap-its@openldap.org wrote:
> Deletes are irrelevant when polling the log. The log is a queue, appends at the tail and deletes from the head. The only check that's required is to see if the consumer's cookieCSNs are still present in the log. If not, then records that cover the cookie are gone, and a refresh from the mainDB is needed. That's why we use the nopresent config on the logDB, because a normal present phase is a bunch of work for no extra benefit.
This is false. Imagine a multi-sid environment where the provider in question is A and the other providers include B and C:
- replica X disconnects
- sid B and C receive new write operations
- sid B operations reach A in a timely manner
- sid C CSNs are significantly delayed in reaching A
- logpurge kicks in, purging some sid B operations (sid C operations older than these are retained)
- replica X reconnects to A; the sid C CSN is chosen as the older CSN for some reason, it is found in the accesslog, and replication continues (with the same effect as described in this issue)
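The scenario above can be played through in a small toy model. Everything here is an illustrative assumption: CSNs are plain integers, purging is by the time an entry was written to A's log, and `naive_replay` is a hypothetical stand-in for a check that only looks up the oldest cookie CSN in the log.

```python
from typing import NamedTuple


class LogEntry(NamedTuple):
    sid: str
    csn: int        # toy CSN: larger means newer
    logged_at: int  # tick at which the entry reached A's accesslog


def purge(log, now, max_age):
    """Drop entries that were logged more than max_age ticks ago."""
    return [e for e in log if now - e.logged_at <= max_age]


def naive_replay(log, cookie):
    """Return the entries to replay, or None if a full refresh is needed.

    Only the oldest CSN in the cookie is looked up in the log.
    """
    oldest_sid = min(cookie, key=cookie.get)
    found = any(e.sid == oldest_sid and e.csn == cookie[oldest_sid] for e in log)
    if not found:
        return None
    return [e for e in log if e.csn > cookie[e.sid]]


full_log = [
    LogEntry("B", 21, 1),   # sid B writes arrive promptly
    LogEntry("B", 22, 2),
    LogEntry("C", 10, 5),   # older sid C write arrives late
    LogEntry("B", 23, 6),
]
cookie = {"B": 20, "C": 10}                  # what replica X has seen
log = purge(full_log, now=7, max_age=3)      # B 21 and B 22 are purged
replayed = naive_replay(log, cookie)
missed = [e for e in full_log if e.csn > cookie[e.sid] and e not in replayed]

if __name__ == "__main__":
    print("replayed:", replayed)
    print("missed:", missed)
```

In this model the lookup succeeds because the sid C entry is still in the log, yet the two purged sid B writes never reach the replica.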
I agree nopresent is important for efficient deltasync operation. I was only suggesting that there is no use for this configuration option other than on a logDB, so we can conflate the two in a proposed behavioural change.
Quanah Gibson-Mount quanah@openldap.org changed:
What             |Removed      |Added
----------------------------------------------------------------------------
Target Milestone |---          |2.5.13
Keywords         |needs_review |
Quanah Gibson-Mount quanah@openldap.org changed:
What     |Removed           |Added
----------------------------------------------------------------------------
Assignee |bugs@openldap.org |ondra@mistotebe.net
--- Comment #8 from Quanah Gibson-Mount quanah@openldap.org ---
https://git.openldap.org/openldap/openldap/-/merge_requests/521
--- Comment #9 from Dimitar Stoychev dstoychev@symas.com ---
The proposed changes are derived from OpenLDAP Software. All of the modifications to OpenLDAP Software represented in the following changes were developed by Symas Corporation. Symas Corporation has not assigned rights and/or interest in this work to any party. I, Dimitar Stoychev, am authorized by Symas Corporation, my employer, to release this work under the following terms.
Copyright 2022 Symas Corporation
Redistribution and use in source and binary forms, with or without modification, are permitted only as authorized by the OpenLDAP Public License.
Quanah Gibson-Mount quanah@openldap.org changed:
What       |Removed     |Added
----------------------------------------------------------------------------
Status     |UNCONFIRMED |RESOLVED
Resolution |---         |FIXED
--- Comment #10 from Quanah Gibson-Mount quanah@openldap.org ---
head:
• 69de6c94 by Dimitar Stoychev at 2022-06-21T16:21:56+00:00 ITS#9823 Update test043 to check deltasync recovery after accesslog has been purged
• c64e6635 by Ondřej Kuzník at 2022-06-21T16:21:56+00:00 ITS#9823 Check minCSN when setting up delta-log replay
RE26:
• e56e70b4 by Dimitar Stoychev at 2022-06-23T18:42:54+00:00 ITS#9823 Update test043 to check deltasync recovery after accesslog has been purged
• eea9b838 by Ondřej Kuzník at 2022-06-23T18:42:59+00:00 ITS#9823 Check minCSN when setting up delta-log replay
RE25:
• ff15ef02 by Dimitar Stoychev at 2022-06-23T18:49:19+00:00 ITS#9823 Update test043 to check deltasync recovery after accesslog has been purged
• f674fbee by Ondřej Kuzník at 2022-06-23T18:49:23+00:00 ITS#9823 Check minCSN when setting up delta-log replay
Ondřej Kuzník ondra@mistotebe.net changed:
What     |Removed |Added
----------------------------------------------------------------------------
See Also |        |https://bugs.openldap.org/show_bug.cgi?id=9878
--- Comment #11 from Quanah Gibson-Mount quanah@openldap.org ---
head:
• 207604c0 by Ondřej Kuzník at 2022-07-07T21:31:03+01:00 ITS#9823 Only request minCSN if accesslog is around
RE26:
• 23ef018c by Ondřej Kuzník at 2022-07-07T21:24:38+00:00 ITS#9823 Only request minCSN if accesslog is around
RE25:
• fc812cdb by Ondřej Kuzník at 2022-07-07T21:25:02+00:00 ITS#9823 Only request minCSN if accesslog is around
Quanah Gibson-Mount quanah@openldap.org changed:
What   |Removed  |Added
----------------------------------------------------------------------------
Status |RESOLVED |VERIFIED
--- Comment #12 from Quanah Gibson-Mount quanah@openldap.org ---
head:
• 7ade966c by Ondřej Kuzník at 2024-02-05T22:57:17+00:00 ITS#9823 Move to a place that is better associated with accesslog
RE26:
• fe7ee150 by Ondřej Kuzník at 2024-02-15T17:55:09+00:00 ITS#9823 Move to a place that is better associated with accesslog
RE25:
• c4a8fce7 by Ondřej Kuzník at 2024-02-15T17:55:05+00:00 ITS#9823 Move to a place that is better associated with accesslog