On Mon, Feb 27, 2023 at 08:12:44PM +0100, Geert Hendrickx wrote:
On Mon, Feb 27, 2023 at 19:18:38 +0100, Ondřej Kuzník wrote:
> Hi Geert,
> you didn't answer the question whether you also monitor the accesslog's
> contextCSN. In deltasync, the combination of both is important.
Ok, we don't. I'll take a look next time things are drifting.
In a stable environment, the accesslog's contextCSN is identical to the
main db's contextCSN, for every SID.
Hi Geert,
yes, accesslog's contextCSN should always be in sync with its main DB.
> My emphasis on "incremental". Usually when contextCSN and cookie are
> found to be incompatible (missing sids from cookie, or even tighter
> constraints when configured with syncprov-sessionlog-source), the provider
> tells the consumer to step down from deltasync or start without a cookie.
Ok, I assumed the consumer decides on its own whether it's in sync with a
given provider, by comparing its contextCSN to the provider's, and only if
it's NOT in sync, queries the provider's accesslog for delta sync from the
CSN it's currently at, if possible.
Except for the initial sync (no data in the consumer), the consumer always
tries deltasync first. The provider then proceeds accordingly or tells the
consumer to fall back to plain syncrepl (the most common reason to see
e-syncRefreshRequired, result code 4096).
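To make the fallback condition concrete, here is a minimal Python sketch that models only the "missing sids from cookie" check quoted above. It is not how syncprov is implemented: the names `Refresh`, `sids_of` and `choose_refresh` are hypothetical, the CSNs are assumed to be well formed (timestamp#count#SID#mod), and the tighter checks enabled by syncprov-sessionlog-source are not modeled.

```python
from enum import Enum
from typing import Iterable, Set


class Refresh(Enum):
    DELTA = "delta"                # provider replays changes from its accesslog
    REFRESH_REQUIRED = "required"  # e-syncRefreshRequired (4096): fall back


def sids_of(csns: Iterable[str]) -> Set[str]:
    """Return the SID field of each CSN (timestamp#count#SID#mod)."""
    return {csn.split("#")[2] for csn in csns}


def choose_refresh(provider_context_csns: Iterable[str],
                   consumer_cookie_csns: Iterable[str]) -> Refresh:
    """Simplified model: stay on deltasync only if the cookie has every SID.

    Only the "missing SIDs from cookie" condition is modeled here; the
    tighter constraints of syncprov-sessionlog-source are not.
    """
    missing = sids_of(provider_context_csns) - sids_of(consumer_cookie_csns)
    return Refresh.REFRESH_REQUIRED if missing else Refresh.DELTA


# Hypothetical CSN values, one per SID.
provider = ["20230227191838.000001Z#000000#001#000000",
            "20230227180000.000000Z#000000#002#000000"]
print(choose_refresh(provider, provider).name)      # DELTA
print(choose_refresh(provider, provider[:1].copy()).name)  # REFRESH_REQUIRED
```

In this model a cookie that lacks a CSN for any SID present in the provider's contextCSN can never be served incrementally, which is why an unexpected SID gap shows up as a refresh-required round rather than a quick delta.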
>> At least for the main db, it makes a significant performance difference if
>> the accesslog gets too large. Therefore we mdb_copy -c the database from
>> time to time. We do this on one server, then distribute this mdb to other
>> servers and drop their accesslog, since it doesn't match the (imported)
>> main db anymore. But then other replicas start logging "Content Sync
>> Refresh Required" for the corresponding rid, even though no updates are
>> coming in through *that* server, so its contextCSN is static.
>
> You mean accesslog DB or accesslog freelist? Also now you're saying
> you're obliterating the whole accesslog (and compacting the mainDB),
> where previously you said you were compacting accesslog.
We are compacting (mdb_copy -c) the main db on one server, AND throwing
away the accesslog on other servers where we import this mdb, because it
then no longer matches the local accesslog.
Context at
https://openldap.org/lists/openldap-technical/201708/msg00049.html
(although this was about a different LDAP database than the one we're
currently talking about.)
Unless your entries are larger than the page size *and* you have massive
churn on those, you don't want to do this. Are you confident that's the
case? What is your number of overflow pages? Which kind of entries is
it down to? If it's entries with a large number of values in an attribute
(e.g. groups), you might also want to look into sortvals (see man 5
slapd.conf) and multival (man 5 slapd-mdb) to store them more
efficiently.
This always went fine, but now turns out to confuse other consumers in an
MMR environment. Should we instead run mdb_copy -c locally on each server?
(This can be a pretty slow operation.) Or is there another "clean" way to
copy mdb databases between replicas? Include the corresponding accesslog?
It should be safe to include the accesslog *if* the server was shut down
cleanly and everything was flushed into both.
Do you configure persistent or in-memory sessionlog?
The other scenario was that after large batch updates, when the accesslog
has grown much bigger than usual (which is not a problem in itself),
logpurge leaves a large freelist in the accesslog as well. So as a
precaution, we "clean up" here as well by just dropping that accesslog,
obviously at a quiet time and when all servers are in sync. This turned
out to be a mistake.
Are your accesslog entries so large that they don't fit a page? If not,
just let the freelist be reused for the next time you have a large batch
of updates again. That's what it's there for. And even then, accesslog
in particular shouldn't really suffer from fragmentation as much as the
main DB would.
> If this is the case then yeah, you're removing every way the servers
> could have performed an efficient resync after reconnect/restart, and
> that will take time and processing power to perform (probably running a
> refresh present, which is only one step up from a total resync). This
> makes little sense operationally.
Ok, so far we only looked at the contextCSN of the main DIT, assuming this
told the whole story.
Yeah, in δ-multiprovider both main DB and accesslog (and their
contextCSNs) are used together and should be monitored as such.
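As a starting point for that monitoring, here is a minimal Python sketch that compares the two contextCSN sets per SID. It assumes you have already fetched the multi-valued contextCSN of the main DB suffix and of the accesslog suffix (for instance with a base-scope search on each); the function names are hypothetical and nothing here is an OpenLDAP API. On a settled system the result should be empty, matching the observation that the two are identical for every SID.

```python
from typing import Dict, Iterable, List, Tuple


def csn_by_sid(csns: Iterable[str]) -> Dict[str, str]:
    """Index contextCSN values by SID (third '#'-separated field)."""
    result: Dict[str, str] = {}
    for csn in csns:
        parts = csn.split("#")
        if len(parts) != 4:
            raise ValueError(f"malformed CSN: {csn!r}")
        result[parts[2]] = csn
    return result


def contextcsn_drift(main_db: Iterable[str],
                     accesslog: Iterable[str]) -> List[Tuple[str, str, str]]:
    """List (sid, main_db_csn, accesslog_csn) for every SID that differs.

    A SID present on only one side is reported with "" for the other side.
    """
    main = csn_by_sid(main_db)
    log = csn_by_sid(accesslog)
    drift: List[Tuple[str, str, str]] = []
    for sid in sorted(main.keys() | log.keys()):
        m, a = main.get(sid, ""), log.get(sid, "")
        if m != a:
            drift.append((sid, m, a))
    return drift


# Hypothetical values: SID 002 is behind in the accesslog.
main_db = ["20230227191838.000001Z#000000#001#000000",
           "20230227180000.000000Z#000000#002#000000"]
accesslog = ["20230227191838.000001Z#000000#001#000000",
             "20230227170000.000000Z#000000#002#000000"]
for sid, m, a in contextcsn_drift(main_db, accesslog):
    print(f"SID {sid}: main={m or '-'} accesslog={a or '-'}")
```

Running this on every server next to the existing cross-server comparison of the main DB contextCSN would catch the case where the accesslog was dropped or lags behind, which the main DB check alone cannot show.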
Regards,
--
Ondřej Kuzník
Senior Software Engineer
Symas Corporation
http://www.symas.com
Packaged, certified, and supported LDAP solutions powered by OpenLDAP