On Mon, Feb 27, 2023 at 19:18:38 +0100, Ondřej Kuzník wrote:
Hi Geert, you didn't answer the question of whether you also monitor the accesslog's contextCSN. In deltasync, the combination of both is important.
Ok, we don't. I'll take a look next time things are drifting.
In a stable environment, the accesslog's contextCSN is identical to the main db's contextCSN, for every SID.
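A minimal sketch of such a check, assuming the contextCSN values of the main db's suffix entry and of the accesslog's suffix entry have already been fetched (for example with a base-scope search on each suffix) and are passed in as plain strings. The function names csn_by_sid and csn_drift are hypothetical, not part of OpenLDAP. It assumes the usual fixed-width CSN layout (timestamp#count#SID#mod), so that plain string comparison orders CSNs of the same SID correctly.

```python
from typing import Dict, Iterable, Optional, Tuple


def csn_by_sid(csns: Iterable[str]) -> Dict[str, str]:
    """Map each serverID (SID) to the newest contextCSN seen for it."""
    newest: Dict[str, str] = {}
    for csn in csns:
        # Assumed layout: 20230227181838.000000Z#000000#001#000000
        # i.e. timestamp # change count # SID # modification number.
        parts = csn.split("#")
        if len(parts) != 4:
            raise ValueError(f"unexpected CSN format: {csn!r}")
        sid = parts[2]
        # Fixed-width fields: string order matches CSN order per SID.
        if sid not in newest or csn > newest[sid]:
            newest[sid] = csn
    return newest


def csn_drift(
    main_csns: Iterable[str], accesslog_csns: Iterable[str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return the SIDs whose main-db and accesslog contextCSN differ.

    Each value is (main_csn, accesslog_csn); None means that side has
    no contextCSN for the SID at all.
    """
    main = csn_by_sid(main_csns)
    log = csn_by_sid(accesslog_csns)
    return {
        sid: (main.get(sid), log.get(sid))
        for sid in sorted(set(main) | set(log))
        if main.get(sid) != log.get(sid)
    }
```

An empty result means both databases agree for every SID. Any entry in the result names a SID to look at, either because the two values differ or because one side has no value for that SID (as would be the case right after an accesslog has been dropped).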
What exactly is the meaning of the 4096?
See RFC 4533: "The server is REQUIRED to: ... c) indicate that the *incremental* convergence is not possible by returning e-syncRefreshRequired,"
My emphasis on "incremental". Usually when contextCSN and cookie are found to be incompatible (missing sids from cookie, or even tighter constraints when configured with syncprov-sessionlog-source), it tells the consumer to step down from deltasync or start without a cookie.
Ok, I assumed the consumer decides on its own whether it's in sync with a given provider, by comparing its contextCSN to the provider's, and only if it's NOT in sync queries the provider's accesslog for delta sync from the CSN it's currently at, if possible.
At least for the main db, it makes a significant performance difference if the accesslog gets too large. Therefore we mdb_copy -c the database from time to time. We do this on one server, then distribute this mdb to the other servers and drop their accesslog, since it no longer matches the (imported) main db. But then other replicas start logging "Content Sync Refresh Required" for the corresponding rid, even if no updates are coming in through *that* server, so its contextCSN is static.
You mean accesslog DB or accesslog freelist? Also now you're saying you're obliterating the whole accesslog (and compacting the mainDB), where previously you said you were compacting accesslog.
We are compacting (mdb_copy -c) the main db on one server, AND throwing away the accesslog on other servers where we import this mdb, because it then no longer matches the local accesslog.
Context at https://openldap.org/lists/openldap-technical/201708/msg00049.html (although this was about a different LDAP database than the one we're currently talking about.)
This always went fine, but it now turns out to confuse other consumers in an MMR environment. Should we instead run mdb_copy -c locally on each server? (This can be a pretty slow operation.) Or is there another "clean" way to copy mdb databases between replicas? Include the corresponding accesslog?
The other scenario was that after large batch updates, when the accesslog has grown much bigger than usual (which is not a problem in itself), logpurge leaves a large freelist in the accesslog as well. So as a precaution, we "clean up" here as well by just dropping that accesslog, obviously at a quiet time and when all servers are in sync. This turned out to be a mistake.
If this is the case then yeah, you're removing every way the servers could have performed an efficient resync after reconnect/restart and that will take time and processing power to perform (probably running a refresh present, which is only one step up from a total resync). This makes little sense operationally.
Ok, so far we only looked at the contextCSN of the main DIT, assuming this told the whole story.
Geert