On Mon, Feb 27, 2023 at 03:53:49PM +0100, Geert Hendrickx wrote:
We monitor and compare the contextCSNs continuously; that's how we noticed that replication was no longer continuous but happened in "bursts". It seems to re-initiate a full sync all the time (every 5 to 10 minutes) as long as new updates keep coming in. It only went back to regular delta sync once we had a long enough period during the night with no updates.
Hi Geert, you didn't answer the question of whether you also monitor the accesslog's contextCSN. In deltasync, the combination of both is important.
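To illustrate monitoring both databases, here is a minimal Python sketch (not part of OpenLDAP) that compares the per-server contextCSN values of the main DB with those of the accesslog DB. It assumes the usual CSN layout `timestamp#count#sid#mod` with the SID in hex. It also assumes the raw contextCSN values have already been fetched, e.g. by a base-scoped search on each suffix. The names `csn_by_sid` and `lagging_sids` are hypothetical.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

# Assumed CSN layout: <timestamp>#<count>#<sid>#<mod>, e.g.
# "20230227145349.123456Z#000000#001#000000"; the SID is the third field (hex).
CSN_RE = re.compile(
    r"^(\d{14}\.\d{6}Z)#([0-9a-f]{6})#([0-9a-f]{3})#([0-9a-f]{6})$", re.I
)


def csn_by_sid(values: Iterable[str]) -> Dict[int, str]:
    """Map each server ID (SID) to the newest contextCSN value seen for it."""
    result: Dict[int, str] = {}
    for value in values:
        csn = value.strip()
        m = CSN_RE.match(csn)
        if not m:
            raise ValueError(f"unrecognised CSN: {value!r}")
        sid = int(m.group(3), 16)
        # CSNs from the same SID sort correctly as plain strings
        if sid not in result or csn > result[sid]:
            result[sid] = csn
    return result


def lagging_sids(
    main_db: Iterable[str], accesslog: Iterable[str]
) -> Dict[int, Tuple[Optional[str], Optional[str]]]:
    """Return SIDs whose accesslog contextCSN differs from, or is missing
    compared to, the main DB, as {sid: (main_csn, accesslog_csn)}."""
    main = csn_by_sid(main_db)
    log = csn_by_sid(accesslog)
    diffs: Dict[int, Tuple[Optional[str], Optional[str]]] = {}
    for sid in sorted(set(main) | set(log)):
        if main.get(sid) != log.get(sid):
            diffs[sid] = (main.get(sid), log.get(sid))
    return diffs
```

For example, if the main DB carries CSNs for SIDs 1 and 2 but the accesslog only has SID 1, `lagging_sids` reports SID 2 with `None` on the accesslog side. In deltasync, that kind of gap is worth alerting on.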
What exactly is the meaning of the 4096?
See RFC 4533: "The server is REQUIRED to: ... c) indicate that the *incremental* convergence is not possible by returning e-syncRefreshRequired,"
My emphasis on "incremental". Usually, when the contextCSN and the cookie are found to be incompatible (SIDs missing from the cookie, or even tighter constraints when configured with syncprov-sessionlog-source), the provider tells the consumer to step down from deltasync or to start without a cookie.
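4096 is the value RFC 4533 assigns to the e-syncRefreshRequired result code. The check described above can be modelled, very loosely, as follows. This is a Python sketch of the decision only, not slapd's actual syncprov code. It treats incremental convergence as impossible when the cookie lacks a SID that the provider's contextCSN has. It also does so when the consumer is behind on some SID and the log no longer reaches back that far. Here `log_start` is a hypothetical parameter: the CSN of the oldest retained change, or `None` if the log was dropped. The extra constraints applied with syncprov-sessionlog-source are not modelled.

```python
from typing import Dict, Iterable, Optional

# RFC 4533 result code for e-syncRefreshRequired
E_SYNC_REFRESH_REQUIRED = 4096
SUCCESS = 0


def sid_of(csn: str) -> int:
    # Assumed CSN layout: timestamp#count#sid#mod, SID in hex
    return int(csn.split("#")[2], 16)


def by_sid(csns: Iterable[str]) -> Dict[int, str]:
    return {sid_of(c): c for c in csns}


def check_cookie(
    cookie_csns: Iterable[str],
    context_csns: Iterable[str],
    log_start: Optional[str],
) -> int:
    """Simplified model: can changes be replayed incrementally for this cookie?"""
    cookie = by_sid(cookie_csns)
    context = by_sid(context_csns)
    for sid, ctx in context.items():
        have = cookie.get(sid)
        if have is None:
            # The cookie knows nothing about this server's changes.
            return E_SYNC_REFRESH_REQUIRED
        if have < ctx and (log_start is None or have < log_start):
            # The consumer is behind and the log cannot cover the gap.
            return E_SYNC_REFRESH_REQUIRED
    return SUCCESS
```

In this model, dropping the log (`log_start=None`) means any consumer that is behind on any SID can only be brought up to date by a full refresh.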
Also, I'm not sure you need to touch the accesslog so often; why not size your storage to handle the extra capacity properly? Having a large freelist shouldn't be considered a problem in and of itself.
At least for the main db, it makes a significant performance difference if the accesslog gets too large. Therefore we mdb_copy -c the database from time to time. We do this on one server, then distribute this mdb to the other servers and drop their accesslog, since it no longer matches the (imported) main db. But then the other replicas start logging "Content Sync Refresh Required" for the corresponding rid, even if no updates are coming in through *that* server, so its contextCSN is static.
Do you mean the accesslog DB or the accesslog freelist? Also, now you're saying you're obliterating the whole accesslog (and compacting the main DB), whereas previously you said you were compacting the accesslog.
If this is the case, then yeah, you're removing every way the servers could have performed an efficient resync after a reconnect/restart, and that will take time and processing power (probably running a refresh present, which is only one step up from a total resync). This makes little sense operationally.
Regards,