On Mon, Feb 27, 2023 at 15:32:39 +0100, Ondřej Kuzník wrote:
On Fri, Feb 24, 2023 at 11:08:26PM +0100, Geert Hendrickx wrote:
do_syncrep2: rid=xxx (4096) Content Sync Refresh Required
Hi Geert, I would start any investigation by comparing the contextCSNs between nodes (both DB and its accesslog). Also check the reason why the provider sends 4096.
We monitor and compare the contextCSNs continuously; that's how we noticed the replication was not continuous anymore, but came in "bursts". It seemed to reinitiate a full sync every 5 to 10 minutes, for as long as new updates were coming in. It only got back to regular delta sync once we had a long enough period during the night with no updates.
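To illustrate the kind of comparison meant here, below is a minimal Python sketch, not our actual monitoring script. It assumes the contextCSN values of each node have already been fetched (for instance with a base-scope ldapsearch on the suffix entry asking for contextCSN) and that each value has the usual timestamp#count#SID#mod layout. The function names and the sample values are made up for illustration.

```python
from typing import Dict, Iterable, Optional, Tuple


def csn_key(csn: str) -> Tuple[str, int, int]:
    """Sort key for CSNs sharing a SID: timestamp, change count, mod count."""
    timestamp, count, _sid, mod = csn.split("#")
    # The timestamp is fixed-width, so it orders correctly as a string;
    # the two counters are hexadecimal.
    return (timestamp, int(count, 16), int(mod, 16))


def latest_by_sid(csns: Iterable[str]) -> Dict[str, str]:
    """Map each server ID (SID) to the newest CSN seen for it."""
    latest: Dict[str, str] = {}
    for csn in csns:
        sid = csn.split("#")[2]
        if sid not in latest or csn_key(csn) > csn_key(latest[sid]):
            latest[sid] = csn
    return latest


def csn_lag(
    provider: Iterable[str], consumer: Iterable[str]
) -> Dict[str, Tuple[Optional[str], str]]:
    """Return, per SID, (consumer CSN, provider CSN) where the consumer
    has no CSN for that SID or an older one than the provider."""
    prov = latest_by_sid(provider)
    cons = latest_by_sid(consumer)
    lag: Dict[str, Tuple[Optional[str], str]] = {}
    for sid, p_csn in prov.items():
        c_csn = cons.get(sid)
        if c_csn is None or csn_key(c_csn) < csn_key(p_csn):
            lag[sid] = (c_csn, p_csn)
    return lag


# Hypothetical values: the consumer is behind for SID 001 only.
provider_csns = [
    "20230227143239.000001Z#000000#001#000000",
    "20230227143000.000000Z#000000#002#000000",
]
consumer_csns = [
    "20230227143100.000000Z#000000#001#000000",
    "20230227143000.000000Z#000000#002#000000",
]
lagging = csn_lag(provider_csns, consumer_csns)
for sid, (have, want) in lagging.items():
    print(f"SID {sid}: consumer has {have}, provider has {want}")
```

Running the same comparison against the accesslog DB's contextCSNs as well would cover both databases mentioned above.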
What exactly is the meaning of the 4096?
Also not sure you need to touch accesslog so often, why not size your storage to deal with the extra capacity properly? Having a large freelist shouldn't be considered a problem in and of itself.
At least for the main db, it makes a significant performance difference if the accesslog gets too large. Therefore we run mdb_copy -c on the database from time to time. We do this on one server, then distribute the compacted mdb to the other servers and drop their accesslog, since it no longer matches the (imported) main db. But then other replicas start logging "Content Sync Refresh Required" for the corresponding rid, even if no updates are coming in through *that* server, so its contextCSN is static.
Geert