Re: Content Sync Refresh Required

28 Feb 2023


On Tue, Feb 28, 2023 at 16:12:25 +0100, Ondřej Kuzník wrote:
...
> On Tue, Feb 28, 2023 at 01:42:20PM +0100, Geert Hendrickx wrote:
...
>> We've had (and still have) this issue with large attributes and large
>> multi-valued attributes with Zimbra (see previous discussion with Quanah),
>> where we applied sortvals and multival.  But in this scenario it's not the
>> case; all objects are of similar small size, with (mostly) single valued
>> attributes.  Yet our freelist reaches 200K+ free pages during periods with
>> heavy updates (mostly deletes/adds), which has a measurable impact on write
>> performance.
>
> Hi Geert,
> are you sure it's the freelist and not the random access as pages become
> non-contiguous? The former would represent a constant decline in
> performance where the latter would eventually taper from high (best
> case) performance to regular performance you should be able to expect?
> Have you been able to rule that out?

mdb_copy -c fixes it, so I assume it's only the freelist size, not actual
fragmentation (mdb_copy doesn't reorder any data, right?).
Random access shouldn't matter much, as it's all on an SSD-based SAN.
Also, the decline isn't constant.  In normal operations, the freelist stays
fairly small (it is "consumed" all the time by regular updates).  Only
during batch updates (because of a currently ongoing migration) does it
explode; it then doesn't get "consumed" in time for the next batch update,
which degrades performance for subsequent batches.
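
For anyone wanting to watch this between batches, here is a minimal sketch of
how the freelist size could be tracked.  It assumes that `mdb_stat -f` prints
a "Free pages: N" line in its freelist section; the function names and the
200K threshold (taken from the figure above) are only illustrative, not
anything shipped with OpenLDAP or LMDB.

```python
import re
import subprocess


def parse_free_pages(mdb_stat_output: str) -> int:
    """Extract the freelist page count from `mdb_stat -f` output.

    Assumption: the freelist section contains a line like "  Free pages: 1234".
    """
    match = re.search(r"^\s*Free pages:\s*(\d+)\s*$", mdb_stat_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Free pages:' line found in mdb_stat output")
    return int(match.group(1))


def free_pages(db_dir: str) -> int:
    """Run `mdb_stat -f` on an LMDB environment directory and return its free pages."""
    result = subprocess.run(
        ["mdb_stat", "-f", db_dir],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_free_pages(result.stdout)


def needs_compaction(pages: int, threshold: int = 200_000) -> bool:
    """Illustrative check: flag the database once the freelist passes the threshold."""
    return pages >= threshold
```

Calling `free_pages()` on the database directory before and after each batch
would show whether the freelist is being consumed in time; the compaction
itself would still be done with `mdb_copy -c` into a fresh directory, as
described above.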
...
> After you kill accesslog, you disable deltasync. Since you're also
> restarting, the provider has no data on how to replay anything and needs
> to send the list of all entries (at least their UUIDs). This is
> expensive and slow. Replication seems to proceed in slow leaps that cost
> a *lot* of processing on the provider and a fair amount of bandwidth.
> Isn't that what you're seeing?

Yes, this is indeed the case and it keeps doing that as long as updates are
coming in.  Once there are no updates for a full refresh cycle (eg. during
the night, or because we pause updates) it is able to revert to delta sync.
...
> After you kill accesslog, you disable deltasync.

This is the essential part.  I always assumed it could proceed with
deltasync if the provider and replica have the same contextCSN, even with
an empty accesslog.
This probably went unnoticed for a long time since dropping the accesslog
on a non-active master causes no (visible) delays; they only show up on an
active master.
Thanks for your insights, things are much clearer now, and we have adjusted
our processes accordingly.
Geert
