After deploying delta-sync MMR at several customer sites, I have found that
the general handling of conflict resolution in MMR mode is significantly
suboptimal. It routinely causes the MMR nodes to get further out of sync,
worsening things significantly (mainly due to ITS#8125).
The main issues I see are the following:
a) Two masters get different change requests at approximately the same time
to add a value X to an attribute.
b) Two masters get different change requests at approximately the same time
to delete a value X from an attribute.
In these two specific cases, in relaxed mode, rather than falling back and
resyncing the entire database, I think the conflict should be discarded
(skipped) and logged as such. That is, there is no actual discrepancy in
the object: it still has X present in the add case, and X is gone in the
delete case.
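
To make the proposal concrete, here is a minimal sketch of the skip-and-log
behavior, written as a toy Python model rather than anything resembling the
slapd code. The function name apply_relaxed and the "applied"/"skipped"
return values are hypothetical, and an attribute is modeled as a plain set
of values.

```python
import logging

log = logging.getLogger("mmr-sketch")  # hypothetical logger name


def apply_relaxed(values, op, value):
    """Apply a replicated add/delete of one attribute value to a set.

    Returns "applied" if the set changed, or "skipped" if the change is
    only a nominal conflict (value already present, or already gone).
    """
    if op == "add":
        if value in values:
            # Both masters added X: the entry already has the desired state.
            log.warning("skipping add of %r: value already present", value)
            return "skipped"
        values.add(value)
        return "applied"
    if op == "delete":
        if value not in values:
            # Both masters deleted X: the entry already has the desired state.
            log.warning("skipping delete of %r: value already absent", value)
            return "skipped"
        values.remove(value)
        return "applied"
    raise ValueError(f"unsupported op: {op}")
```

In this model the second add (or delete) of X arriving from the other master
leaves the entry untouched and produces only a log line, with no fallback of
any kind.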
At most, if we are going to fall back, we should only resync the specific
entry. The overall behavior I'm seeing from OpenLDAP is that the masters
get into an endless cycle of resync, and the more they do so, the more out
of sync they become. Eventually you have to stop all masters, export all
their DBs, sort them, find the entries missing between all sets of masters,
and build a brand new DB with which to reload them, until they get
massively out of sync again. I.e., the current strategy of resync is doing
no favors to anyone. It may work OK on very small DBs, where a resync only
takes seconds, but on larger DBs, where such syncs take anywhere from 30+
minutes to hours, it is not a useful methodology.
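
For reference, the "find missing entries" step of that manual recovery can
be sketched roughly as follows. This is only an illustration: the function
name missing_entries is hypothetical, and it assumes each master's export
has already been reduced to a set of normalized DN strings.

```python
def missing_entries(exports):
    """Map each master name to the sorted DNs it lacks.

    `exports` maps a master name to the set of DNs found in its export.
    A DN counts as missing from a master if any other master has it.
    """
    all_dns = set().union(*exports.values())
    return {name: sorted(all_dns - dns) for name, dns in exports.items()}
```

This only detects entries that are absent outright. Entries that exist on
every master but differ in content would need a per-attribute comparison on
top of it.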
--Quanah
--
Quanah Gibson-Mount
Platform Architect
Zimbra, Inc.
--------------------
Zimbra :: the leader in open source messaging and collaboration