quanah@zimbra.com wrote:
--On Thursday, June 09, 2016 1:19 AM +0100 Howard Chu hyc@symas.com wrote:
quanah@openldap.org wrote:
Full_Name: Quanah Gibson-Mount
Version: 2.4.44
OS: Linux 2.6
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (75.111.52.177)
In an MMR setup with more than two nodes, operations can get sent out endlessly.
For example, we see this modification occur at 20160603194926.427963Z
You seem to have a large clock sync problem.
Summary:
After fixing the clock skew, the problem was still present. Analyzing debug logs with sync+stats+packets, we see that the offending mods were propagated by syncprov without a CSN in the sync cookie. Since the cookie contained no CSN, the existing check for "CSN too old, ignoring" was not taking place, so the mods were not being filtered out as they should be.
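To make the skipped check concrete, here is a rough sketch of the consumer-side logic in Python. It is not the actual syncrepl code: the function names, the `known` table, and the assumption that CSNs have the form time#count#sid#mod and compare as plain strings are all illustrative.

```python
from typing import Dict, Optional


def csn_sid(csn: str) -> str:
    """Return the server ID field of a CSN, assuming the form time#count#sid#mod."""
    return csn.split("#")[2]


def should_apply(cookie_csn: Optional[str], known: Dict[str, str]) -> bool:
    """Hypothetical consumer-side filter, not the real syncrepl implementation.

    known maps a server ID to the newest CSN already seen from that server.
    """
    if cookie_csn is None:
        # No CSN in the cookie: there is nothing to compare against,
        # so the "too old" check never runs and the mod is accepted.
        return True
    sid = csn_sid(cookie_csn)
    newest = known.get(sid)
    if newest is not None and cookie_csn <= newest:
        # Corresponds to the "CSN too old, ignoring" case.
        return False
    known[sid] = cookie_csn
    return True
```

With a table that already holds a CSN for a given server ID, passing the same CSN again is rejected, while passing `None` for the cookie CSN is accepted. That second path is the one the offending mods were taking.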
syncprov sends mods out without a CSN in the cookie when the mod's CSN is older than the newest contextCSN. In this particular case, between the time that the provider processed the original mod, and the time it was queued up to be sent to the relevant consumers, this server's own consumers had received newer updates from other providers. So, the mod was older than the current contextCSN and was sent without a cookie CSN.
(The usual case for syncprov's behavior is when queued mods get sent out of order; since transmission order is not guaranteed to be the same as write/commit order this is a normal occurrence.)
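The provider-side decision described above can be sketched roughly as follows. This is a simplified illustration in Python, not the syncprov source: `cookie_csn_for` is a hypothetical name, the second contextCSN value is made up, and CSNs are assumed to order correctly as plain strings.

```python
from typing import List, Optional


def cookie_csn_for(mod_csn: str, context_csns: List[str]) -> Optional[str]:
    """Hypothetical sketch: pick the CSN to put in the sync cookie, if any.

    Returns None when the mod is older than the newest contextCSN,
    mirroring the behavior described in this report.
    """
    newest = max(context_csns)
    if mod_csn < newest:
        # The mod was overtaken by a newer update before it was sent.
        return None
    return mod_csn


# The mod as originally processed on this provider (server ID 001).
mod = "20160603194926.427963Z#000000#001#000000"

# At processing time the mod is the newest change, so it carries its CSN.
at_commit = cookie_csn_for(mod, [mod])

# Before it is sent, this server's own consumer receives a newer update
# from another provider (server ID 002; illustrative value).
at_send = cookie_csn_for(mod, [mod, "20160603194927.100000Z#000000#002#000000"])
```

Here `at_commit` is the mod's own CSN and `at_send` is `None`, which is the window in which the mod goes out with no CSN in the cookie.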
It's possible that regular syncrepl+mmr needs a corresponding fix. I haven't looked at that yet.