There's a general issue with delta-sync MMR that needs resolving (it's causing considerable pain for customers): it is possible to put the entire system into endless fallbacks until a new and/or reloaded node receives a write operation. The problem is that when an existing master queries the new/reloaded master's accesslog DB, zero entries are returned, which triggers the fallback. This continues until the new/reloaded master gets a direct write op. I've worked around it by immediately doing a no-op on the primary DB (an ldapmodify that replaces an attribute value with its own value), but it would be nice to be able to bring new MMR nodes online, or reload MMR nodes if they get out of sync, etc., without causing this sync fallback issue. There is no clear solution at the moment for what to do about zero results from the accesslog in the delta-sync MMR scenario. Proposals welcome. ;)
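For anyone wanting to script the workaround, here is a minimal sketch of building the no-op change as LDIF, which can then be fed to ldapmodify. The function name, DN, attribute, and value are all hypothetical placeholders; it assumes the current value has already been read from the entry and is plain text that needs no base64 encoding.

```python
def noop_replace_ldif(dn, attr, current_value):
    """Build LDIF that replaces `attr` with the value it already holds.

    The entry's visible data does not change, but the server still
    processes a write, which is the point of the workaround.
    """
    return "\n".join([
        f"dn: {dn}",
        "changetype: modify",
        f"replace: {attr}",
        f"{attr}: {current_value}",
        "-",
        "",  # trailing newline terminates the LDIF record
    ])


if __name__ == "__main__":
    # Hypothetical entry and attribute; substitute something real on the node.
    print(noop_replace_ldif("cn=example,dc=example,dc=com",
                            "description", "unchanged"))
```

The output would be applied on the new/reloaded master itself, since it is that node's accesslog that needs its first entry.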
--Quanah
--
Quanah Gibson-Mount Platform Architect Zimbra, Inc. -------------------- Zimbra :: the leader in open source messaging and collaboration
--On Wednesday, April 08, 2015 10:50 PM -0700 Quanah Gibson-Mount quanah@zimbra.com wrote:
One possibility that came to mind today is to treat this scenario much like the LDAP server being down, and simply back off along the lines of the retry configuration in syncrepl. This would keep the system from going nuts if another peer doesn't have an accesslog DB mod op yet, and would let it pick up changes once they are available. Thoughts?
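To make the idea concrete, here is a rough model of the proposed behaviour in Python. It is only a sketch of the proposal, not how slapd behaves today. It assumes the back-off would reuse the syncrepl retry format of "<interval> <count>" pairs with "+" meaning retry indefinitely; `retry_schedule`, `wait_for_accesslog`, and the `poll` and `sleep` callables are hypothetical names for illustration.

```python
import itertools


def retry_schedule(spec):
    """Yield wait intervals in seconds for a retry spec such as "60 10 300 +"."""
    tokens = spec.split()
    if not tokens or len(tokens) % 2:
        raise ValueError("retry spec must be '<interval> <count>' pairs")
    for interval, count in zip(tokens[::2], tokens[1::2]):
        seconds = int(interval)
        if count == "+":
            # "+" means keep retrying at this interval indefinitely.
            yield from itertools.repeat(seconds)
            return
        for _ in range(int(count)):
            yield seconds


def wait_for_accesslog(poll, spec, sleep):
    """Poll until the peer's accesslog returns entries, backing off per spec.

    An empty result is treated as "not ready yet" rather than as a reason
    to fall back. Returns the entries, or None if the retries run out.
    """
    entries = poll()
    for delay in retry_schedule(spec):
        if entries:
            return entries
        sleep(delay)
        entries = poll()
    return entries or None
```

In real use `sleep` would be `time.sleep` and `poll` would be the accesslog search against the peer; passing them in keeps the sketch easy to exercise without a server.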
--Quanah