--On Thursday, September 01, 2016 8:05 AM +0000 quanah@zimbra.com wrote:
--On Thursday, September 01, 2016 7:52 AM +0000 quanah@openldap.org wrote:
Full_Name: Quanah Gibson-Mount
Version: OpenLDAP 2.4.44
OS: Linux 2.6
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (75.111.52.177)
In a 2-node MMR setup. Node 1 is getting a lot of write traffic. Both node 1 and node 2 have 3 replicas each. At some point, a change is received by node 1, which writes the change to its accesslog DB and its primary DB. Its 3 replicas are all correctly updated. MMR node 2 receives the change, updates its primary DB, but *fails* to write the change to the accesslog DB. However, it *does* write the CSN update to the accesslog DB successfully. This causes all of its replicas to also update their CSN. Then a change comes in that triggers a constraint violation on the replicas, but is fully accepted by their master.
So the above summary is incorrect. While 3 replicas did go out of sync... 2 belonged to the primary master (node1), and 1 belonged to the secondary master (node 2). So really, 4 systems didn't log the change (MMR node 2, ldap05, ldap07, ldap09).
Ok, so that's not correct either. I now have the correct topology:
ldap01 has the following replicas: ldap02, ldap05, ldap07, ldap09
ldap02 has the following replicas: ldap01, ldap06, ldap08, ldap10
So the replicas of ldap01 received the change and rejected it. ldap02 just skipped writing the entry to the accesslog, and as a result, none of its replicas ever got the change, and thus they never hit the failure issue of err 19, but they all are now lacking this modification entirely.
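To make the grouping above easier to follow, here is a rough sketch that models the topology as plain data and sorts the read-only replicas by the outcome described. The names `TOPOLOGY`, `read_only_replicas` and `classify` are made up for illustration, and the sketch assumes that "the replicas of ldap01" means only its non-MMR consumers (so ldap02 is excluded from the err 19 group). It does not talk to any server.

```python
# Topology as listed in this message: each MMR node and the servers
# that replicate from it (the two MMR nodes also replicate from each other).
TOPOLOGY = {
    "ldap01": ["ldap02", "ldap05", "ldap07", "ldap09"],
    "ldap02": ["ldap01", "ldap06", "ldap08", "ldap10"],
}


def read_only_replicas(node):
    """Return the consumers of `node` that are not themselves MMR nodes."""
    return [r for r in TOPOLOGY[node] if r not in TOPOLOGY]


def classify():
    """Group the read-only replicas by the outcome described above.

    Assumption: only the non-MMR consumers of ldap01 hit err 19, and only
    the non-MMR consumers of ldap02 are missing the change entirely.
    """
    return {
        "rejected_change_err_19": read_only_replicas("ldap01"),
        "never_received_change": read_only_replicas("ldap02"),
    }


if __name__ == "__main__":
    for outcome, hosts in classify().items():
        print(outcome, ", ".join(hosts))
```

ldap02 itself falls into neither group here, since it applied the change to its primary DB but skipped the accesslog write.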
I would note that every server was loaded today from the same ldap backup, so they were all perfectly in sync.
In looking at the LDAP accesslog, what I see is that what should have been a modRDN op was stored in the accesslog as a MOD op (the one I noted before). This seems particularly bizarre, because ldap01 should have rejected this change as well. It appears we may have a problem where the accesslog DB is updated, but then the change got rejected by the unique overlay.
--Quanah
--
Quanah Gibson-Mount