Full_Name: Quanah Gibson-Mount
Version: 2.4.45
OS: N/A
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (47.208.148.239)

In an N-way MMR setup, one node falling back to syncrepl REFRESH may destabilize other nodes, because it incorrectly records the changes it receives from the master that is taking write ops. If another node is using this (refreshing) master as its source, it will then be forced into fallback as well, further destabilizing the cluster.
I triggered the scenario above with a 4-way MMR setup. serverID 1 was the only master receiving write ops. serverID 3 went into REFRESH for unknown reasons. serverID 2 was pulling the changes for serverID 1 from serverID 3, also for unknown reasons.

serverID 1 recorded the following change:

dn: reqStart=20171206214129.000002Z,cn=accesslog
objectClass: auditModify
structuralObjectClass: auditModify
reqStart: 20171206214129.000002Z
reqEnd: 20171206214129.000003Z
reqType: modify
reqSession: 2209
reqAuthzID: cn=ldaproot,dc=xxx,dc=edu
reqDN: uid=cdxxxxx,ou=user,dc=xxx,dc=edu
reqResult: 0
reqMod: pwdAccountLockedTime:= 20171206214129Z
reqMod: pwdFailureTime:+ 20171206214129.121729Z
reqMod: entryCSN:= 20171206214129.121794Z#000000#001#000000
reqMod: modifiersName:= cn=ldaproot,dc=xxx,dc=edu
reqMod: modifyTimestamp:= 20171206214129Z
reqEntryUUID: 41e02340-18f9-1027-900a-8ac8742d2008
entryUUID: f66c9be6-6f19-1037-9821-a35b555092e3
creatorsName: cn=accesslog
createTimestamp: 20171206214129Z
entryCSN: 20171206214129.121794Z#000000#001#000000
modifiersName: cn=accesslog
modifyTimestamp: 20171206214129Z

However, serverID 3 records the following for this same CSN instead:

dn: reqStart=20171206214354.000006Z,cn=accesslog
objectClass: auditModify
structuralObjectClass: auditModify
reqStart: 20171206214354.000006Z
reqEnd: 20171206214354.000008Z
reqType: modify
reqSession: 1
reqAuthzID: cn=ldaproot,dc=xxx,dc=edu
reqDN: uid=cdxxxxx,ou=user,dc=xxx,dc=edu
reqResult: 0
reqMod: userPassword:-
reqMod: pwdFailureTime:+ 20171206214105.339881Z
reqMod: pwdFailureTime:+ 20171206214105.504629Z
reqMod: pwdFailureTime:+ 20171206214105.756105Z
reqMod: pwdFailureTime:+ 20171206214106.117063Z
reqMod: pwdFailureTime:+ 20171206214106.348441Z
reqMod: pwdFailureTime:+ 20171206214106.575907Z
reqMod: pwdFailureTime:+ 20171206214106.875082Z
reqMod: pwdFailureTime:+ 20171206214107.175699Z
reqMod: pwdFailureTime:+ 20171206214107.655344Z
reqMod: pwdFailureTime:+ 20171206214107.915930Z
reqMod: pwdFailureTime:+ 20171206214108.156601Z
reqMod: pwdFailureTime:+ 20171206214108.431242Z
reqMod: pwdFailureTime:+ 20171206214108.791469Z
reqMod: pwdFailureTime:+ 20171206214109.033924Z
reqMod: pwdFailureTime:+ 20171206214109.318285Z
reqMod: pwdFailureTime:+ 20171206214109.565585Z
reqMod: pwdFailureTime:+ 20171206214109.823744Z
reqMod: pwdFailureTime:+ 20171206214110.110372Z
reqMod: pwdFailureTime:+ 20171206214110.306955Z
reqMod: pwdFailureTime:+ 20171206214110.638527Z
reqMod: pwdFailureTime:+ 20171206214111.014705Z
reqMod: pwdFailureTime:+ 20171206214111.370965Z
reqMod: pwdFailureTime:+ 20171206214111.673694Z
reqMod: pwdFailureTime:+ 20171206214112.011806Z
reqMod: pwdFailureTime:+ 20171206214112.327727Z
reqMod: pwdFailureTime:+ 20171206214112.584305Z
reqMod: pwdFailureTime:+ 20171206214112.930555Z
reqMod: pwdFailureTime:+ 20171206214113.269235Z
reqMod: pwdFailureTime:+ 20171206214113.633844Z
reqMod: pwdFailureTime:+ 20171206214113.928111Z
reqMod: pwdFailureTime:+ 20171206214114.217342Z
reqMod: pwdFailureTime:+ 20171206214114.539026Z
reqMod: pwdFailureTime:+ 20171206214114.888149Z
reqMod: pwdFailureTime:+ 20171206214115.262042Z
reqMod: pwdFailureTime:+ 20171206214115.675217Z
reqMod: pwdFailureTime:+ 20171206214116.030024Z
reqMod: pwdFailureTime:+ 20171206214116.362739Z
reqMod: pwdFailureTime:+ 20171206214116.616784Z
reqMod: pwdFailureTime:+ 20171206214116.987779Z
reqMod: pwdFailureTime:+ 20171206214117.293091Z
reqMod: pwdFailureTime:+ 20171206214117.549392Z
reqMod: pwdFailureTime:+ 20171206214117.838969Z
reqMod: pwdFailureTime:+ 20171206214118.051355Z
reqMod: pwdFailureTime:+ 20171206214118.275629Z
reqMod: pwdFailureTime:+ 20171206214118.583510Z
reqMod: pwdFailureTime:+ 20171206214118.866746Z
reqMod: pwdFailureTime:+ 20171206214119.174928Z
reqMod: pwdFailureTime:+ 20171206214119.483218Z
reqMod: pwdFailureTime:+ 20171206214119.929568Z
reqMod: pwdFailureTime:+ 20171206214120.147090Z
reqMod: pwdFailureTime:+ 20171206214120.549317Z
reqMod: pwdFailureTime:+ 20171206214120.869798Z
reqMod: pwdFailureTime:+ 20171206214121.143126Z
reqMod: pwdFailureTime:+ 20171206214121.476740Z
reqMod: pwdFailureTime:+ 20171206214121.799935Z
reqMod: pwdFailureTime:+ 20171206214122.066816Z
reqMod: pwdFailureTime:+ 20171206214122.405710Z
reqMod: pwdFailureTime:+ 20171206214122.761880Z
reqMod: pwdFailureTime:+ 20171206214123.032806Z
reqMod: pwdFailureTime:+ 20171206214123.280540Z
reqMod: pwdFailureTime:+ 20171206214123.748973Z
reqMod: pwdFailureTime:+ 20171206214124.085579Z
reqMod: pwdFailureTime:+ 20171206214124.340470Z
reqMod: pwdFailureTime:+ 20171206214124.638673Z
reqMod: pwdFailureTime:+ 20171206214124.970374Z
reqMod: pwdFailureTime:+ 20171206214125.302162Z
reqMod: pwdFailureTime:+ 20171206214125.630451Z
reqMod: pwdFailureTime:+ 20171206214125.921736Z
reqMod: pwdFailureTime:+ 20171206214126.232407Z
reqMod: pwdFailureTime:+ 20171206214126.564006Z
reqMod: pwdFailureTime:+ 20171206214126.816303Z
reqMod: pwdFailureTime:+ 20171206214127.168459Z
reqMod: pwdFailureTime:+ 20171206214127.481267Z
reqMod: pwdFailureTime:+ 20171206214127.779584Z
reqMod: pwdFailureTime:+ 20171206214128.176611Z
reqMod: pwdFailureTime:+ 20171206214128.429982Z
reqMod: pwdFailureTime:+ 20171206214128.852280Z
reqMod: pwdFailureTime:+ 20171206214129.121729Z
reqMod: pwdAccountLockedTime:= 20171206214129Z
reqMod: entryCSN:= 20171206214129.121794Z#000000#001#000000
reqMod: modifiersName:= cn=ldaproot,dc=xxx,dc=edu
reqMod: modifyTimestamp:= 20171206214129Z
reqMod: pwdFailureTime:-
reqEntryUUID: 41e02340-18f9-1027-900a-8ac8742d2008
entryCSN: 20171206214129.121794Z#000000#001#000000
entryUUID: 4d10f56e-6f1a-1037-9a98-f5e2e6dad8c2
creatorsName: cn=accesslog
createTimestamp: 20171206214129Z
modifiersName: cn=accesslog
modifyTimestamp: 20171206214129Z

serverID 2 is then unable to process the change that serverID 3 provides for this CSN, and falls back to REFRESH itself.
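
For anyone wanting to check their own accesslog for the same symptom, here is a minimal Python sketch that compares the reqMod values two servers logged for the same CSN. It is not part of slapd or any OpenLDAP tool. The function names (`req_mods`, `diff_req_mods`) are hypothetical, the sample entries are abbreviated from the ones above, and it assumes one attribute per line with no LDIF line folding.

```python
from collections import Counter

def req_mods(ldif_entry: str) -> Counter:
    """Collect the reqMod values of one accesslog entry as a multiset.

    Assumes one attribute per line (no LDIF line folding).
    """
    mods = Counter()
    for line in ldif_entry.splitlines():
        if line.startswith("reqMod:"):
            mods[line[len("reqMod:"):].strip()] += 1
    return mods

def diff_req_mods(origin: str, replica: str):
    """Return (only_in_origin, only_in_replica) as sorted lists."""
    a, b = req_mods(origin), req_mods(replica)
    return sorted((a - b).elements()), sorted((b - a).elements())

# Abbreviated copies of the two entries logged for the same entryCSN.
origin = """\
reqType: modify
reqMod: pwdAccountLockedTime:= 20171206214129Z
reqMod: pwdFailureTime:+ 20171206214129.121729Z
reqMod: entryCSN:= 20171206214129.121794Z#000000#001#000000
"""

replica = """\
reqType: modify
reqMod: userPassword:-
reqMod: pwdFailureTime:+ 20171206214128.852280Z
reqMod: pwdFailureTime:+ 20171206214129.121729Z
reqMod: pwdAccountLockedTime:= 20171206214129Z
reqMod: entryCSN:= 20171206214129.121794Z#000000#001#000000
reqMod: pwdFailureTime:-
"""

only_origin, only_replica = diff_req_mods(origin, replica)
print("only on origin: ", only_origin)
print("only on replica:", only_replica)
```

With the abbreviated entries, everything the originating server logged also appears on the replica, while the replica carries extra mods (the userPassword and pwdFailureTime deletes plus the older pwdFailureTime value) that the original write never contained.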