Hello everyone,
I have been testing N-way multimaster replication with OpenLDAP for a while now (from 2.4.11 to 2.4.15), and just when I thought everything was working perfectly, I decided to test N-way multimaster not only with 2 masters on different servers, but with 4!
2 OpenLDAP instances per server.
I have configured syncprov and syncrepl accordingly:

olcServerID: 1 ldap://163.106.38.90:9011/
olcServerID: 2 ldap://163.106.38.92:9012/
olcServerID: 3 ldap://163.106.38.90:9013/
olcServerID: 4 ldap://163.106.38.92:9014/
olcSyncrepl: {0}rid=011 provider=ldap://163.106.38.90:9011/ binddn="cn=admin,c=fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPersist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {1}rid=012 provider=ldap://163.106.38.92:9012/ binddn="cn=admin,c=fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPersist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {2}rid=013 provider=ldap://163.106.38.90:9013/ binddn="cn=admin,c=fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPersist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {3}rid=014 provider=ldap://163.106.38.92:9014/ binddn="cn=admin,c=fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPersist retry="5 5 300 12 3600 +" timeout=3
I start with all instances in sync and then try to add entries on all four instances in parallel. When I do so, a few entries are not replicated to the others, and I get messages like these:
do_syncrep2: cookie=rid=011,sid=002,csn=20090227130003.849482Z#000000#004#000000
do_syncrep2: rid=011 CSN too old, ignoring 20090227130003.849482Z#000000#004#000000
do_syncrep2: cookie=rid=013,sid=002,csn=20090227130003.849482Z#000000#004#000000
do_syncrep2: rid=013 CSN too old, ignoring 20090227130003.849482Z#000000#004#000000
do_syncrep2: cookie=rid=014,sid=002,csn=20090227130003.946474Z#000000#004#000000
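For readers unfamiliar with the format: a CSN is timestamp#operationCount#serverID#modCount, and the "too old" check rejects any change whose CSN does not sort after the newest CSN already seen for that server ID. A minimal model of that decision (my own illustration with hypothetical helper names, not OpenLDAP source):

```python
# Toy model of the syncrepl "CSN too old" check.
# CSN layout: 20090227130003.849482Z#000000#004#000000
#             timestamp             #count #sid#modcount

def parse_csn(csn):
    """Split a CSN into (timestamp, count, sid, modcount)."""
    ts, count, sid, mod = csn.split("#")
    return ts, count, sid, mod

def is_too_old(incoming, newest_per_sid):
    """A change is ignored when its CSN does not sort after the
    newest CSN already recorded for the same server ID (SID).
    Plain string comparison works: timestamps are fixed-width."""
    sid = parse_csn(incoming)[2]
    newest = newest_per_sid.get(sid)
    return newest is not None and incoming <= newest

# The situation from the log above: SID 004's .946474Z change was
# applied first, so the earlier .849482Z change is rejected.
newest = {"004": "20090227130003.946474Z#000000#004#000000"}
print(is_too_old("20090227130003.849482Z#000000#004#000000", newest))  # True
```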
Has anyone faced the same issue?
Here is my configuration (I am using refreshAndPersist mode for both cn=config and olcDatabase={1}bdb):
M1 on IP1 / PORT1:

dn: cn=config
objectClass: olcGlobal
cn: config
structuralObjectClass: olcGlobal
creatorsName: cn=config
olcServerID: 1 ldap://163.106.38.90:9011/
olcServerID: 2 ldap://163.106.38.92:9012/
olcServerID: 3 ldap://163.106.38.90:9013/
olcServerID: 4 ldap://163.106.38.92:9014/
entryUUID: ef89c876-adb3-4dc7-aa7d-024bbc359c98
createTimestamp: 20090227085748Z
entryCSN: 20090227085749.920499Z#000000#004#000000
modifiersName: cn=config
modifyTimestamp: 20090227085749Z
contextCSN: 20090227085752.833630Z#000000#001#000000
dn: olcDatabase={1}bdb
objectClass: olcDatabaseConfig
objectClass: olcBdbConfig
olcDatabase: {1}bdb
olcDbDirectory: ./openldap-data
olcSuffix: c=fr
olcRootDN: cn=admin,c=fr
olcRootPW:: e1NTSEF9WVZNSHJtYTRvUGd4KzFoak9kYWhBcm5NVHJxU1Zmdno=
olcSizeLimit: 100
olcSyncrepl: {0}rid=011 provider=ldap://163.106.38.90:9011/ binddn="cn=admin,c=fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPersist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {1}rid=012 provider=ldap://163.106.38.92:9012/ binddn="cn=admin,c=fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPersist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {2}rid=013 provider=ldap://163.106.38.90:9013/ binddn="cn=admin,c=fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPersist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {3}rid=014 provider=ldap://163.106.38.92:9014/ binddn="cn=admin,c=fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPersist retry="5 5 300 12 3600 +" timeout=3
olcTimeLimit: 600
olcMirrorMode: TRUE
olcDbCacheSize: 2000
olcDbCheckpoint: 2000 10
olcDbIndex: default pres,eq
olcDbIndex: cn,sn pres,eq,sub
olcDbIndex: objectClass,entryCSN,entryUUID eq
structuralObjectClass: olcBdbConfig
entryUUID: 00c01e5d-69ee-4baa-8e5a-4ef609dfd958
creatorsName: cn=config
createTimestamp: 20090227085752Z
entryCSN: 20090227085752.729899Z#000000#001#000000
modifiersName: cn=config
modifyTimestamp: 20090227085752Z
M2 on IP2 / PORT2:

dn: cn=config
objectClass: olcGlobal
cn: config
structuralObjectClass: olcGlobal
entryUUID: 8da75037-65e6-4375-8c21-7e5c0194a60b
creatorsName: cn=config
createTimestamp: 20090227085723Z
olcServerID: 1 ldap://163.106.38.90:9011/
olcServerID: 2 ldap://163.106.38.92:9012/
olcServerID: 3 ldap://163.106.38.90:9013/
olcServerID: 4 ldap://163.106.38.92:9014/
entryCSN: 20090227085725.003182Z#000000#002#000000
modifiersName: cn=config
modifyTimestamp: 20090227085725Z
contextCSN: 20090227085752.833630Z#000000#001#000000
dn: olcDatabase={1}bdb
objectClass: olcDatabaseConfig
objectClass: olcBdbConfig
olcDatabase: {1}bdb
olcDbDirectory: ./openldap-data
olcSuffix: c=fr
olcRootDN: cn=admin,c=fr
olcRootPW:: e1NTSEF9WVZNSHJtYTRvUGd4KzFoak9kYWhBcm5NVHJxU1Zmdno=
olcSizeLimit: 100
olcSyncrepl: {0}rid=011 provider=ldap://163.106.38.90:9011/ binddn="cn=admin,c=fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPersist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {1}rid=012 provider=ldap://163.106.38.92:9012/ binddn="cn=admin,c=fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPersist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {2}rid=013 provider=ldap://163.106.38.90:9013/ binddn="cn=admin,c=fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPersist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {3}rid=014 provider=ldap://163.106.38.92:9014/ binddn="cn=admin,c=fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPersist retry="5 5 300 12 3600 +" timeout=3
olcTimeLimit: 600
olcMirrorMode: TRUE
olcDbCacheSize: 2000
olcDbCheckpoint: 2000 10
olcDbIndex: default pres,eq
olcDbIndex: cn,sn pres,eq,sub
olcDbIndex: objectClass,entryCSN,entryUUID eq
structuralObjectClass: olcBdbConfig
entryUUID: 00c01e5d-69ee-4baa-8e5a-4ef609dfd958
creatorsName: cn=config
createTimestamp: 20090227085752Z
entryCSN: 20090227085752.729899Z#000000#001#000000
modifiersName: cn=config
modifyTimestamp: 20090227085752Z
M3 on IP1 / PORT3:

dn: cn=config
objectClass: olcGlobal
cn: config
structuralObjectClass: olcGlobal
entryUUID: cf068647-318f-4848-9c72-9c7745a8a4b3
creatorsName: cn=config
createTimestamp: 20090227085742Z
olcServerID: 1 ldap://163.106.38.90:9011/
olcServerID: 2 ldap://163.106.38.92:9012/
olcServerID: 3 ldap://163.106.38.90:9013/
olcServerID: 4 ldap://163.106.38.92:9014/
entryCSN: 20090227085743.825685Z#000000#003#000000
modifiersName: cn=config
modifyTimestamp: 20090227085743Z
contextCSN: 20090227085752.833630Z#000000#001#000000
dn: olcDatabase={1}bdb
objectClass: olcDatabaseConfig
objectClass: olcBdbConfig
olcDatabase: {1}bdb
olcDbDirectory: ./openldap-data
olcSuffix: c=fr
olcRootDN: cn=admin,c=fr
olcRootPW:: e1NTSEF9WVZNSHJtYTRvUGd4KzFoak9kYWhBcm5NVHJxU1Zmdno=
olcSizeLimit: 100
olcSyncrepl: {0}rid=011 provider=ldap://163.106.38.90:9011/ binddn="cn=admin,c=fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPersist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {1}rid=012 provider=ldap://163.106.38.92:9012/ binddn="cn=admin,c=fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPersist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {2}rid=013 provider=ldap://163.106.38.90:9013/ binddn="cn=admin,c=fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPersist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {3}rid=014 provider=ldap://163.106.38.92:9014/ binddn="cn=admin,c=fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPersist retry="5 5 300 12 3600 +" timeout=3
olcTimeLimit: 600
olcMirrorMode: TRUE
olcDbCacheSize: 2000
olcDbCheckpoint: 2000 10
olcDbIndex: default pres,eq
olcDbIndex: cn,sn pres,eq,sub
olcDbIndex: objectClass,entryCSN,entryUUID eq
structuralObjectClass: olcBdbConfig
entryUUID: 00c01e5d-69ee-4baa-8e5a-4ef609dfd958
creatorsName: cn=config
createTimestamp: 20090227085752Z
entryCSN: 20090227085752.729899Z#000000#001#000000
modifiersName: cn=config
modifyTimestamp: 20090227085752Z
M4 on IP2 / PORT4:

dn: cn=config
objectClass: olcGlobal
cn: config
structuralObjectClass: olcGlobal
entryUUID: ef89c876-adb3-4dc7-aa7d-024bbc359c98
creatorsName: cn=config
createTimestamp: 20090227085748Z
olcServerID: 1 ldap://163.106.38.90:9011/
olcServerID: 2 ldap://163.106.38.92:9012/
olcServerID: 3 ldap://163.106.38.90:9013/
olcServerID: 4 ldap://163.106.38.92:9014/
entryCSN: 20090227085749.920499Z#000000#004#000000
modifiersName: cn=config
modifyTimestamp: 20090227085749Z
contextCSN: 20090227085752.833630Z#000000#001#000000
dn: olcDatabase={1}bdb
objectClass: olcDatabaseConfig
objectClass: olcBdbConfig
olcDatabase: {1}bdb
olcDbDirectory: ./openldap-data
olcSuffix: c=fr
olcRootDN: cn=admin,c=fr
olcRootPW:: e1NTSEF9WVZNSHJtYTRvUGd4KzFoak9kYWhBcm5NVHJxU1Zmdno=
olcSizeLimit: 100
olcSyncrepl: {0}rid=011 provider=ldap://163.106.38.90:9011/ binddn="cn=admin,c=fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPersist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {1}rid=012 provider=ldap://163.106.38.92:9012/ binddn="cn=admin,c=fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPersist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {2}rid=013 provider=ldap://163.106.38.90:9013/ binddn="cn=admin,c=fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPersist retry="5 5 300 12 3600 +" timeout=3
olcSyncrepl: {3}rid=014 provider=ldap://163.106.38.92:9014/ binddn="cn=admin,c=fr" bindmethod=simple credentials=secret searchbase="c=fr" type=refreshAndPersist retry="5 5 300 12 3600 +" timeout=3
olcTimeLimit: 600
olcMirrorMode: TRUE
olcDbCacheSize: 2000
olcDbCheckpoint: 2000 10
olcDbIndex: default pres,eq
olcDbIndex: cn,sn pres,eq,sub
olcDbIndex: objectClass,entryCSN,entryUUID eq
structuralObjectClass: olcBdbConfig
entryUUID: 00c01e5d-69ee-4baa-8e5a-4ef609dfd958
creatorsName: cn=config
createTimestamp: 20090227085752Z
entryCSN: 20090227085752.729899Z#000000#001#000000
modifiersName: cn=config
modifyTimestamp: 20090227085752Z
I should also mention that if I stop M3 & M4 and add entries to M1 & M2, I don't get the "CSN too old" messages and all entries are replicated correctly!
Is this because multimaster is limited to 2-3 instances? The reason I am trying to build a 4-way master architecture is that I would like to be able to stop one instance and perform a slapcat, even if one "physical" server is down...
Thanks very much for your consideration and time.
Adrien Futschik
Just a wild guess: do all your LDAP servers have the correct time (NTP)?
André
--On Friday, February 27, 2009 3:43 PM +0100 Adrien Futschik adrien.futschik@atosorigin.com wrote:
Hello everyone,
I should also mention that if I stop M3 & M4 and add entries to M1 & M2, I don't get the "CSN too old" messages and all entries are replicated correctly!
Is this because multimaster is limited to 2-3 instances? The reason I am trying to build a 4-way master architecture is that I would like to be able to stop one instance and perform a slapcat, even if one "physical" server is down...
N-Way means for any value of N. So 4 should work just fine. Are all 4 of your servers tightly synchronized, time-wise? I.e., using ntp to make sure their clocks all agree?
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
Le vendredi 27 février 2009 18:37:33, Quanah Gibson-Mount a écrit :
--On Friday, February 27, 2009 3:43 PM +0100 Adrien Futschik
adrien.futschik@atosorigin.com wrote:
Hello everyone,
I should also mention that if I stop M3 & M4 and add entries to M1 & M2, I don't get the "CSN too old" messages and all entries are replicated correctly!
Is this because multimaster is limited to 2-3 instances? The reason I am trying to build a 4-way master architecture is that I would like to be able to stop one instance and perform a slapcat, even if one "physical" server is down...
N-Way means for any value of N. So 4 should work just fine. Are all 4 of your servers tightly synchronized, time-wise? I.e., using ntp to make sure their clocks all agree?
--Quanah
Yes, they are synchronized with NTP!
Le vendredi 27 février 2009 18:37:33, Quanah Gibson-Mount a écrit :
--On Friday, February 27, 2009 3:43 PM +0100 Adrien Futschik
adrien.futschik@atosorigin.com wrote:
Hello everyone,
I should also mention that if I stop M3 & M4 and add entries to M1 & M2, I don't get the "CSN too old" messages and all entries are replicated correctly!
Is this because multimaster is limited to 2-3 instances? The reason I am trying to build a 4-way master architecture is that I would like to be able to stop one instance and perform a slapcat, even if one "physical" server is down...
N-Way means for any value of N. So 4 should work just fine. Are all 4 of your servers tightly synchronized, time-wise? I.e., using ntp to make sure their clocks all agree?
--Quanah
Considering that M1 & M3 are on the same server and therefore have exactly the same time, if this were a time-related problem, I shouldn't get any "CSN too old" messages between M1 & M3 or between M2 & M4, should I?
I have also noticed that when M1 gets a new entry and passes it to M2, M3 & M4, they in turn pass it on to each other! I don't understand why this happens, but it looks very much like that is what is going on, because sometimes M2 will have passed an entry to M4 before M4 has actually received the add from M1.
I have therefore noticed that entries sent from M1 are sometimes received in the wrong order by the other masters, and as a result some entries may be skipped!
Here is an example: I add cn=M1client1 & cn=M1client2 on M1.
M1client1 & M1client2 are successfully replicated on M2 & M4, but on M3 only M1client2 is inserted, and I get a "CSN too old" message for M1client1 on M3.
I don't have the logfile here; I'll send extracts this Monday. I am also getting these messages from time to time:

=> bdb_idl_insert_key: c_put id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
=> bdb_dn2id_add 0x1e40: parent (ou=clients,o=edf,c=fr) insert failed: -30994
I guess this is because all 4 masters receive entries that share the same parent (ou=clients,o=edf,c=fr), and it happens when two entries are inserted simultaneously.
Regards
Adrien Futschik
Adrien Futschik wrote:
Considering that M1 & M3 are on the same server and therefore have exactly the same time, if this were a time-related problem, I shouldn't get any "CSN too old" messages between M1 & M3 or between M2 & M4, should I?

I have also noticed that when M1 gets a new entry and passes it to M2, M3 & M4, they in turn pass it on to each other! I don't understand why this happens, but it looks very much like that is what is going on, because sometimes M2 will have passed an entry to M4 before M4 has actually received the add from M1.

I have therefore noticed that entries sent from M1 are sometimes received in the wrong order by the other masters, and as a result some entries may be skipped!
Yes, that makes sense. The CSN check assumes changes will always be received in the same order they were sent from the provider. Obviously in this case this assumption is wrong. You should submit an ITS for this.
This problem was discussed on the -devel list back in 2007; the code ought to be using a spanning tree/routing algorithm to ensure that when multiple routes exist for propagating a change, the change is delivered exactly once. Unfortunately no one has spent any further time on this issue since then.
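To make the failure mode concrete, here is a toy simulation (my own illustration, not slapd code) of a consumer that keeps one high-water-mark CSN per originating server: two changes from the same provider arriving swapped cause the earlier one to be dropped for good.

```python
# Toy consumer: per-SID high-water-mark filtering, as described above.

def apply_changes(changes, arrival_order):
    """changes: list of (csn, entry); arrival_order: indices into it.
    Returns the entries actually applied."""
    newest = {}    # highest CSN accepted so far, per SID
    applied = []
    for i in arrival_order:
        csn, entry = changes[i]
        sid = csn.split("#")[2]
        if sid in newest and csn <= newest[sid]:
            continue              # "CSN too old, ignoring" -- entry lost
        newest[sid] = csn
        applied.append(entry)
    return applied

changes = [
    ("20090227130003.849482Z#000000#001#000000", "cn=M1client1"),
    ("20090227130003.946474Z#000000#001#000000", "cn=M1client2"),
]
print(apply_changes(changes, [0, 1]))  # in order: both entries applied
print(apply_changes(changes, [1, 0]))  # swapped: M1client1 is skipped
```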
Here is an example: I add cn=M1client1 & cn=M1client2 on M1.

M1client1 & M1client2 are successfully replicated on M2 & M4, but on M3 only M1client2 is inserted, and I get a "CSN too old" message for M1client1 on M3.
I don't have the logfile here; I'll send extracts this Monday. I am also getting these messages from time to time:

=> bdb_idl_insert_key: c_put id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
=> bdb_dn2id_add 0x1e40: parent (ou=clients,o=edf,c=fr) insert failed: -30994

I guess this is because all 4 masters receive entries that share the same parent (ou=clients,o=edf,c=fr), and it happens when two entries are inserted simultaneously.
DB_LOCK_DEADLOCK messages can always be ignored; back-bdb always retries when it hits a deadlock.
Howard Chu wrote:
Adrien Futschik wrote:
Considering that M1 & M3 are on the same server and therefore have exactly the same time, if this were a time-related problem, I shouldn't get any "CSN too old" messages between M1 & M3 or between M2 & M4, should I?

I have also noticed that when M1 gets a new entry and passes it to M2, M3 & M4, they in turn pass it on to each other! I don't understand why this happens, but it looks very much like that is what is going on, because sometimes M2 will have passed an entry to M4 before M4 has actually received the add from M1.

I have therefore noticed that entries sent from M1 are sometimes received in the wrong order by the other masters, and as a result some entries may be skipped!
Yes, that makes sense. The CSN check assumes changes will always be received in the same order they were sent from the provider. Obviously in this case this assumption is wrong. You should submit an ITS for this.
This problem was discussed on the -devel list back in 2007; the code ought to be using a spanning tree/routing algorithm to ensure that when multiple routes exist for propagating a change, the change is delivered exactly once. Unfortunately no one has spent any further time on this issue since then.
But the CSN is supposed to guarantee that, regardless of the order, servers converge to the same state. In fact, if entries are received in a different order but carry an entryCSN attribute that is newer, the newer one should take effect (and be propagated further through slapo-syncprov if MMR); if identical or older, it should be ignored (and not propagated). If the incoming modification implies something odd like a missing parent, glue entries should be created, to be replaced by the right entry as soon as it comes in.

In MMR, assuming perfect symmetry, we could do something like ignoring entries that come from a provider with an entryCSN generated by another provider, under the assumption that we will eventually get it from the right provider. Or better (and symmetrical): do not propagate entries whose CSN was not generated by ourselves, under the assumption that the one that generated the CSN will propagate it?
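The second rule suggested above (forward only what you originated) boils down to comparing the SID embedded in a change's CSN against the local server ID. A sketch of the idea, with the CSN format taken from the logs earlier in the thread (illustrative only, not proposed slapo-syncprov code):

```python
def should_propagate(change_csn, my_sid):
    """Forward a change only if its CSN was generated locally; the
    server whose SID appears in the CSN is assumed to deliver it to
    all other masters itself (which presumes a full mesh)."""
    sid = change_csn.split("#")[2]   # timestamp#count#sid#modcount
    return sid == my_sid

# Server 2 receives a change that originated on server 1: do not relay.
print(should_propagate("20090227130003.849482Z#000000#001#000000", "002"))  # False
# Server 2's own writes are still propagated.
print(should_propagate("20090227130500.000000Z#000000#002#000000", "002"))  # True
```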
p.
Ing. Pierangelo Masarati
OpenLDAP Core Team

SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
-----------------------------------
Office: +39 02 23998309
Mobile: +39 333 4963172
Fax: +39 0382 476497
Email: ando@sys-net.it
-----------------------------------
Pierangelo Masarati wrote:
Howard Chu wrote:
Adrien Futschik wrote:
Considering that M1 & M3 are on the same server and therefore have exactly the same time, if this were a time-related problem, I shouldn't get any "CSN too old" messages between M1 & M3 or between M2 & M4, should I?

I have also noticed that when M1 gets a new entry and passes it to M2, M3 & M4, they in turn pass it on to each other! I don't understand why this happens, but it looks very much like that is what is going on, because sometimes M2 will have passed an entry to M4 before M4 has actually received the add from M1.

I have therefore noticed that entries sent from M1 are sometimes received in the wrong order by the other masters, and as a result some entries may be skipped!
Yes, that makes sense. The CSN check assumes changes will always be received in the same order they were sent from the provider. Obviously in this case this assumption is wrong. You should submit an ITS for this.
This problem was discussed on the -devel list back in 2007; the code ought to be using a spanning tree/routing algorithm to ensure that when multiple routes exist for propagating a change, the change is delivered exactly once. Unfortunately no one has spent any further time on this issue since then.
But the CSN is supposed to guarantee that, regardless of the order, servers converge to the same state. In fact, if entries are received in a different order but carry an entryCSN attribute that is newer, the newer one should take effect (and be propagated further through slapo-syncprov if MMR); if identical or older, it should be ignored (and not propagated). If the incoming modification implies something odd like a missing parent, glue entries should be created, to be replaced by the right entry as soon as it comes in.
Right, but you're talking about the entryCSN of a replicated entry, and not the CSN that was sent in the sync cookie. The two don't have to be the same, particularly if there are a lot of writes active on the provider.
When a consumer accepts an out of order cookie CSN then any other consumers cascaded off it will receive incomplete data. (The consumer claims to be up to date as of revision X, but it is in fact missing revision X-1.)
In MMR, assuming perfect symmetry, we could do something like ignoring entries that come from a provider with an entryCSN generated by another provider, under the assumption that we will eventually get it from the right provider. Or better (and symmetrical): do not propagate entries whose CSN was not generated by ourselves, under the assumption that the one that generated the CSN will propagate it?
That assumes a fully connected star topology. Such a layout won't scale; the intention is for this to work even with irregular topologies. E.g.
A - B       G
|   |       | \
C - D - E - F - H
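For an irregular topology like the one above, the spanning-tree approach mentioned earlier would have each change forwarded along tree edges only, so every server receives it exactly once despite the redundant links. A toy BFS illustration (my own sketch, not proposed OpenLDAP code):

```python
from collections import deque

# Edges of the example topology:
# A - B       G
# |   |       | \
# C - D - E - F - H
edges = {
    "A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"},
    "D": {"B", "C", "E"}, "E": {"D", "F"},
    "F": {"E", "G", "H"}, "G": {"F", "H"}, "H": {"F", "G"},
}

def broadcast(origin):
    """BFS spanning tree rooted at the originating server: forward a
    change only to peers that have not yet received it, so each node
    gets exactly one copy even when multiple routes exist."""
    delivered, queue = {origin}, deque([origin])
    forwards = 0
    while queue:
        node = queue.popleft()
        for peer in edges[node]:
            if peer not in delivered:   # skip links outside the tree
                delivered.add(peer)
                queue.append(peer)
                forwards += 1
    return delivered, forwards

nodes, forwards = broadcast("A")
print(sorted(nodes))   # all 8 servers reached
print(forwards)        # 7 forwards: each other server receives it once
```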
Le lundi 02 mars 2009 12:18:23, Howard Chu a écrit :
Adrien Futschik wrote:
Considering that M1 & M3 are on the same server and therefore have exactly the same time, if this were a time-related problem, I shouldn't get any "CSN too old" messages between M1 & M3 or between M2 & M4, should I?

I have also noticed that when M1 gets a new entry and passes it to M2, M3 & M4, they in turn pass it on to each other! I don't understand why this happens, but it looks very much like that is what is going on, because sometimes M2 will have passed an entry to M4 before M4 has actually received the add from M1.

I have therefore noticed that entries sent from M1 are sometimes received in the wrong order by the other masters, and as a result some entries may be skipped!
Yes, that makes sense. The CSN check assumes changes will always be received in the same order they were sent from the provider. Obviously in this case this assumption is wrong. You should submit an ITS for this.
This problem was discussed on the -devel list back in 2007; the code ought to be using a spanning tree/routing algorithm to ensure that when multiple routes exist for propagating a change, the change is delivered exactly once. Unfortunately no one has spent any further time on this issue since then.
I am not sure whether it is M1 that sends them in the wrong order to M2 (and they are then cascaded to M3 & M4), or whether it is the order of M2's queue that is wrong. I suspect it is the latter.
I'll submit an ITS right away.
Personally, I believe the best way to avoid this problem would be not to propagate entries just received from another master.
Adrien Futschik
openldap-technical@openldap.org