Adrien Futschik wrote:
Considering that M1 & M3 are on the same server and therefore have exactly the same time, if this were a time-related problem, I shouldn't get any "CSN too old" messages between M1 & M3 and M2 & M4, should I?
I have also noticed that when M1 gets a new entry and passes it to M2, M3 & M4, when they receive it, they also pass it on to M2, M3 & M4! I don't understand why this happens, but it looks very much like this is what's happening, because sometimes M2 has passed it to M4 before M4 has actually received the add from M1.
I have therefore noticed that sometimes entries sent from M1 are received in the wrong order by the other masters, and so some entries may be skipped!
Yes, that makes sense. The CSN check assumes changes will always be received in the same order they were sent from the provider. Obviously in this case this assumption is wrong. You should submit an ITS for this.
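To make the failure mode concrete, here is a toy model of that check. It is only a sketch, not slapd code: the function name, the simplified CSN strings, and the per-SID bookkeeping are all assumptions made for illustration. The consumer remembers the newest CSN it has accepted from each originating server and rejects anything that is not newer.

```python
# Toy model of a "newest CSN seen per origin" check (illustrative only, not slapd code).

def apply_changes(changes):
    """Apply (csn, sid, dn) changes in arrival order; return (applied, rejected)."""
    newest = {}                      # sid -> newest CSN accepted from that origin
    applied, rejected = [], []
    for csn, sid, dn in changes:
        if sid in newest and csn <= newest[sid]:
            rejected.append(dn)      # this is where "CSN too old" would be logged
            continue
        newest[sid] = csn
        applied.append(dn)
    return applied, rejected

# Two adds made on M1 (SID 1). The CSN values are made-up placeholders.
add1 = ("20090101000000.000001Z", 1, "cn=M1client1")
add2 = ("20090101000000.000002Z", 1, "cn=M1client2")

in_order = apply_changes([add1, add2])
out_of_order = apply_changes([add2, add1])   # add2 arrived first via another master

print(in_order)
print(out_of_order)
```

When the second add overtakes the first, the first one is rejected and never applied, which matches the skipped entry described above.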
This problem was discussed on the -devel list back in 2007; the code ought to be using a spanning tree/routing algorithm to ensure that when multiple routes exist for propagating a change, the change is delivered exactly once. Unfortunately no one has spent any further time on this issue since then.
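As a rough illustration of the spanning-tree idea (not anything that exists in the code today), the sketch below builds a breadth-first tree rooted at the originating master and forwards a change only along tree edges. The topology, the function names, and the choice of breadth-first search are all assumptions for the example.

```python
from collections import deque

def spanning_tree(links, origin):
    """Breadth-first spanning tree: maps each node to the peers it forwards to."""
    children = {node: [] for node in links}
    seen = {origin}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for peer in sorted(links[node]):
            if peer not in seen:
                seen.add(peer)
                children[node].append(peer)
                queue.append(peer)
    return children

def deliver(children, origin):
    """Count how often each node receives a change forwarded along the tree."""
    received = {node: 0 for node in children}
    queue = deque([origin])
    while queue:
        node = queue.popleft()
        for peer in children[node]:
            received[peer] += 1
            queue.append(peer)
    return received

# Four masters, each replicating with every other one (full mesh).
masters = ["M1", "M2", "M3", "M4"]
links = {m: {p for p in masters if p != m} for m in masters}

tree = spanning_tree(links, "M1")
counts = deliver(tree, "M1")
print(tree)
print(counts)
```

In the full mesh the tree rooted at M1 has M1 forwarding directly to the other three, and nobody forwards any further, so each of them sees the change once.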
Here is an example: I add cn=M1client1 & cn=M1client2 on M1.
M1client1 & M1client2 are successfully replicated on M2 & M4, but on M3 only M1client2 is inserted, and I am getting a "CSN too old" message for M1client1 on M3.
I don't have the logfile here; I'll send extracts this Monday. I am also getting these messages from time to time:
=> bdb_idl_insert_key: c_put id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
=> bdb_dn2id_add 0x1e40: parent (ou=clients,o=edf,c=fr) insert failed: -30994
I guess this is because all 4 masters receive entries that have the same parent, ou=clients,o=edf,c=fr, and that it happens when two entries are inserted simultaneously.
DB_LOCK_DEADLOCK messages can always be ignored; back-bdb always retries when it hits a deadlock.
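The general pattern is a retry loop around the transaction, roughly as sketched below. This is a generic illustration, not the back-bdb source: `DeadlockError`, `run_with_retry`, `make_flaky_insert`, and the `max_attempts` cap are all hypothetical names and choices made for the example.

```python
class DeadlockError(Exception):
    """Hypothetical stand-in for a DB_LOCK_DEADLOCK result from the database."""

def run_with_retry(txn, max_attempts=10):
    """Run txn(); if it is chosen as a deadlock victim, start it over."""
    for attempt in range(1, max_attempts + 1):
        try:
            return txn(), attempt
        except DeadlockError:
            continue                 # transaction was aborted; try again from the top
    raise RuntimeError("gave up after %d attempts" % max_attempts)

def make_flaky_insert(failures):
    """Build an insert that deadlocks a fixed number of times, then succeeds."""
    state = {"left": failures}
    def insert():
        if state["left"] > 0:
            state["left"] -= 1
            raise DeadlockError()
        return "inserted"
    return insert

result, attempts = run_with_retry(make_flaky_insert(2))
print(result, attempts)
```

Here the insert loses the lock twice and succeeds on the third attempt; the deadlock is visible in the log but the write is not lost.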