Pierangelo Masarati wrote:
Francis Swasey wrote:
Howard,
Given that slurpd will not be included in 2.4, I'm seriously hoping (as one of the few people still using slurpd in production) that these changes to syncrepl will be perfected in 2.3 before we are forced into the syncrepl usage by 2.4.
I don't think slurpd will be wiped out of 2.4 yet. AFAIK, the only significant removal in 2.4 is back-ldbm.
On the other hand... If we can iron out the last remaining issues for syncrepl in 2.4, I see no reason to carry slurpd forward. (There's also an outstanding issue of turning the syncrepl consumer into an overlay. What happened to that patch?)
So continuing the discussion of what to do with syncrepl and multiple contexts...
1) The provider must be told about all of the sources of changes living within its context. Possible sources are:
   a) local changes
   b) changes received via syncrepl
2) Every source of changes must have a unique sid.
   a) If it's a syncprov, then it's configured explicitly there.
   b) If it's a syncrepl consumer pulling from elsewhere, it uses the remote server's sid.
3) The provider must aggregate all of the cookies for each of these change sources and send them to consumers pulling from it.
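To make point 3 concrete, here is a rough sketch of what aggregating cookies might look like. It is only an illustration under stated assumptions: a cookie is modeled as a mapping from sid to the newest CSN seen from that source, CSNs are treated as opaque strings that sort chronologically, and the names `Cookie` and `aggregate_cookies` are hypothetical, not anything in the syncprov code.

```python
from typing import Dict

# Hypothetical model: one cookie maps each change source's sid to the newest
# CSN seen from that source. CSNs are assumed to be strings whose
# lexicographic order matches their chronological order.
Cookie = Dict[int, str]


def aggregate_cookies(*cookies: Cookie) -> Cookie:
    """Merge per-source cookies, keeping the newest CSN for each sid."""
    merged: Cookie = {}
    for cookie in cookies:
        for sid, csn in cookie.items():
            # Keep the newest CSN per sid; sources never overwrite each other.
            if sid not in merged or csn > merged[sid]:
                merged[sid] = csn
    return merged


# sid 1: local changes on this provider; sid 2: changes pulled via syncrepl.
local = {1: "20070201120000Z#000002"}
from_remote = {2: "20070201115900Z#000007", 1: "20070201110000Z#000000"}
combined = aggregate_cookies(local, from_remote)
```

The point of keeping one CSN per sid, rather than a single newest CSN overall, is that a consumer can then tell how far it has caught up with each change source independently.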
There are some interesting implications here.
The fact that subordinate/glue can be used to put multiple change sources under a single provider gets us half way to a real multi-master setup already. In this case, we know that changes are in distinct DIT areas so that there's no possibility of collision.
There's a desire to be able to configure multiple change sources for the same context though. E.g., mirrormode is defined to only work with two servers mirroring each other; it would be nice to be able to extend this to additional failover servers.
From half-multi-master we can go all the way to multi- if we add collision detection and conflict resolution. There's a pretty simple way to handle collision detection - we just need to pass the entry's old entryCSN along with the rest of the modification info. On the consumer we check and see if the oldEntryCSN matches the consumer entry's current entryCSN. If they match, there is no collision. If they don't match, we need to resolve the conflict. With the syncrepl protocol conflict resolution is pretty easy - just compare the entry's entryCSN to the received modification's CSN and take whichever is newer (last writer wins). Either we discard the received mod if it's too old, or we apply it as normal because it's newer. (There's another case too of course - the received mod's entryCSN is identical to the current CSN, meaning we already received this change via another route. We just discard the change then.) So we can have perfect collision detection, and pretty reliable conflict resolution, just by adding one more field to syncrepl. To me this is so easy we can't not do it.
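The consumer-side check described above could be sketched roughly as follows. This is not the syncrepl implementation, just an illustration: `ReceivedMod`, `Action` and `resolve` are hypothetical names, and CSNs are again assumed to be strings that compare chronologically.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    APPLY = "apply"
    DISCARD_DUPLICATE = "discard_duplicate"
    DISCARD_STALE = "discard_stale"


@dataclass
class ReceivedMod:
    old_entry_csn: str  # entryCSN of the entry before the change, as sent by the provider
    new_entry_csn: str  # CSN of the received modification itself


def resolve(current_csn: str, mod: ReceivedMod) -> Action:
    """Decide what a consumer does with a received modification."""
    if mod.new_entry_csn == current_csn:
        # Same change already arrived via another route.
        return Action.DISCARD_DUPLICATE
    if mod.old_entry_csn == current_csn:
        # No collision: the mod was made against the state we currently hold.
        return Action.APPLY
    # Collision: last writer wins.
    if mod.new_entry_csn > current_csn:
        return Action.APPLY
    return Action.DISCARD_STALE


current = "20070201120000Z#000001"
clean = resolve(current, ReceivedMod(current, "20070201120500Z#000000"))
stale = resolve(current, ReceivedMod("20070201100000Z#000000", "20070201110000Z#000000"))
newer = resolve(current, ReceivedMod("20070201100000Z#000000", "20070201130000Z#000000"))
dup = resolve(current, ReceivedMod("20070201100000Z#000000", current))
```

Note that the whole decision depends only on three CSN values, which is why a single extra field in the protocol is enough.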
Of course, to be able to compare entryCSNs reliably we need high quality, high resolution timestamps, and all of the participating servers must have tightly synchronized clocks. This isn't such a troublesome requirement; you just need to run NTP on all of the servers.
Ideally all of the servers would be NTP peers of each other; that way you could also query the local NTP server for the degree of clock skew with each other peer. I'm not sure we need to go to this extreme, but it's worth considering. (I'm not sure how useful that information is. E.g., if a server goes down, and it had a lot of local changes on it that hadn't propagated yet, and they all have valid timestamps, but the server's clock is way off when it restarts, what can you tell about it?)