ITS#4618, 4623, 4626 and 4703 all basically have to do with trying to use multiple replication contexts with a single provider. This is a behavior that the 2.3 syncprov implementation just wasn't designed for; it was meant to handle only a single context.
Looking at the ideas in 2.2's syncrepl, it might have gone in the direction of solving these problems if it weren't weighed down by so many insurmountable design and implementation flaws. 2.2 probably tried to do too much too soon, and got waylaid by the devil in the details.
At this point, this solution for multiple contexts presents itself:
1) We assign distinct searchbases to each context.
2) Every distinct source of changes must have its own unique rid. E.g., if a database is a provider for a context, it must have a rid. Every consumer within its namingContext must have its own rid, just as before. (The new requirement here is assigning rids to providers that are masters of their data.)
3) Currently the provider hands a consumer a cookie consisting of the rid that the consumer supplied, plus a single contextCSN from the provider. This single contextCSN is inadequate for accurately capturing all of the changes that may come from multiple sources in a namingContext. Instead, the provider will send out a cookie consisting of multiple rid,CSN pairs - one for every rid of the provider's that resides in the consumer's search space. This is the only reliable way to make sure that all changes are tracked and propagated.
This says that in general, rids should not need to be configured on consumers - they should be dictated solely by the providers. It may be a good idea to allow them to be configured on consumers as an override, but for now that seems unimportant.
So:
1) the provider must have its own unique rid configured
2) the consumer's rid is optional
3) the provider must be told about all of the consumers living under it
4) the provider must aggregate all of the consumer cookies under it with its own context info when generating a cookie for its own consumers
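To make the aggregation step concrete, here is a minimal sketch. The dict representation, the function names, and the `rid=...,csn=...` text encoding are all assumptions for illustration; this is not the actual cookie wire format. It also assumes CSN strings sort chronologically when compared as text.

```python
def aggregate_cookie(own_rid, own_csn, consumer_cookies):
    """Merge the provider's own (rid, CSN) with the cookies of the consumers under it.

    consumer_cookies is a list of dicts mapping rid -> CSN string (hypothetical shape).
    """
    merged = {own_rid: own_csn}
    for cookie in consumer_cookies:
        for rid, csn in cookie.items():
            # Keep only the newest CSN seen for each rid.
            if rid not in merged or csn > merged[rid]:
                merged[rid] = csn
    return merged


def encode_cookie(pairs):
    """Render the rid,CSN pairs as one string (illustrative encoding only)."""
    return ";".join(f"rid={rid:03d},csn={csn}" for rid, csn in sorted(pairs.items()))
```

A provider with rid 1 and two consumers under it would call `aggregate_cookie(1, own_csn, [cookie_a, cookie_b])` and hand the encoded result to its own consumers.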
Currently slapd treats an entire database as read-only when it has a consumer configured on it. This raises the question of how to allow multiple consumers in a single context - should we allow multiple consumers per DB, as 2.2 tried (and failed) to do, or should we continue with the current approach of one consumer per DB, and use glue to collect multiple consumers under one roof?
Howard,
Given that slurpd will not be included in 2.4, I'm seriously hoping (as one of the few people still using slurpd in production) that these changes to syncrepl will be perfected in 2.3 before we are forced into the syncrepl usage by 2.4.
Is that possible?
Francis Swasey wrote:
Howard,
Given that slurpd will not be included in 2.4, I'm seriously hoping (as one of the few people still using slurpd in production) that these changes to syncrepl will be perfected in 2.3 before we are forced into the syncrepl usage by 2.4.
I don't think slurpd will be wiped out of 2.4 yet. AFAIK, the only significant removal in 2.4 is back-ldbm.
p.
Ing. Pierangelo Masarati OpenLDAP Core Team
Pierangelo Masarati wrote:
Francis Swasey wrote:
Howard,
Given that slurpd will not be included in 2.4, I'm seriously hoping (as one of the few people still using slurpd in production) that these changes to syncrepl will be perfected in 2.3 before we are forced into the syncrepl usage by 2.4.
I don't think slurpd will be wiped out of 2.4 yet. AFAIK, the only significant removal in 2.4 is back-ldbm.
Well, I'd like to drop it, especially since it's not dynamically configurable. But right now may be a bit premature.
Pierangelo Masarati wrote:
Francis Swasey wrote:
Howard,
Given that slurpd will not be included in 2.4, I'm seriously hoping (as one of the few people still using slurpd in production) that these changes to syncrepl will be perfected in 2.3 before we are forced into the syncrepl usage by 2.4.
I don't think slurpd will be wiped out of 2.4 yet. AFAIK, the only significant removal in 2.4 is back-ldbm.
On the other hand... If we can iron out the last remaining issues for syncrepl in 2.4, I see no reason to carry slurpd forward. (There's also an outstanding issue of turning the syncrepl consumer into an overlay, what happened to that patch?)
So continuing the discussion of what to do with syncrepl and multiple contexts...
1) the provider must be told about all of the sources of changes living within its context. Possible sources are
   a) local changes
   b) changes received via syncrepl
2) every source of changes must have a unique sid.
   a) if it's a syncprov, then it's configured explicitly there
   b) if it's a syncrepl consumer pulling from elsewhere, it uses the remote server's sid.
3) the provider must aggregate all of the cookies for each of these change sources and send them to consumers pulling from it.
There are some interesting implications here.
The fact that subordinate/glue can be used to put multiple change sources under a single provider gets us half way to a real multi-master setup already. In this case, we know that changes are in distinct DIT areas so that there's no possibility of collision.
There's a desire to be able to configure multiple change sources for the same context though. E.g., mirrormode is defined to only work with two servers mirroring each other, it would be nice to be able to extend this to additional failover servers.
From half-multi-master we can go all the way to multi- if we add collision detection and conflict resolution. There's a pretty simple way to handle collision detection - we just need to pass the entry's old entryCSN along with the rest of the modification info. On the consumer we check and see if the oldEntryCSN matches the consumer entry's current entryCSN. If they match, there is no collision. If they don't match, we need to resolve the conflict. With the syncrepl protocol conflict resolution is pretty easy - just compare the entry's entryCSN to the received modification's CSN and take whichever is newer (last writer wins). Either we discard the received mod if it's too old, or we apply it as normal because it's newer. (There's another case too of course - the received mod's entryCSN is identical to the current CSN, meaning we already received this change via another route. We just discard the change then.) So we can have perfect collision detection, and pretty reliable conflict resolution, just by adding one more field to syncrepl. To me this is so easy we can't not do it.
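The three cases above can be sketched as follows. The function name and the returned labels are hypothetical, and CSNs are assumed to be strings that sort chronologically as text.

```python
def resolve(current_csn, old_csn, new_csn):
    """Decide what a consumer does with a received modification.

    current_csn: entryCSN of the consumer's copy of the entry
    old_csn:     entryCSN the entry had on the sender before the change
    new_csn:     CSN of the received modification
    """
    if old_csn == current_csn:
        # Sender and consumer started from the same state: no collision.
        return "apply"
    if new_csn == current_csn:
        # Already received this change via another route.
        return "discard-duplicate"
    if new_csn > current_csn:
        # Collision, but the received mod is newer: last writer wins.
        return "apply-after-conflict"
    # Collision, and the received mod is older than what we already have.
    return "discard-stale"
```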
Of course to be able to compare entryCSNs reliably we need high quality, high resolution timestamps, and all of the participating servers must have tightly synchronized clocks. This isn't such a troublesome requirement, you just need to run NTP on all of the servers.
Ideally all of the servers would be NTP peers of each other, that way you could also query the local NTP server for the degree of clock skew with each other peer. I'm not sure we need to go to this extreme, but it's worth considering. (I'm not sure how useful that information is. E.g., if a server goes down, and it had a lot of local changes on it that hadn't propagated yet, and they all have valid timestamps, but the server's clock is way off when it restarts, what can you tell about it?)
Howard Chu wrote:
(There's also an outstanding issue of turning the syncrepl consumer into an overlay, what happened to that patch?)
That was overcomplicated by the need to deal with consumer-side stuff that was shared with replog, and by the need to move shadow knowledge ahead to allow the frontend to send referrals. The latter could be moved to the backends by a helper, to solve this issue (at the cost of extra useless modification sanitization when the database is shadow).
It should be rewritten (todo).
p.
Howard Chu wrote:
So continuing the discussion of what to do with syncrepl and multiple contexts...
1) the provider must be told about all of the sources of changes living within its context. Possible sources are
   a) local changes
   b) changes received via syncrepl
2) every source of changes must have a unique sid.
   a) if it's a syncprov, then it's configured explicitly there
   b) if it's a syncrepl consumer pulling from elsewhere, it uses the remote server's sid.
The olcServerID config attribute has been added for configuring these IDs. It is a global config keyword, not associated with a particular provider. A single serverID can be configured, for simple static setups. Or you can configure a list of serverIDs and corresponding URLs, to allow a single configuration to be replicated across a pool of servers.
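As a rough sketch of how a shared list of serverIDs could resolve to one ID per server: the rule used here (pick the entry whose URL matches one of this server's listener URLs) is an assumption for illustration, and the function and parameter names are made up.

```python
def pick_server_id(server_ids, listener_urls):
    """Return the serverID that applies to this server, or None.

    server_ids:    list of (sid, url) pairs; url is None for a plain,
                   single-server setting with no URL attached.
    listener_urls: set of URLs this server is listening on.
    """
    for sid, url in server_ids:
        # A bare ID applies unconditionally; an ID with a URL applies
        # only on the server that listens on that URL (assumed rule).
        if url is None or url in listener_urls:
            return sid
    return None
```

With the same list `[(1, "ldap://a.example"), (2, "ldap://b.example")]` deployed everywhere, the server listening on `ldap://b.example` would end up with SID 2.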
3) the provider must aggregate all of the cookies for each of these change sources and send them to consumers pulling from it.
The consumer now checks to see if it's a subordinate DB; if so it will perform its contextCSN updates through the parent DB. If a syncprov overlay is present it will get a chance to see the contextCSN update.
There's a desire to be able to configure multiple change sources for the same context though. E.g., mirrormode is defined to only work with two servers mirroring each other, it would be nice to be able to extend this to additional failover servers.
I've modified the consumer to allow multiple syncrepl configurations on the same backend. Corresponding changes are still needed in the provider. The contextCSN attribute is now multi-valued, allowing a CSN per SID to be tracked. Modifications to the contextCSN must be done with specific Delete/Add instead of Replace.
There's no restriction on how this gets used - a consumer can talk to multiple providers that master disjoint subtrees of the context, or they can overlap partially or fully. As long as each provider has a unique SID their multiple contextCSNs will be tracked properly.
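A small sketch of the per-SID tracking, to show why Delete/Add is needed rather than Replace: only the value belonging to one SID changes, and the values for other SIDs must be left alone. It assumes a CSN laid out as `time#count#sid#mod` with the SID in hex; the helper names are hypothetical.

```python
def sid_of(csn):
    """Extract the SID from a CSN of the assumed form time#count#sid#mod."""
    return int(csn.split("#")[2], 16)


def context_csn_mods(current_values, new_csn):
    """Return the list of (op, value) mods needed to record new_csn."""
    sid = sid_of(new_csn)
    mods = []
    for old in current_values:
        if sid_of(old) == sid:
            if new_csn <= old:
                # Nothing newer to record for this SID.
                return []
            # Remove only this SID's previous value.
            mods.append(("delete", old))
    mods.append(("add", new_csn))
    return mods
```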
The SID is used in the "replica ID" field of the CSN. That was previously a two-digit hex number and it was always zero; I've increased it to three digits. That's probably excessive; two was probably plenty.
From half-multi-master we can go all the way to multi- if we add collision detection and conflict resolution. There's a pretty simple way to handle collision detection - we just need to pass the entry's old entryCSN along with the rest of the modification info. On the consumer we check and see if the oldEntryCSN matches the consumer entry's current entryCSN. If they match, there is no collision. If they don't match, we need to resolve the conflict.
Aside from allowing us to log that a conflict occurred, keeping the oldCSN around doesn't seem to buy us much. Since the conflict resolution is still determined solely by the current entryCSN, I'm dropping this idea. All we need to check is if the incoming mod's entryCSN is <= the current entryCSN and drop the change if so.
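The simplified rule is short enough to sketch directly. The entry is modeled as a plain dict and the CSN layout `time#count#sid#mod` is an assumption; the function names are made up for the example.

```python
def parse_csn(csn):
    """Split 'time#count#sid#mod' into a tuple that sorts in CSN order."""
    stamp, count, sid, mod = csn.split("#")
    return (stamp, int(count, 16), int(sid, 16), int(mod, 16))


def apply_if_newer(entry, incoming_csn, incoming_attrs):
    """Entry-level last-writer-wins: replace the entry only if the mod is newer."""
    if parse_csn(incoming_csn) <= parse_csn(entry["entryCSN"]):
        # Older or identical change: drop it.
        return False
    entry.clear()
    entry.update(incoming_attrs)
    entry["entryCSN"] = incoming_csn
    return True
```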
Of course to be able to compare entryCSNs reliably we need high quality, high resolution timestamps, and all of the participating servers must have tightly synchronized clocks. This isn't such a troublesome requirement, you just need to run NTP on all of the servers.
The CSN timestamps are now recorded with microsecond resolution. Whether the underlying system actually delivers such precision is anybody's guess. At least in my tests the microseconds returned by gettimeofday() were always unique, when run in a tight loop. (I recall many years ago when this was not true, and the value only changed down to milliseconds...)
Since Windows system time only runs with 10 millisecond resolution, I had to augment that with a high resolution timer. On my test machine this means the ACPI power management timer, which runs at about 3.58MHz, so that's certainly good enough. (Which hardware timer is used depends on the version of Windows and varies quite a bit.) However the high-res timer and the system timer run independently, so there's no guarantee that they both will zero out together when the next whole second ticks. I've kludged it up such that the error will be no more than 1 millisecond, but it's still an annoyance. This must be why AD uses integer update counters instead of timestamps; the OS doesn't provide a real source of high quality timestamps. (Oddly enough they still implement NTP with 0.1 microsecond resolution; they just seem to be discarding the extra precision.)
It shouldn't be a major problem, we still use the op counter if the resolution is too low and multiple updates occur in the same timeslice.
Not all of these changes are checked in yet, but they'll be coming in soon.
Still talking to myself...
Howard Chu wrote:
So continuing the discussion of what to do with syncrepl and multiple contexts...
1) the provider must be told about all of the sources of changes living within its context. Possible sources are
   a) local changes
   b) changes received via syncrepl
2) every source of changes must have a unique sid.
   a) if it's a syncprov, then it's configured explicitly there
   b) if it's a syncrepl consumer pulling from elsewhere, it uses the remote server's sid.
The olcServerID config attribute has been added for configuring these IDs. It is a global config keyword, not associated with a particular provider. A single serverID can be configured, for simple static setups. Or you can configure a list of serverIDs and corresponding URLs, to allow a single configuration to be replicated across a pool of servers.
When a consumer is configured for multimaster (mirrormode; looks like we'll have to make "multimaster" a synonym) the serverID will also be sent explicitly in the sync cookie. By default (single master) it is not sent on its own. When the serverID is sent, the provider will use it to additionally filter updates. I.e., changes with an entryCSN whose SID matches the consumer's SID are assumed to have originated at the consumer and won't be sent back again. This avoids one wasted network transaction. (The consumer would just ignore the update anyway.)
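The extra filtering step could look roughly like this. Changes are modeled as `(dn, entryCSN)` pairs and the CSN layout `time#count#sid#mod` is assumed; none of these names come from the slapd code.

```python
def sid_of(csn):
    """Extract the SID from a CSN of the assumed form time#count#sid#mod."""
    return int(csn.split("#")[2], 16)


def changes_for_consumer(changes, consumer_sid=None):
    """Select the changes a provider would send to one consumer.

    consumer_sid is None in the single-master case, where no SID is
    sent in the cookie and nothing is filtered out.
    """
    if consumer_sid is None:
        return list(changes)
    # Skip changes that originated at the consumer itself.
    return [c for c in changes if sid_of(c[1]) != consumer_sid]
```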
3) the provider must aggregate all of the cookies for each of these change sources and send them to consumers pulling from it.
The consumer now checks to see if it's a subordinate DB; if so it will perform its contextCSN updates through the parent DB. If a syncprov overlay is present it will get a chance to see the contextCSN update.
This behavior can be checked using test033, by enabling glue and syncprov on the superior DB and leaving syncprov commented out on the other DBs.
There's a desire to be able to configure multiple change sources for the same context though. E.g., mirrormode is defined to only work with two servers mirroring each other, it would be nice to be able to extend this to additional failover servers.
I had in mind to extend the slap_bindconf structure to accommodate a list of URLs. If the first connection failed, each would be immediately tried in turn, before returning a failure to the caller. But for the moment I'm leaving that alone.
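The try-each-in-turn idea, sketched outside of slapd: `connect` stands in for whatever actually opens the connection and is purely a placeholder, as is the function name.

```python
def connect_first_available(urls, connect):
    """Try each URL in order and return the first successful connection.

    connect is a caller-supplied function that returns a connection or
    raises OSError on failure (hypothetical interface).
    """
    last_error = None
    for url in urls:
        try:
            return connect(url)
        except OSError as exc:
            # Remember the failure and move on to the next URL.
            last_error = exc
    # Only report failure once every URL has been tried.
    raise ConnectionError(f"all {len(urls)} URLs failed") from last_error
```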
I've modified the consumer to allow multiple syncrepl configurations on the same backend. Corresponding changes are still needed in the provider. The contextCSN attribute is now multi-valued, allowing a CSN per SID to be tracked. Modifications to the contextCSN must be done with specific Delete/Add instead of Replace.
Both the consumer and provider have been updated accordingly.
There's no restriction on how this gets used - a consumer can talk to multiple providers that master disjoint subtrees of the context, or they can overlap partially or fully. As long as each provider has a unique SID their multiple contextCSNs will be tracked properly.
We still need more tests for the various scenarios...
From half-multi-master we can go all the way to multi- if we add collision detection and conflict resolution. There's a pretty simple way to handle collision detection - we just need to pass the entry's old entryCSN along with the rest of the modification info. On the consumer we check and see if the oldEntryCSN matches the consumer entry's current entryCSN. If they match, there is no collision. If they don't match, we need to resolve the conflict.
Aside from allowing us to log that a conflict occurred, keeping the oldCSN around doesn't seem to buy us much. Since the conflict resolution is still determined solely by the current entryCSN, I'm dropping this idea. All we need to check is if the incoming mod's entryCSN is <= the current entryCSN and drop the change if so.
The oldCSN would still be needed to support multimaster with delta-syncrepl. That would also allow us to do change-level conflict resolution with delta-syncrepl instead of just entry-level as the current code does. But at this point I'm not interested in adding that support.
It shouldn't be a major problem, we still use the op counter if the resolution is too low and multiple updates occur in the same timeslice.
The op counter is also important because it's possible for the system clock to run backwards in certain situations...
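To illustrate how the op counter covers both cases (several updates in one timeslice, and a clock that steps backwards), here is a minimal sketch of a CSN generator. The class is hypothetical and the output layout `time#count#sid#mod` is assumed; it is not a copy of the slapd code.

```python
import time


class CsnGenerator:
    """Produce strictly increasing CSN strings for one server."""

    def __init__(self, sid, clock=time.time):
        self.sid = sid
        self.clock = clock      # injectable so the behavior can be tested
        self.last_usec = 0
        self.count = 0

    def next(self):
        usec = int(self.clock() * 1_000_000)
        if usec <= self.last_usec:
            # Same timeslice, or the clock ran backwards: keep the last
            # timestamp and bump the op counter instead.
            usec = self.last_usec
            self.count += 1
        else:
            self.last_usec = usec
            self.count = 0
        secs, micros = divmod(usec, 1_000_000)
        stamp = time.strftime("%Y%m%d%H%M%S", time.gmtime(secs))
        return f"{stamp}.{micros:06d}Z#{self.count:06x}#{self.sid:03x}#000000"
```

Because the timestamp never moves backwards and the counter only resets when the timestamp advances, successive CSNs from one generator always compare as increasing.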
Not all of these changes are checked in yet, but they'll be coming in soon.
The code is in place; we just need some more test configurations...
So - no one can say OpenLDAP doesn't have multimaster replication any more.
Howard Chu wrote:
Aside from allowing us to log that a conflict occurred, keeping the oldCSN around doesn't seem to buy us much. Since the conflict resolution is still determined solely by the current entryCSN, I'm dropping this idea. All we need to check is if the incoming mod's entryCSN is <= the current entryCSN and drop the change if so.
The oldCSN would still be needed to support multimaster with delta-syncrepl. That would also allow us to do change-level conflict resolution with delta-syncrepl instead of just entry-level as the current code does. But at this point I'm not interested in adding that support.
This is probably going to come back and haunt us.
Entry-level conflict resolution means if multiple changes are made to the same entry on different servers at about the same time, only the change with the newest timestamp will be saved, and the other changes will be lost.
To get finer granularity, delta-syncrepl and the oldCSN would be required. When an incoming change's oldCSN doesn't match the current entryCSN we would have to look back in the accesslog for the record matching the oldCSN, and collect all of the changes made to the entry since that point. Walking forward from that point, if we see that none of the previous changes affect the same attributes as the incoming change, we can just accept the change and there's no conflict.
If any of the attributes are the same, then we just do last-writer-wins on each attribute: For single-valued attributes just use the last known value. If the last writer did a delete-whole-attribute mod, that's clear. Otherwise, for multi-valued attrs with a mix of deletes and adds, just play them in sequence, with permissive semantics: ignore deletes on values that were already deleted, and ignore adds on values that are already present.
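A rough sketch of the replay with permissive semantics follows. It assumes the entry state as of the oldCSN is available as a dict of attribute name to set of values, and that each logged change is a `(csn, mods)` pair with mods given as `(op, attr, values)`; these shapes and names are invented for the example and say nothing about how the accesslog stores them.

```python
def conflicting_attrs(logged_changes, incoming_mods):
    """Attributes touched both by the logged changes and the incoming change."""
    logged = {attr for _, mods in logged_changes for _, attr, _ in mods}
    return logged & {attr for _, attr, _ in incoming_mods}


def apply_permissive(entry, mods):
    """Apply mods, ignoring deletes of absent values and adds of present ones."""
    for op, attr, values in mods:
        current = entry.setdefault(attr, set())
        if op == "replace":
            current.clear()
            current.update(values)
        elif op == "add":
            current.update(values)                # already-present values are no-ops
        elif op == "delete":
            if values:
                current.difference_update(values)  # missing values are ignored
            else:
                current.clear()                    # delete-whole-attribute
        if not entry[attr]:
            del entry[attr]
    return entry


def replay_in_order(base_entry, changes):
    """Play all changes in CSN order on a copy of the entry as of the oldCSN."""
    entry = {attr: set(vals) for attr, vals in base_entry.items()}
    for _, mods in sorted(changes, key=lambda change: change[0]):
        apply_permissive(entry, mods)
    return entry
```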