Clowser, Jeff (Contractor) wrote:
Agreed - having a single "active" master and a "hot"/active but unused standby master solves most HA issues without introducing the conflicts a full active-active multimaster setup creates. But if that standby master is up and accepts writes, it's inevitable that someone will someday write to it out of ignorance, and *may* write a conflicting change, so I see conflict resolution as a last-ditch fallback for this situation (and nothing more) to prevent corruption or breakage of replication. (Plus, I like to close off, or at least be fully aware of, all the edge cases that exist, so I know how best to avoid them :) ).
Makes sense.
You said at one point that OpenLDAP (2.4.6?) currently does entry-level conflict resolution and does not do attribute-level conflict resolution yet - i.e. if an entry is updated on 2 separate servers with different updates, conflicting or not, the most recently changed version of the *entry* wins. If I change the cn on one master, and after that (but before replication has occurred) I change the userPassword on another master, then when the sync-up occurs I won't see the entry with both the cn and password changed on all servers; I'll see the entry as it is on the master most recently changed (i.e. in my example, I'll see the changed password, but the cn will revert). Is there a roadmap/timeline for doing attribute-level conflict resolution?
There are no set dates, but I expect it to be later in the 2.4 stream.
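To make the current entry-level behavior concrete, here is a hypothetical pair of changes (the entry, DN, and values are invented purely for illustration):

    # Applied on master A first:
    dn: uid=jdoe,ou=people,dc=example,dc=com
    changetype: modify
    replace: cn
    cn: John Q. Doe
    -

    # Applied on master B a moment later, before replication converges:
    dn: uid=jdoe,ou=people,dc=example,dc=com
    changetype: modify
    replace: userPassword
    userPassword: newsecret
    -

With entry-level resolution, master B's copy of the whole entry carries the newer CSN, so once the servers sync up its version wins everywhere: the new userPassword is kept, but the cn change made on master A is lost.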
Also, I was looking at the admin guide and syncprov man pages on how to set up replication. N-Way multi-mastering details are kinda sparse :). Is there any documentation elsewhere on setting this up? Or... is the setup exactly the same as setting up Mirror-mode (per 2.3.x), but the 2.4.x code just automatically does conflict resolution (i.e. was mirror-mode a 2.3 feature, with multimaster transparently replacing it in 2.4 by adding conflict resolution to mirror-mode, using the same setup)?
Yes, set it up pretty much like MirrorMode. MirrorMode was 2.4.1-2.4.4, which were only alpha releases, not general/public releases.
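For reference, a minimal slapd.conf sketch for one of the masters in a two-master setup (hostnames, suffix, DNs, and credentials are placeholders, not a recommended production config):

    # Global: a unique integer per master
    serverID  1

    database  hdb
    suffix    "dc=example,dc=com"
    rootdn    "cn=manager,dc=example,dc=com"
    directory /var/lib/ldap

    # Pull changes from the other master
    syncrepl  rid=001
              provider=ldap://master2.example.com
              type=refreshAndPersist
              retry="60 +"
              searchbase="dc=example,dc=com"
              bindmethod=simple
              binddn="cn=replicator,dc=example,dc=com"
              credentials=secret

    # Allow replicated writes to be applied to this shadow database
    mirrormode on

    # Serve changes to the other master and to any slave consumers
    overlay   syncprov
    syncprov-checkpoint 100 10
    syncprov-sessionlog 100

The second master is the mirror image: serverID 2 and provider=ldap://master1.example.com, but otherwise the same MirrorMode-style configuration.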
Is it possible for a consumer to replicate from multiple masters?
Yes in 2.4.
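Roughly, a read-only consumer carries one syncrepl statement per master it pulls from. A sketch (rids, hostnames, and credentials are made up, and the exact companion directives may vary between 2.4 releases):

    # Consumer database pulls from both masters
    syncrepl  rid=011
              provider=ldap://master1.example.com
              type=refreshAndPersist
              retry="30 +"
              searchbase="dc=example,dc=com"
              bindmethod=simple
              binddn="cn=replicator,dc=example,dc=com"
              credentials=secret

    syncrepl  rid=012
              provider=ldap://master2.example.com
              type=refreshAndPersist
              retry="30 +"
              searchbase="dc=example,dc=com"
              bindmethod=simple
              binddn="cn=replicator,dc=example,dc=com"
              credentials=secret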
I'm thinking along the lines of a master server at 2 locations (for HA/DR purposes), with each location also having multiple read-only slave consumers. My first thought is that these slave servers point to the local master, but if that master goes down, the slaves under that master stop getting updates. My second thought is to have a load balancer at each site with a "master ldap" VIP that routes connections to the primary master if it's up, or to the secondary master if the primary is unavailable. But... (I'm still absorbing syncrepl and RFC 4533) will all the contextCSNs, cookies, and so forth match up well enough to allow this kind of failover for *syncrepl*? Is it possible, and what's the best way to set this up, such that I have multiple masters for DR purposes and the failure of any single master doesn't cause some subset of my read-only slave consumers to stop getting updates?
Syncrepl (in refreshAndPersist mode), as I understand it, generally has the slave consumer contact the master server, retrieve the changes made since the last time it was running (refresh), and then leave a persistent search running that receives changed entries from the master as they happen (persist), so replication is near real-time. If the master server crashes and is restarted, or the connection is broken/dropped (common if a load balancer is in between), how well does the consumer detect this and reconnect, or do the consumers tend to need a restart after this occurs? (This is a broken/dropped connection, *not* one cleanly closed by a clean master shutdown or idle timeout, and many apps have trouble detecting this - the client still thinks it has a valid TCP connection, but nothing is coming over it, so it never gets new updates. Does the consumer send keepalive packets or anything to make it realize the connection has died and reconnect?)
Currently the consumer relies on TCP keepalives. We've discussed adding LDAP-level keepalives so we're not dependent on the kernel TCP timers, but that hasn't been done yet.
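Since the consumer depends on the kernel's timers, one workaround on Linux is to shorten the system-wide TCP keepalive settings; the values below are illustrative only, and note this affects every TCP connection on the host:

    # Start probing after 5 minutes idle instead of the default 2 hours,
    # probe every 30 seconds, and give up after 5 failed probes
    sysctl -w net.ipv4.tcp_keepalive_time=300
    sysctl -w net.ipv4.tcp_keepalive_intvl=30
    sysctl -w net.ipv4.tcp_keepalive_probes=5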
When initializing a consumer from an LDIF backup of the master, should this be a slapcat export, so that it includes everything needed to support syncrepl (such as contextCSN, entryUUIDs, etc.)?
That's the fastest way. But you can also just bring up a consumer with an empty database and let it pull the entire DB down during its refresh pass; it will work regardless. Unlike some other replication schemes you may have used, we don't require any special considerations for initial load vs. reload or recovery. Turn it on and it works.
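For example, assuming a single database and the default config file location (suffix and filenames are placeholders):

    # On the master: export the database, including contextCSN and entryUUIDs
    slapcat -b "dc=example,dc=com" -l master.ldif

    # On the new consumer, with slapd stopped and an empty database directory:
    slapadd -q -b "dc=example,dc=com" -l master.ldif

Then start slapd on the consumer; syncrepl resumes from the contextCSN carried in the loaded data rather than re-pulling the whole database.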