I suppose the *real* solution is to use the multi-mastering capability in 2.4 to keep it in sync, but use it as if it's mirror mode (i.e. all writes to a single master, with the second as a hot standby), with the MM conflict resolution kicking in if needed because someone wrote to the hot standby when they shouldn't have.
That's our preferred/recommended usage. As I read somewhere else recently, "the best solution is not to have problems." Conflict resolution is messy; it's best to avoid it...
Agreed - having a single "active" master and a hot/active but unused standby master solves most HA issues without introducing the conflicts a full active-active multi-master setup creates. But if that standby is there and accepts writes, it's inevitable that someone will someday write to it out of ignorance, and *may* write a conflicting change, so I see conflict resolution as a last-ditch fallback for that situation (and nothing more) to prevent corruption or breakage of replication. (Plus, I like to close up, or at least be fully aware of, all the edge cases that exist, so I know how best to avoid them :) ).
You said at one point that OpenLDAP (2.4.6?) currently does entry-level conflict resolution, and does not do attribute-level conflict resolution yet - i.e. if an entry is updated on two separate servers with different updates, conflicting or not, the most recently changed version of the *entry* wins. If I change the cn on one master, and after that (but before replication has occurred) I change the userPassword on another master, then when the sync-up occurs I won't see the entry with both the cn and the password changed on all servers; I'll see the entry as it stands on the most recently changed master (i.e., in my example, I'll see the changed password, but the cn will revert). Is there a roadmap/timeline for attribute-level conflict resolution?
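To make sure I have the semantics right, here's a concrete timeline of what I expect to happen (entry name, times, and CSN details are made up for illustration):

```
# Before: uid=jdoe has cn: Jane Doe, userPassword: {SSHA}oldhash
#
# T1: master A modifies uid=jdoe  ->  cn: Jane Doe-Smith
#     (entry on A now carries an entryCSN stamped ~T1)
# T2: master B (which hasn't seen A's change yet) modifies uid=jdoe
#     ->  userPassword: {SSHA}newhash
#     (entry on B now carries an entryCSN stamped ~T2)
#
# After sync-up: B's entryCSN is newer, so B's *whole entry* wins
# everywhere: userPassword is the new hash, but cn reverts to
# "Jane Doe" - A's cn change is silently lost.
```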
Also, I was looking at the admin guide and the syncprov man pages for how to set up replication. N-Way multi-mastering details are kinda sparse :). Is there any documentation elsewhere on setting this up? Or is the setup exactly the same as setting up Mirror-mode (per 2.3.x), with the 2.4.x code just automatically doing conflict resolution (i.e. was mirror-mode a 2.3 feature, with multimaster transparently replacing it in 2.4 by adding conflict resolution to mirror-mode, using the same setup)?
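For reference, here's the sort of two-master slapd.conf I've tentatively pieced together from the admin guide and the syncrepl(5)/slapo-syncprov(5) man pages - hostnames, the replicator DN, and credentials are placeholders, and I'm not certain every directive here is required or correct:

```
# Tentative sketch for master 1 of 2 (master 2 mirrors this with
# serverID 2 and provider= pointing back at master 1).
serverID 1

database  bdb
suffix    "dc=example,dc=com"

# serve changes to the other master (and to any slave consumers)
overlay syncprov
syncprov-checkpoint 100 10
syncprov-sessionlog 100

# consume changes from the other master
syncrepl rid=001
         provider=ldap://ldap2.example.com
         type=refreshAndPersist
         searchbase="dc=example,dc=com"
         bindmethod=simple
         binddn="cn=replicator,dc=example,dc=com"
         credentials=secret
         retry="5 5 300 +"

mirrormode on
```

Is that roughly it - syncprov plus a syncrepl consumer stanza plus mirrormode on each master, with a unique serverID per node?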
Is it possible for a consumer to replicate from multiple masters? I'm thinking along the lines of a master server at two locations (for HA/DR purposes), where each location also has multiple read-only slave consumers. My first thought is that these slave servers point to the local master, but if that master goes down, the slaves under it stop getting updates. My second thought is to have a load balancer at each site, which directs all traffic connecting to a "master ldap" VIP to the primary master if it's up, or to the secondary master if the primary is unavailable. But... (I'm still absorbing syncrepl and RFC 4533) will all the contextCSNs and cookies and so forth match up well enough to allow this kind of failover for *syncrepl*? Is it possible, and what's the best way to set this up, such that I have multiple masters for DR purposes, and such that the failure of any single master does not cause some subset of my read-only slave consumers to stop getting updated?
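Concretely, I'm imagining each read-only slave configured against the site's VIP, something like the following (the VIP hostname is made up, and I'm guessing that the multi-valued contextCSN in 2.4, one value per serverID, is what would let the consumer's cookie stay valid no matter which master ends up answering - please correct me if that's wrong):

```
# Read-only slave consumer pointing at a per-site load-balancer VIP
# rather than a specific master.
syncrepl rid=101
         provider=ldap://ldap-master.site1.example.com
         type=refreshAndPersist
         searchbase="dc=example,dc=com"
         bindmethod=simple
         binddn="cn=replicator,dc=example,dc=com"
         credentials=secret
         retry="5 5 300 +"
```

Or would the "right" way be two syncrepl stanzas on each slave (different rid, one provider= per master), skipping the load balancer entirely?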
Syncrepl (in refreshAndPersist mode), as I understand it, generally has the slave consumer contacting the master server, retrieving the changes made since the last time it was running (refresh), then leaving a persistent search running that receives changed entries from the master as they happen (persist), so replication is near real-time. If the master server crashes and is restarted, or the connection is broken/dropped (common when a load balancer is in between), how well does the consumer detect this and reconnect, or do consumers tend to need a restart after this occurs? (This is a broken/dropped connection, *not* one cleanly closed by a clean master shutdown or idle timeout, and many apps have trouble detecting this - the client still thinks it has a valid TCP connection, but nothing is coming over it, so it never gets new updates. Does the consumer send keepalive packets or anything else to cause it to realize the connection has died and to reconnect?)
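From my reading of syncrepl(5), the retry= parameter seems to cover reconnection after a *detected* failure; e.g.:

```
# From syncrepl(5): retry="<interval> <retries> [<interval> <retries> ...]"
# As I read it, this retries every 5 seconds 10 times, then falls back
# to every 300 seconds indefinitely ("+"):
         retry="5 10 300 +"
```

What I can't tell from the man page is whether anything covers the silently-dead-connection case, where no error ever comes back on the socket for retry= to react to.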
When initializing a consumer from an LDIF backup of the master, should this be a slapcat export, so that it includes everything needed to support syncrepl (such as contextCSN, entryUUIDs, etc.)?
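i.e., is the intended procedure something like this (filename made up), as opposed to an ldapsearch-based export, which I assume would lose the operational attributes?

```
# On the master: slapcat preserves operational attributes like
# entryUUID, entryCSN, and contextCSN.
slapcat -l master-backup.ldif

# On the new consumer, with slapd stopped:
slapadd -l master-backup.ldif
```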
Thanks, - Jeff