Quanah Gibson-Mount wrote:
A number of our clients have requested "fail-over"/redundancy capabilities for the LDAP master, and as I'm currently working on moving our product to use OpenLDAP 2.4, this becomes a distinct possibility. However, I have some questions about the viability/reliability/effectiveness of using multiple masters combined with replicas. I don't see these answered in the Admin Guide.
You mean, setting up regular read-only replicas slaved to the masters?
I'll start with replication under MMR.
As I understand it, the replicas can only point at a single master.
False.
So, if I have a two-master MMR setup, I assume I would want to point half my replicas at master A and the other half at master B for their updates. This leads to a problem in my mind, in that if master A goes down, then half of my replica pool is going to remain completely out of sync with the remaining master until master A is recovered. Throwing a load balancer in front of the two masters, and pointing the replicas at that instead, is not a viable option because the two masters may be getting updates in a different sequence, so if a replica disconnects from the LB and then reconnects, the updates it could get fed from whatever master the LB is pointing at could lead to inconsistencies.
What inconsistencies? Each master's changes are stamped with its own sid. Any consumer is going to know about the contextCSNs of each master it talks to.
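To make the point about per-sid stamping concrete, here is a minimal sketch of a consumer that keeps one high-water mark per sid. It is a toy model, not slapd's implementation: the `Consumer` class, `sid_of`, and `refresh_from` are hypothetical names, and the CSN layout (timestamp#count#sid#mod, compared as strings) is assumed for illustration.

```python
from dataclasses import dataclass, field


def sid_of(csn: str) -> str:
    """Return the sid field of a CSN, assuming timestamp#count#sid#mod."""
    return csn.split("#")[2]


@dataclass
class Consumer:
    # Hypothetical model of the consumer's cookie: sid -> highest CSN seen.
    cookie: dict = field(default_factory=dict)
    applied: list = field(default_factory=list)

    def refresh_from(self, provider_log):
        """Apply only the changes newer than what we hold for each sid."""
        for csn in sorted(provider_log):
            sid = sid_of(csn)
            if csn > self.cookie.get(sid, ""):
                self.applied.append(csn)
                self.cookie[sid] = csn


# Two masters that received the same writes in a different order.
a1 = "20090101120000.000000Z#000000#001#000000"
b1 = "20090101120001.000000Z#000000#002#000000"
a2 = "20090101120002.000000Z#000000#001#000000"
master_a = [a1, b1, a2]
master_b = [b1, a1]          # has not received a2 yet

consumer = Consumer()
consumer.refresh_from(master_b)   # first session lands on master B
consumer.refresh_from(master_a)   # reconnect lands on master A
```

In this model the second session picks up only `a2`, because the consumer already holds the latest CSN it saw for each sid, regardless of which master delivered it.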
Neither of these seems like a good option. I don't see a good solution here to resolve this issue, either, unless the replica could somehow know which master it had been talking to,
The replica always knows which master it's talking to...
and drop into refresh mode if it found itself talking to a new master?
Drop into refresh mode? Obviously in persist mode the consumer keeps a connection open to a specific master; a load balancer can't move an open connection. So obviously, if a particular master disappears, all of its clients are going to lose their connections and any consumers set up to retry are going to have to initiate new sessions. And every new replication session starts with a refresh phase. So this recovery is already automatic; it always has been.
I'm also not clear on what happens if your replicas are delta-syncrepl based, rather than normal syncrepl, in the LB setup.
Not possible. Current delta-sync requires all updates to be logged in order; in an MMR setup you can't guarantee order so *nobody* can use delta-sync in this scenario.
For Mirror Mode, I would assume you could point the replicas at the LB fronting the two masters, since only one master is ever receiving changes. I also assume delta-syncrepl would be a completely valid option for replication to the replicas, again because only one master is getting the updates, so all updates would be logged in the same sequence on both servers. However, I don't know if this is correct or not, or if there are limitations here I haven't considered. When I was first pondering this on the #openldap-devel channel in IRC, Matt Backes made a comment about delta-syncrepl not working with Mirror Mode.
For MirrorMode, delta-sync should work since there is only ever one source of changes, and they will be logged in order. There is a window of vulnerability where a server crashes after committing changes to its accesslog, before it replicates them to the mirror. Those changes will be temporarily lost, and create a gap in the mirror's log. When the original server comes back up, the mirror will receive those lost changes, but the strict ordering of its log will be broken. In this case though, the delta-sync consumer will be fine - if the lost changes caused no conflicts, they will simply be committed. If they do cause a conflict, the consumer will just fall back to refresh mode and the conflicts will be erased.
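The fallback behaviour described above can be sketched roughly as follows. This is a toy model under stated assumptions, not how slapd is implemented: the database is a plain dict keyed by DN, `apply_delta` and `delta_sync` are hypothetical helpers, and "refresh" is simplified to copying the provider's current state.

```python
class Conflict(Exception):
    """Raised when a logged change cannot be applied to the local data."""


def apply_delta(db, op):
    """Apply one logged change (kind, dn, value) to a dict-based database."""
    kind, dn, value = op
    if kind == "add":
        if dn in db:
            raise Conflict(dn)
        db[dn] = value
    elif kind == "modify":
        if dn not in db:
            raise Conflict(dn)
        db[dn] = value
    elif kind == "delete":
        if dn not in db:
            raise Conflict(dn)
        del db[dn]
    else:
        raise ValueError(f"unknown operation: {kind}")


def delta_sync(consumer_db, log, provider_db):
    """Replay the log; on conflict, fall back to a simplified full refresh."""
    try:
        for op in log:
            apply_delta(consumer_db, op)
        return "delta"
    except Conflict:
        # Simplified refresh: discard local state and take the provider's.
        consumer_db.clear()
        consumer_db.update(provider_db)
        return "refresh"


provider = {"cn=a": "v2", "cn=b": "v1"}

# Late-arriving changes that do not conflict are simply committed.
clean = {"cn=a": "v1"}
clean_mode = delta_sync(clean, [("modify", "cn=a", "v2"), ("add", "cn=b", "v1")], provider)

# A late-arriving change that conflicts triggers the fallback.
stale = {"cn=a": "v1", "cn=b": "v0"}
stale_mode = delta_sync(stale, [("add", "cn=b", "v1")], provider)
```

Either way the consumer ends up matching the provider; the only difference in this model is whether it got there by replaying the log or by refreshing.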
So, basically, I'm at a loss, if my understanding of things is correct, as to how I provide a consistent replicated environment for my customers while also providing master/master failover.
This appears to have been a -software question, not a -devel question. Perhaps you should summarize back to the -software list and end this thread here.