--On Wednesday, March 18, 2009 3:01 AM -0700 Howard Chu <hyc(a)symas.com>
wrote:
Quanah Gibson-Mount wrote:
> A number of our clients have requested "fail-over"/redundancy
> capabilities for the LDAP master, and as I'm currently working on moving
> our product to use OpenLDAP 2.4, this becomes a distinct possibility.
> However, I have some questions about the
> viability/reliability/effectiveness of using multiple masters combined
> with replicas. I don't see these answered in the Admin Guide.
You mean, setting up regular read-only replicas slaved to the masters?
Yes.
> I'll start with replication under MMR.
> As I understand it, the replicas can only point at a single master.
False.
Last I checked, the provider parameter in the syncrepl configuration block
was single-valued. This means either it gets pointed at a load balancer,
or it only talks to a single master. If that master goes down, it no
longer talks to anything without reconfiguration.
> So, if
> I have a 2 master MMR setup, I assume I would want to point half my
> replicas at master A and the other half at master B for their updates.
> This leads to a problem in my mind, in that if master A goes down, then
> half of my replica pool is now going to remain completely out of sync
> with the remaining master until master A is recovered. Throwing a load
> balancer in front of the two masters, and pointing the replicas at that
> instead, is not a viable option because the two masters may be getting
> updates in a different sequence, so if a replica disconnects from the LB
> and then reconnects, the updates it could get fed from whatever master
> the LB is pointing at could lead to inconsistencies.
What inconsistencies? Each master's changes are stamped with its own sid.
Any consumer is going to know about the contextCSNs of each master it
talks to.
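
As a rough illustration of the point about per-master stamps, here is a small Python sketch of how a consumer's state could be compared against a provider's contextCSN set, one SID at a time. It assumes a CSN layout of timestamp#count#sid#mod with fixed-width fields; the sample values and the helper names (parse_csn, newest_per_sid, sids_behind) are made up for the example and are not slapd's implementation.

```python
from typing import Dict, Iterable, List


def parse_csn(csn: str) -> Dict[str, str]:
    """Split a CSN of the assumed form timestamp#count#sid#mod into its fields."""
    timestamp, count, sid, mod = csn.split("#")
    return {"timestamp": timestamp, "count": count, "sid": sid, "mod": mod}


def newest_per_sid(csns: Iterable[str]) -> Dict[str, str]:
    """Keep only the newest CSN seen for each SID."""
    newest: Dict[str, str] = {}
    for csn in csns:
        sid = parse_csn(csn)["sid"]
        # Fields are assumed fixed-width, so for a given SID a plain string
        # comparison orders CSNs by timestamp and then by change count.
        if sid not in newest or csn > newest[sid]:
            newest[sid] = csn
    return newest


def sids_behind(consumer: Iterable[str], provider: Iterable[str]) -> List[str]:
    """List the SIDs for which the provider holds newer changes than the consumer."""
    mine = newest_per_sid(consumer)
    theirs = newest_per_sid(provider)
    return sorted(
        sid for sid, csn in theirs.items() if sid not in mine or mine[sid] < csn
    )


# Hypothetical state: the consumer last synced with master A (SID 001) and
# now talks to master B (SID 002), which has a newer change of its own.
consumer_state = [
    "20090318100100.000000Z#000000#001#000000",
    "20090318100105.000000Z#000000#002#000000",
]
master_b_state = [
    "20090318100100.000000Z#000000#001#000000",
    "20090318100230.000000Z#000000#002#000000",
]
print(sids_behind(consumer_state, master_b_state))
```

In this toy model the consumer is current for SID 001 and only behind for SID 002, so switching masters does not by itself make all of its state look stale.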
The consumer can only talk to a single master, unless it is pointed at a
load balancer. See notes above. In the load balancer case, I'm assuming,
based on what you say, that if the replica gets disconnected because
master A went down, then when it reconnects to master B its CSN will be
completely different because the SID values differ, and it will do a full
refresh. Is that correct?
> Neither of these seems like a
> good option. I don't see a good solution here to resolve this issue,
> either, unless the replica could somehow know which master it had been
> talking to,
The replica always knows which master it's talking to...
> and drop into refresh mode if it found itself talking to a new
> master?
Drop into refresh mode? Obviously in persist mode the consumer keeps a
connection open to a specific master; a load balancer can't move an open
connection. So obviously, if a particular master disappears, all of its
clients are going to lose their connections and any consumers set up to
retry are going to have to initiate new sessions. And every new
replication session starts with a refresh phase. So this recovery is
already automatic, it always has been.
Based off of the CSN value. See question above about CSN & SID.
> I'm also not clear on what happens if your replicas are
> delta-syncrepl based, rather than normal syncrepl, in the LB setup.
Not possible. Current delta-sync requires all updates to be logged in
order; in an MMR setup you can't guarantee order so *nobody* can use
delta-sync in this scenario.
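
To make the ordering problem concrete, here is a deliberately simplified Python sketch. It models each master's log as a plain list and a consumer that resumes from the last entry it saw; real delta-sync works against the accesslog rather than list positions, and the change IDs (a1, b1, a2) and the replay_after helper are hypothetical. It only shows why two logs holding the same changes in different orders are not interchangeable.

```python
from typing import List


def replay_after(log: List[str], last_seen: str) -> List[str]:
    """Return the entries that follow last_seen in this particular log."""
    return log[log.index(last_seen) + 1:]


# Hypothetical change IDs: a1 and a2 originate on master A, b1 on master B.
log_on_a = ["a1", "b1", "a2"]  # A committed its own a1 before receiving b1
log_on_b = ["b1", "a1", "a2"]  # B committed its own b1 before receiving a1

# The consumer read a1 from master A, then A went down and it moved to B.
seen = ["a1"]
seen += replay_after(log_on_b, "a1")

# b1 sits before a1 in B's log, so a resume-from-last-entry replay skips it.
missing = sorted(set(log_on_a) - set(seen))
print(seen, missing)
```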
> For Mirror Mode, I would assume you could point the replicas at the LB
> fronting the two masters, since only one master is ever receiving
> changes. I also assume delta-syncrepl would be a completely valid option
> for replication to the replicas, again because only one master is
> getting the updates, so all updates would be logged in the same sequence
> on both servers. However, I don't know if this is correct or not, or if
> there are limitations here I haven't considered. When I was first
> pondering this on the #openldap-devel channel in IRC, Matt Backes made a
> comment about delta-syncrepl not working with Mirror Mode.
For MirrorMode, delta-sync should work since there is only ever one
source of changes, and they will be logged in order. There is a window of
vulnerability where a server crashes after committing changes to its
accesslog, before it replicates them to the mirror. Those changes will be
temporarily lost, and create a gap in the mirror's log. When the original
server comes back up, the mirror will receive those lost changes, but the
strict ordering of its log will be broken. In this case though, the
delta-sync consumer will be fine - if the lost changes caused no
conflicts, they will simply be committed. If they do cause a conflict,
the consumer will just fall back to refresh mode and the conflicts will be
erased.
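
The commit-or-fall-back behaviour described above can be sketched in Python as follows. This is a toy model under stated assumptions, not the consumer's actual code: a change is a hypothetical (dn, expected old value, new value) tuple, a "conflict" simply means the expected old value does not match, and a refresh is modelled as copying the provider's current state.

```python
from typing import Dict, List, Tuple

# Hypothetical change record: (dn, value the change expects to find, new value)
Change = Tuple[str, str, str]


def apply_changes(
    entries: Dict[str, str],
    changes: List[Change],
    provider_snapshot: Dict[str, str],
) -> str:
    """Apply logged changes one by one; on the first conflict, resync instead."""
    for dn, expected_old, new in changes:
        if entries.get(dn) != expected_old:
            # Conflict: discard local state and take the provider's view,
            # standing in for a fall back to refresh mode.
            entries.clear()
            entries.update(provider_snapshot)
            return "refresh"
        entries[dn] = new
    return "delta"


# Late-arriving change that applies cleanly.
clean = {"uid=a": "1"}
clean_result = apply_changes(clean, [("uid=a", "1", "2")], {"uid=a": "2"})

# Late-arriving change that conflicts with what the consumer already holds.
conflicted = {"uid=a": "3"}
conflict_result = apply_changes(conflicted, [("uid=a", "1", "2")], {"uid=a": "2"})

print(clean_result, clean)
print(conflict_result, conflicted)
```

Either way the consumer ends up matching the provider; the conflict case just gets there through a resync rather than by applying the logged change.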
Ok, good. Mirror Mode sounds like the way to go.
> So, basically, I'm at a loss, if my understanding of things is correct, as
> to how I provide a consistent replicated environment for my customers,
> while also providing master/master failover.
This appears to have been a -software question, not a -devel question.
Perhaps you should summarize back to the -software list and end this
thread here.
Done.
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc