On 03/09/2010 17:18, Andrew Findlay wrote:
On Fri, Sep 03, 2010 at 04:35:24PM +0200, Jonathan CLARKE wrote:
DB_LOCK_DEADLOCK errors are only a warning: retries should occur until the operation completes. Of course, if they can be avoided, it is best to avoid them!
Question: is this topology sensible? If it is expected to work I will gather some debug data for an ITS. If not, I will have to drop back to plan B...
This is an interesting configuration. I would not have proceeded like this; rather, as Marc Patermann suggested, I would set up a virtual address that points to the currently available master, and configure a single syncrepl clause using that address (and point all other LDAP clients at it too). Could this approach work for you?
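For illustration, something along these lines on the consumer, using the virtual name (the hostname, suffix and credentials below are placeholders, adjust to your environment):

    syncrepl rid=001
             provider=ldap://ldap-master.example.com
             type=refreshAndPersist
             searchbase="dc=example,dc=com"
             bindmethod=simple
             binddn="cn=replicator,dc=example,dc=com"
             credentials=secret
             retry="30 10 300 +"

The retry parameter is what makes the failover transparent: when the virtual address moves to the standby, the consumer simply reconnects to whatever is behind that name.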
That is what I have done now, and it does work. I am still a little uncertain about it though: when the normal server fails and the DNS entry or routing changes to point to the hot standby, will this confuse the consumer slapd? We are effectively telling it that the second machine *is* the original one, yet it will have a different serverID and possibly different contents.
Actually, I just set up a few servers to test this out.
I don't have any problems using the 2 syncrepl statements side-by-side on the slave. When one master goes offline, replication continues from the other, etc.
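In outline, the slave's database section carries two syncrepl clauses with distinct rid values, one per master, something like this (hostnames, suffix, credentials and retry values are placeholders):

    syncrepl rid=001
             provider=ldap://master1.example.com
             type=refreshAndPersist
             searchbase="dc=example,dc=com"
             bindmethod=simple
             binddn="cn=replicator,dc=example,dc=com"
             credentials=secret
             retry="30 +"
    syncrepl rid=002
             provider=ldap://master2.example.com
             type=refreshAndPersist
             searchbase="dc=example,dc=com"
             bindmethod=simple
             binddn="cn=replicator,dc=example,dc=com"
             credentials=secret
             retry="30 +"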
Could your problem be due to some unrelated configuration issue? If not, and on the hypothesis that the simultaneity of the changes is causing problems, maybe try the seqmod overlay? (Just an idea; I don't know this overlay very well.)
My testing using one syncrepl statement for a single virtual address also works fine in general (replication picks up where it left off), except in one case:
- The slave server has a newer CSN for one of the serverIDs than the master it is talking to. In this case, replication just fails with an "LDAP_RES_SEARCH_RESULT" message.
Of course, this case can only occur if a modification was made, let's say, on master1, master2 did not replicate it before master1 became unavailable, and master2 was then promoted to use the virtual address (despite not being up to date with master1). But still...
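To see which state you are in, you can compare the contextCSN values held by the consumer and by the surviving master; in a multi-master setup contextCSN is multi-valued, with one value per serverID. For example (the suffix and values below are made up):

    ldapsearch -x -H ldap://master2.example.com -s base \
        -b "dc=example,dc=com" contextCSN

    contextCSN: 20100903141530.123456Z#000000#001#000000
    contextCSN: 20100903141855.654321Z#000000#002#000000

If the consumer's value for serverID 001 is more recent than the one master2 holds, you are in exactly the situation described above.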
The reason for this error is that the syncprov overlay on master2 detects that one of the slave's CSNs is newer than its own (the first one, in this case), and closes the persistent search (syncprov.c, label bailout:), even though the other CSN could be older, in which case syncprov could still provide updates.
I'm not sure if this can be considered a bug, but I think so. However, what to do in this case, from syncprov's point of view, is unclear to me...
Using two syncrepl statements is certainly suboptimal, as all modifications will be replicated twice to all read-only servers. However, I don't see any reason why it shouldn't work, off the top of my head. Does slapd end up synchronizing everything?
Not sure - there were only 25000 entries but I gave up and stopped the consumer server after 30 minutes as it still had not synchronised.
Good point about the double replication, though if it had worked cleanly it would have been OK in the low-modification environment that I have. The advantage is that nothing else is needed to manage the failover/fail-back cases.
Makes sense, and seems like rather an attractive architecture.
Jonathan