Liam Gretton wrote:
On 23/08/2012 11:18, Howard Chu wrote:
Your description of your procedure is so vague and imprecise it's difficult for anybody to decipher what you're talking about.
Reading back through the several posts in this thread, what I see you saying is that you have tested a few different configurations:
- target host is up, target LDAP server is down: this should fail immediately, because the host OS will send back a TCP Connection Refused response.
- target host is initially down: this will not fail until the first TCP connect request times out.
- target host is initially up and connected, but through your iptables manipulation you sever the link: this will not fail until the TCP connection times out, which it won't unless you're using TCP keepalives, and by default those are only sent once every 2 hours.
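For reference, the 2-hour figure is the Linux default (net.ipv4.tcp_keepalive_time = 7200 seconds), and keepalive probes are only sent at all if the application has enabled SO_KEEPALIVE on its socket. As a minimal sketch of the socket-level mechanism, assuming Linux and purely for illustration (this is not what slapd itself does), a client could enable and tighten keepalives like this:

#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Enable TCP keepalives on a connected socket and tighten the Linux
 * defaults (2h idle, 75s interval, 9 probes) so a silently severed
 * link is detected in under a minute instead of after two hours.
 * The values here are illustrative only. */
static int enable_keepalive(int fd)
{
    int on = 1, idle = 30, intvl = 5, probes = 4;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) < 0)
        return -1;
    return setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof(probes));
}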
Let me make it less vague then.
What I've been trying to simulate are the various ways in which a URI target can become unavailable. What I'm trying to achieve is to have the meta backend point to four domain controllers and cope with one or more DCs being unavailable.
Having gone through this and let the system time out each time, I've found that it does fail over under only one of the conditions listed below, and even then it takes about 15 minutes to do so.
Scenarios:
1. slapd starts, first target is unreachable;
2. slapd starts, first target is reachable but has no service running;
3. slapd already running, first target up and connected, then later becomes unreachable.
Simulations:
a. 'Unreachable' simulated by blocking outbound access with the following iptables rule:
iptables -A OUTPUT -d host1 -j DROP
b. 'Unreachable' simulated by making the first target a host that is up but with no service running.
Results (all with 2.4.32):
Case 1a: slapd retries host1 continuously and times out after about 180s. No attempt is made to contact additional targets.
Case 2b: slapd retries host1 continuously and times out after about 180s. No attempt is made to contact additional targets.
Case 3a: slapd retries host1 continuously, doubling an internal timeout value each time, eventually timing out after 19 retries and about 15m. It does then fall through to host2 and subsequent connections don't attempt to contact host1.
Here's my config. I've also tried setting nretries explicitly to 3, but it makes no difference.
database meta
suffix dc=local
rootdn cn=administrator,dc=local
rootpw secret
network-timeout 1
uri ldap://host1:3268/ou=dc1,dc=local ldap://host2:3268/ ldap://host3:3268/
suffixmassage "ou=dc1,dc=local" "dc=example,dc=com"
idassert-bind bindmethod=simple binddn="cn=proxyuser,dc=example,dc=com" credentials="password"
idassert-authzfrom "dn.exact:cn=administrator,dc=local"
These results suggest to me that network-timeout and nretries (which should default to 3) don't work as documented.
I'm really not astonished by your results. Run your tests again, but use REJECT as the iptables target.
DROP means that you never get any answer at all.
Having said that, it does seem to at least cope with scenario 3, albeit with a long timeout.
Ideally it'd work in all cases. Pierangelo says the failover works when connect() times out, but I'd have thought that would include scenarios 1 and 2 but not 3.
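For what it's worth, the distinction Pierangelo is drawing shows up in how a timed connect is usually implemented. The following is only a rough sketch of the standard non-blocking connect pattern (not libldap's actual code) that a setting like network-timeout presumably maps onto:

#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>

/* connect() bounded by a timeout.  Scenario 3 never passes through
 * this path at all, because its connection is already established
 * when the link is severed. */
static int connect_with_timeout(int fd, const struct sockaddr *sa,
                                socklen_t len, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int err = 0;
    socklen_t elen = sizeof(err);

    fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

    if (connect(fd, sa, len) == 0)
        return 0;               /* connected straight away */
    if (errno != EINPROGRESS)
        return -1;              /* immediate local failure */

    /* Scenario 2 (RST from the peer) wakes poll() almost at once and
     * getsockopt() then reports ECONNREFUSED; scenario 1 (SYNs silently
     * dropped) sits here until timeout_ms expires. */
    if (poll(&pfd, 1, timeout_ms) <= 0)
        return -1;              /* timed out, or poll() failed */

    getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &elen);
    return err ? -1 : 0;
}

On that reading, scenarios 1 and 2 should both be caught at connect() time, which is presumably why the 180-second results above are surprising; only scenario 3 genuinely depends on an established connection timing out.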