On 22/08/2012 22:14, Pierangelo Masarati wrote:
But what's the point of specifying multiple targets in the uri option if it doesn't fall through to subsequent ones when the first is not contactable?
Have I completely missed the point of the documentation?
The point is that your condition is *not* "server unreachable".
There's obviously some subtlety I'm missing here. How would you describe it instead?
Current failover only deals with failures/timeouts of connect(2). I don't think handling your case using failover is appropriate. Your case should be handled by removing the non-responding URI from the list.
I don't understand the difference. If a server is unavailable for whatever reason (offline, firewalled, switched off, nothing listening on the specified port), then connect() will time out as you describe.
Which failures are the current mechanism actually expected to cope with that don't include a server being unreachable?
On 08/23/2012 11:00 AM, Liam Gretton wrote:
I don't understand the difference. If a server is unavailable for whatever reason (offline, firewalled, switched off, nothing listening on the specified port), then connect() will time out as you describe.
When connect(2) times out the code behaves as expected.
p.
On 23/08/2012 10:22, Pierangelo Masarati wrote:
When connect(2) times out the code behaves as expected.
Can you explain further, please? 'As expected' to you is obviously different from what I expect from the documentation and what you've said previously. You say the failover mechanism works when connect() fails or times out, but that's not the behaviour I'm seeing.
Liam Gretton wrote:
Can you explain further, please? 'As expected' to you is obviously different from what I expect from the documentation and what you've said previously. You say the failover mechanism works when connect() fails or times out, but that's not the behaviour I'm seeing.
Your description of your procedure is so vague and imprecise it's difficult for anybody to decipher what you're talking about.
Reading back thru the several posts in this thread, what I see you saying is that you have tested a few different configurations:
1) target host is up, target LDAP server is down: this should fail immediately, because the host OS will immediately send a TCP Connection Refused response.
2) target host is initially down: this will not fail until the first TCP connect request times out.
3) target host is initially up and connected, but thru your iptables manipulation you sever the link: this will not fail until the TCP connection times out, which it won't unless you're using TCP Keepalives, and by default those are only sent once every 2 hours.
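For concreteness, here is a minimal sketch of a connect(2) guarded by an explicit timeout; the function name and structure are mine for illustration, not code from slapd or libldap. It shows why case 1 fails fast (the RST surfaces as ECONNREFUSED via SO_ERROR), why case 2 waits out the full timeout (the SYNs vanish silently), and why case 3 is invisible to connect() entirely, since that connection succeeded long ago:

#include <errno.h>
#include <fcntl.h>
#include <sys/select.h>
#include <sys/socket.h>

/* Attempt a non-blocking connect; tv bounds the wait.
 * Returns 0 on success, -1 on failure or timeout. */
static int connect_with_timeout(int sd, const struct sockaddr *sa,
                                socklen_t salen, struct timeval *tv)
{
    int err = 0;
    socklen_t elen = sizeof(err);
    fd_set wfds;

    fcntl(sd, F_SETFL, fcntl(sd, F_GETFL, 0) | O_NONBLOCK);
    if (connect(sd, sa, salen) == 0)
        return 0;                /* connected immediately */
    if (errno != EINPROGRESS)
        return -1;               /* immediate hard failure */

    FD_ZERO(&wfds);
    FD_SET(sd, &wfds);
    if (select(sd + 1, NULL, &wfds, NULL, tv) <= 0)
        return -1;               /* case 2: no reply at all, timed out */

    getsockopt(sd, SOL_SOCKET, SO_ERROR, &err, &elen);
    return err ? -1 : 0;         /* case 1: err is ECONNREFUSED on RST */
}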
On 23/08/2012 11:18, Howard Chu wrote:
Your description of your procedure is so vague and imprecise it's difficult for anybody to decipher what you're talking about.
Let me make it less vague then.
What I've been trying to simulate are the various modes by which a uri target will become unavailable. What I'm trying to achieve is to have the meta backend point to four domain controllers and cope with one or more DCs being unavailable.
Having gone through this and let the system time out each time, I've found it does fail over under one of the conditions listed below, but it takes about 15 minutes to do so.
Scenarios:
1. slapd starts, first target is unreachable;
2. slapd starts, first target is reachable but has no service running;
3. slapd already running, first target up and connected then later becomes unreachable.
Simulations:
a. 'Unreachable' simulated by blocking outbound access with the following iptables rule:
iptables -A OUTPUT -d host1 -j DROP
b. 'Unreachable' simulated by making the first target a host that is up but with no service running.
Results (all with 2.4.32):
Case 1a: slapd retries host1 continuously and times out after about 180s. No attempt is made to contact additional targets.
Case 2b: slapd retries host1 continuously and times out after about 180s. No attempt is made to contact additional targets.
Case 3a: slapd retries host1 continuously, doubling an internal timeout value each time, eventually timing out after 19 retries and about 15m. It does then fall through to host2 and subsequent connections don't attempt to contact host1.
Here's my config. I've also tried setting nretries explicitly to 3, but it makes no difference.
database        meta
suffix          dc=local
rootdn          cn=administrator,dc=local
rootpw          secret
network-timeout 1
uri             ldap://host1:3268/ou=dc1,dc=local ldap://host2:3268/ ldap://host3:3268/
suffixmassage   "ou=dc1,dc=local" "dc=example,dc=com"
idassert-bind   bindmethod=simple binddn="cn=proxyuser,dc=example,dc=com" credentials="password"
idassert-authzfrom "dn.exact:cn=administrator,dc=local"
These results suggest to me that network-timeout and nretries (which should default to 3) don't work as documented.
Having said that, it does seem to at least cope with scenario 3, albeit with a long timeout.
Ideally it'd work in all cases. Pierangelo says the failover works when connect() times out, but I'd have thought that would include scenarios 1 and 2 but not 3.
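For reference, slapd-meta(5) documents network-timeout as bounding the poll/select that follows connect(2), presumably via libldap's LDAP_OPT_NETWORK_TIMEOUT. A minimal sketch of what "network-timeout 1" ought to mean at the library level (the wrapper function is mine, for illustration only):

#include <sys/time.h>
#include <ldap.h>

/* Open an LDAP handle whose connect(2) attempts give up after ~1s. */
static LDAP *open_with_network_timeout(const char *uri)
{
    LDAP *ld = NULL;
    struct timeval tv = { 1, 0 };   /* matches "network-timeout 1" above */

    if (ldap_initialize(&ld, uri) != LDAP_SUCCESS)
        return NULL;
    ldap_set_option(ld, LDAP_OPT_NETWORK_TIMEOUT, &tv);
    return ld;
}

If that option were honoured, an unreachable first target should be abandoned after about a second, not the 180s observed above.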
Liam Gretton wrote:
'Unreachable' simulated by blocking outbound access with the following iptables rule:
iptables -A OUTPUT -d host1 -j DROP
I am really not astonished by your results. Run your tests again, but use REJECT as the iptables target.
DROP means you never get an answer at all: the packets are silently discarded, so connect(2) can only sit there until it times out. REJECT sends back an ICMP error (or a TCP reset), which makes connect(2) fail immediately.
On 24/08/2012 12:48, harry.jede@arcor.de wrote:
Run your tests again, but use REJECT as the iptables target.
Ok, tried that.
For scenario 1, a search against slapd times out after about 3s and doesn't attempt to contact host1.
For scenario 3 it makes no difference: after about 15 minutes slapd times out against host1 and contacts host2 instead.
Hi Liam,
IMHO you'd be better off using a hardware or software failover device. There are several free Linux-based ones that will run on commodity or dedicated hardware.
That way you have complete control of the failover policy. Using a single app server to provide failover for other app servers is like cracking walnuts with a Ming vase: it will work until it breaks.
Software like pfSense works at a low level, does IP pooling, can itself be made redundant, and can run as an appliance on VMware etc.
Similarly, setting up two new servers with CentOS/Red Hat gets you LVS, though that is a bit harder to configure unless you're willing to spend the extra time learning how.
The OpenLDAP code is probably not ideal for the way you are using it, probably because other people in the past have not done failover the way you are doing it.
Cheers, Brett
Liam Gretton wrote:
These results suggest to me that network-timeout and nretries (which should default to 3) don't work as documented.
Ideally it'd work in all cases. Pierangelo says the failover works when connect() times out, but I'd have thought that would include scenarios 1 and 2 but not 3.
Sounds like you should file an ITS.
Pierangelo: looking at libldap/request.c and libldap/open.c, it appears that request.c:ldap_new_connection() expects open.c:ldap_int_open_connection() to return -2 on an async open, but ldap_int_open_connection() unconditionally returns 0. This is probably interfering with back-meta's urllist_proc.
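In outline, the contract Howard describes would look something like the sketch below; the function and its body are mine, illustrate the convention only, and are not the actual libldap code. The point is that a caller driving an async open needs the callee to distinguish "connected" from "still connecting", and an unconditional 0 collapses the two:

#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>

/* Sketch of an async-aware open:
 *    0 -> connected
 *   -1 -> hard failure (caller may try the next URI)
 *   -2 -> non-blocking connect still in progress
 * Always returning 0 hides the -2 case from the caller. */
static int sketch_open_connection(int sd, const struct sockaddr *sa,
                                  socklen_t salen, int do_async)
{
    if (do_async)
        fcntl(sd, F_SETFL, fcntl(sd, F_GETFL, 0) | O_NONBLOCK);
    if (connect(sd, sa, salen) == 0)
        return 0;
    if (do_async && errno == EINPROGRESS)
        return -2;
    return -1;
}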
On 24/08/2012 19:55, Howard Chu wrote:
Sounds like you should file an ITS.
ITS#7372 submitted.