I've been trying to get slapd-meta to failover using multiple URIs but can't get it to work.
Initially I was using 2.4.26, but having seen the report in ITS#7050 I've now built 2.4.32 but the problem is still there as far as I can tell. This bug was quashed in 2.4.29 according to the change log.
In the example below, if host1 is not contactable at the point a search is performed, host2 will be contacted and the result returned correctly but ldapsearch then hangs indefinitely and the server's debug (level 1) output spews the following messages endlessly:
ldap_sasl_bind
ldap_send_initial_request
ldap_int_poll: fd: 10 tm: 0
502a4634 conn=1001 op=1 <<< meta_search_dobind_init[0]=4
502a4634 conn=1001 op=1 >>> meta_search_dobind_init[0]
Here's the relevant portion of slapd.conf:
database meta
suffix dc=local
rootdn cn=administrator,dc=local
rootpw secret
network-timeout 3
uri ldap://host1:3268/ou=dc1,dc=local
uri ldap://host2:3268/ou=dc1,dc=local
uri ldap://host3:3268/ou=dc1,dc=local
suffixmassage "ou=dc1,dc=local" "dc=example,dc=com"
idassert-bind bindmethod=simple binddn="cn=proxyuser,dc=example,dc=com" credentials="password"
idassert-authzfrom "dn.exact:cn=administrator,dc=local"
Am I doing something wrong or has the bug described in ITS#7050 crept back in?
-----Original Message----- To: openldap-technical@openldap.org; From: Liam Gretton liam.gretton@leicester.ac.uk Sent: Tue 14.08.2012 15:18 Subject: slapd-meta doesn't continue with multiple uri's
--
Liam Gretton liam.gretton@le.ac.uk
HPC Architect http://www.le.ac.uk/its
IT Services Tel: +44 (0)116 2522254
University of Leicester, University Road
Leicestershire LE1 7RH, United Kingdom
Did you ever try
uri "ldap://host1:3268/ou=dc1,dc=local" "ldap://host2:3268" "ldap://host3:3268" ?
Am I doing something wrong
You are. The above is creating three targets, one pointing to host1, one pointing to host2 and one pointing to host3. The rest of the configuration is associated to the last target, the others are sort of dangling. A correct configuration for failover would be
uri ldap://host1:3268/ou=dc1,dc=local ldap://host2:3268/ ldap://host3:3268/
suffixmassage "ou=dc1,dc=local" "dc=example,dc=com"
idassert-bind bindmethod=simple binddn="cn=proxyuser,dc=example,dc=com" credentials="password"
idassert-authzfrom "dn.exact:cn=administrator,dc=local"
Note that URIs other than the first one cannot have the DN part (the DN of the first URI is assumed).
p.
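To make the difference between the two layouts concrete, here is a minimal Python sketch. It is not part of slapd and does not reproduce its parser; it only encodes the rule described in the reply above, namely that each "uri" directive starts a new target, that additional URIs on the same line are failover alternatives, and that later directives attach to the most recent target. The function name meta_targets and the dictionary layout are made up for this illustration.

```python
import shlex


def meta_targets(conf_text):
    """Group slapd-meta 'uri' directives into targets.

    Assumption (taken from the explanation in the thread): every 'uri'
    directive starts a new target, and extra URIs on the same line are
    failover alternatives for that target.
    """
    targets = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        words = shlex.split(line)
        if words[0] == "uri":
            targets.append({"uris": words[1:], "directives": []})
        elif targets:
            # Directives after a 'uri' line apply to the most recent target.
            targets[-1]["directives"].append(words[0])
    return targets


three_lines = """
uri ldap://host1:3268/ou=dc1,dc=local
uri ldap://host2:3268/ou=dc1,dc=local
uri ldap://host3:3268/ou=dc1,dc=local
suffixmassage "ou=dc1,dc=local" "dc=example,dc=com"
"""

one_line = """
uri ldap://host1:3268/ou=dc1,dc=local ldap://host2:3268/ ldap://host3:3268/
suffixmassage "ou=dc1,dc=local" "dc=example,dc=com"
"""

for name, conf in (("three uri lines", three_lines), ("one uri line", one_line)):
    targets = meta_targets(conf)
    print(name, "->", len(targets), "target(s)")
    for t in targets:
        print("  uris:", t["uris"], "directives:", t["directives"])
```

With the three-line layout the sketch reports three targets, and only the last one carries the suffixmassage directive; with the one-line layout it reports a single target with three URIs.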
On 14/08/2012 14:52, masarati@aero.polimi.it wrote:
Note that URIs other than the first one cannot have the DN part (the DN of the first URI is assumed).
Understood. However in that case the server never attempts to contact host2 or host3 at all. Here's the output from the debug log:
502a5ae6 >>> slap_listener(ldapi://%2Fvar%2Frun%2Fslapd%2Fldapi-meta)
502a5ae6 connection_get(8): got connid=1000
502a5ae6 connection_read(8): checking for input on id=1000
ber_get_next
ber_get_next: tag 0x30 len 43 contents:
502a5ae6 op tag 0x60, time 1344953062
ber_get_next
502a5ae6 conn=1000 op=0 do_bind
ber_scanf fmt ({imt) ber:
ber_scanf fmt (m}) ber:
502a5ae6 >>> dnPrettyNormal: <cn=administrator,dc=local>
502a5ae6 <<< dnPrettyNormal: <cn=administrator,dc=local>, <cn=administrator,dc=local>
502a5ae6 do_bind: version=3 dn="cn=administrator,dc=local" method=128
502a5ae6 conn=1000 op=0: rootdn="cn=administrator,dc=local" bind succeeded
502a5ae6 do_bind: v3 bind: "cn=administrator,dc=local" to "cn=administrator,dc=local"
502a5ae6 send_ldap_result: conn=1000 op=0 p=3
502a5ae6 send_ldap_response: msgid=1 tag=97 err=0
ber_flush2: 14 bytes to sd 8
502a5ae6 connection_get(8): got connid=1000
502a5ae6 connection_read(8): checking for input on id=1000
ber_get_next
ber_get_next: tag 0x30 len 44 contents:
502a5ae6 op tag 0x63, time 1344953062
ber_get_next
502a5ae6 conn=1000 op=1 do_search
ber_scanf fmt ({miiiib) ber:
502a5ae6 >>> dnPrettyNormal: <dc=local>
502a5ae6 <<< dnPrettyNormal: <dc=local>, <dc=local>
ber_scanf fmt ({mm}) ber:
ber_scanf fmt ({M}}) ber:
ldap_create
ldap_url_parse_ext(ldap://host3:3268)
ldap_url_parse_ext(ldap://host2:3268)
ldap_url_parse_ext(ldap://host1:3268)
502a5ae6 conn=1000 op=1: meta_back_getconn[0]
502a5ae6 conn=1000 op=1 meta_back_getconn: candidates=1 conn=ROOTDN inserted
502a5ae6 conn=1000 op=1 >>> meta_back_search_start[0]
502a5ae6 conn=1000 op=1 >>> meta_search_dobind_init[0]
ldap_sasl_bind
ldap_send_initial_request
ldap_new_connection 1 1 0
ldap_int_open_connection
ldap_connect_to_host: TCP host1:3268
ldap_new_socket: 10
ldap_prepare_socket: 10
ldap_connect_to_host: Trying 192.168.1.1:3268
ldap_pvt_connect: fd: 10 tm: 5 async: -1
ldap_ndelay_on: 10
ldap_int_poll: fd: -1 tm: 0
502a5ae6 conn=1000 op=1 <<< meta_search_dobind_init[0]=4
502a5ae6 conn=1000 op=1 <<< meta_back_search_start[0]=4
502a5ae6 conn=1000 op=1 meta_back_search: ncandidates=1 cnd="*"
502a5ae6 conn=1000 op=1 >>> meta_search_dobind_init[0]
ldap_sasl_bind
ldap_send_initial_request
ldap_int_poll: fd: 10 tm: 0
502a5ae6 conn=1000 op=1 <<< meta_search_dobind_init[0]=4
502a5ae6 conn=1000 op=1 >>> meta_search_dobind_init[0]
ldap_sasl_bind
ldap_send_initial_request
ldap_int_poll: fd: 10 tm: 0
502a5ae6 conn=1000 op=1 <<< meta_search_dobind_init[0]=4
502a5ae6 conn=1000 op=1 >>> meta_search_dobind_init[0]
ldap_sasl_bind
ldap_send_initial_request
ldap_int_poll: fd: 10 tm: 0
502a5ae6 conn=1000 op=1 <<< meta_search_dobind_init[0]=4
502a5ae6 conn=1000 op=1 >>> meta_search_dobind_init[0]
...etc
Understood. However in that case the server never attempts to contact host2 or host3 at all. Here's the output from the debug log:
Correct. When host1 is down, host2 is contacted instead, and so forth.
p.
On 14/08/2012 15:28, masarati@aero.polimi.it wrote:
Correct. When host1 is down, host2 is contacted instead, and so forth.
If I wasn't clear, I changed the config as you suggested. The debug output I posted was from that configuration. The server never attempts to contact anything other than host1.
Did you try stopping host1 in between client operations? I did and it works as intended.
p.
On 14/08/2012 16:06, masarati@aero.polimi.it wrote:
Did you try stopping host1 in between client operations? I did and it works as intended.
No, I've been initially testing with the case where host1 is down when the LDAP service starts.
If I remove host1 after the LDAP server has started, the debug output is at least different. It's attempting to contact host1, failing, doubling the timeout and trying again continuously, never attempting to try host2 or host3.
** ld 0xa2e4e0 Connections:
* host: host1 port: 3268 (default)
refcnt: 2 status: Connected
last used: Tue Aug 14 16:11:36 2012

** ld 0xa2e4e0 Outstanding Requests:
* msgid 7, origid 7, status InProgress
outstanding referrals 0, parent count 0
ld 0xa2e4e0 request count 1 (abandoned 0)
** ld 0xa2e4e0 Response Queue:
Empty
ld 0xa2e4e0 response count 0
ldap_chkResponseList ld 0xa2e4e0 msgid 7 all 2
ldap_chkResponseList returns ld 0xa2e4e0 NULL
ldap_int_select
ldap_result ld 0xa2e4e0 msgid 7
wait4msg ld 0xa2e4e0 msgid 7 (timeout 100000 usec)
wait4msg continue ld 0xa2e4e0 msgid 7 all 2
** ld 0xa2e4e0 Connections:
* host: host1 port: 3268 (default)
refcnt: 2 status: Connected
last used: Tue Aug 14 16:11:36 2012

** ld 0xa2e4e0 Outstanding Requests:
* msgid 7, origid 7, status InProgress
outstanding referrals 0, parent count 0
ld 0xa2e4e0 request count 1 (abandoned 0)
** ld 0xa2e4e0 Response Queue:
Empty
ld 0xa2e4e0 response count 0
ldap_chkResponseList ld 0xa2e4e0 msgid 7 all 2
ldap_chkResponseList returns ld 0xa2e4e0 NULL
ldap_int_select
ldap_result ld 0xa2e4e0 msgid 7
wait4msg ld 0xa2e4e0 msgid 7 (timeout 200000 usec)
wait4msg continue ld 0xa2e4e0 msgid 7 all 2
** ld 0xa2e4e0 Connections:
* host: host1 port: 3268 (default)
refcnt: 2 status: Connected
last used: Tue Aug 14 16:11:36 2012

** ld 0xa2e4e0 Outstanding Requests:
* msgid 7, origid 7, status InProgress
outstanding referrals 0, parent count 0
ld 0xa2e4e0 request count 1 (abandoned 0)
** ld 0xa2e4e0 Response Queue:
Empty
ld 0xa2e4e0 response count 0
ldap_chkResponseList ld 0xa2e4e0 msgid 7 all 2
ldap_chkResponseList returns ld 0xa2e4e0 NULL
ldap_int_select
ldap_result ld 0xa2e4e0 msgid 7
wait4msg ld 0xa2e4e0 msgid 7 (timeout 400000 usec)
wait4msg continue ld 0xa2e4e0 msgid 7 all 2
** ld 0xa2e4e0 Connections:
* host: host1 port: 3268 (default)
refcnt: 2 status: Connected
last used: Tue Aug 14 16:11:36 2012
...etc.
If I remove host1 after the LDAP server has started, the debug output is at least different. It's attempting to contact host1, failing, doubling the timeout and trying again continuously, never attempting to try host2 or host3.
The timeout you see is an internal timeout used for each poll on a target's connection. It keeps doubling when the connection is valid but nothing comes. Did you actually kill host1, or just stop it? In the latter case, the connection is not dead, it's just returning nothing. You need to kill the process (or let it time out using the "timeout" directive).
p.
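The doubling described above matches the wait4msg lines in the debug output (100000, 200000, 400000 usec). As a rough illustration only, the following Python sketch generates such a schedule and shows how an overall per-operation limit would cut it off. It is not taken from the slapd-meta source: the function name poll_schedule, the max_polls guard, and the absence of any upper cap on the poll interval are assumptions of this sketch.

```python
def poll_schedule(first_usec=100_000, op_timeout_sec=None, max_polls=8):
    """Yield successive per-poll timeouts in microseconds, doubling each time.

    Stops once the accumulated wait reaches op_timeout_sec (if given), or
    after max_polls polls so that the demonstration always terminates.
    Hypothetical model; the real back-meta logic may differ.
    """
    waited = 0
    poll = first_usec
    for _ in range(max_polls):
        if op_timeout_sec is not None and waited >= op_timeout_sec * 1_000_000:
            return
        yield poll
        waited += poll
        poll *= 2


# Without an overall limit the interval just keeps doubling.
print(list(poll_schedule(max_polls=3)))       # [100000, 200000, 400000]

# With a 1 second operation limit the polling stops once 1 s has elapsed.
print(list(poll_schedule(op_timeout_sec=1)))  # [100000, 200000, 400000, 800000]
```

The point of the model is that the doubling interval alone never ends the operation; something else (a dead connection or an explicit operation timeout) has to.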
On 14/08/2012 17:18, masarati@aero.polimi.it wrote:
The timeout you see is an internal timeout used for each poll on a target's connection. It keeps doubling when the connection is valid but nothing comes. Did you actually kill host1, or just stop it?
In the first case (host1 down when LDAP starts), I was testing by pointing at a host which has no LDAP service running on it at all, although the host itself was up.
In the second case (host1 down after LDAP starts), I was using a proper target (an AD domain controller) and setting an iptables rule to prevent outbound traffic to it:
iptables -A OUTPUT -d host1 -j DROP
In the latter case, the connection is not dead, it's just returning nothing. You need to kill the process (or let it time out using the "timeout" directive).
Which timeout directive? I've already set network-timeout in the config for slapd-meta, and setting bind-timeout doesn't help either. I have no control over the configuration of the targets.
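The two test setups described in this message usually look quite different to a client. A host that is up but has nothing listening on the port typically refuses the connection at once, whereas packets discarded by an iptables DROP rule produce no answer at all, so the attempt only ends when a timeout expires. The small Python sketch below classifies a fresh TCP connection attempt along those lines. It only covers new connections, not a connection that was already established before the rule was added, and the function name probe and its return strings are invented for this illustration.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify a TCP connection attempt as 'open', 'refused' or 'timeout'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The peer answered, but nothing is listening on that port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout, e.g. packets silently dropped.
        return "timeout"
    except OSError as exc:
        return "error: %s" % exc


# Local demonstration only: a listening socket, then the same port after closing it.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
demo_port = listener.getsockname()[1]
print(probe("127.0.0.1", demo_port))
listener.close()
print(probe("127.0.0.1", demo_port))
```

Against the real targets one would call something like probe("host1", 3268) before and after adding the iptables rule.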
bind-timeout and network-timeout have specific, connection-level meaning. Just use "timeout <seconds>" (you can make it search-specific if you don't want it to affect other operations, using "timeout search=<seconds>").
p.
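The two forms mentioned above, "timeout <seconds>" and "timeout search=<seconds>", can be told apart mechanically. The following Python sketch parses such a directive into a dictionary. It is an illustration only and not slapd's own parser: the function name parse_timeout, the use of "*" as the key for a value that applies to all operations, and the acceptance of several space-separated values on one line are assumptions of this sketch.

```python
def parse_timeout(directive):
    """Parse 'timeout <seconds>' or 'timeout <op>=<seconds>' into a dict.

    A bare value is stored under the key '*' (a convention of this sketch
    meaning "all operations").
    """
    words = directive.split()
    if not words or words[0] != "timeout":
        raise ValueError("not a timeout directive: %r" % directive)
    limits = {}
    for word in words[1:]:
        op, sep, value = word.rpartition("=")
        limits[op if sep else "*"] = int(value)
    return limits


print(parse_timeout("timeout 1"))         # {'*': 1}
print(parse_timeout("timeout search=5"))  # {'search': 5}
```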
On 14/08/2012 21:57, masarati@aero.polimi.it wrote:
bind-timeout and network-timeout have specific, connection-level meaning. Just use "timeout <seconds>" (you can make it search-specific if you don't want it to affect other operations, using "timeout search=<seconds>").
Setting timeout doesn't solve the problem, but it changes the behaviour. Now the ldapsearch times out after the value specified and reports:
result: 11 Administrative limit exceeded
text: Operation timed out
...but the LDAP server still doesn't attempt to contact the failover hosts. I've also verified this with tcpdump.
To recap, here's my current config. I can't help but think I'm doing something obviously wrong here if it's working for others.
database meta
suffix dc=local
rootdn cn=administrator,dc=local
rootpw secret
network-timeout 1
timeout 1
uri ldap://host1:3268/ou=dc1,dc=local ldap://host2:3268/ ldap://host3:3268/
suffixmassage "ou=dc1,dc=local" "dc=example,dc=com"
idassert-bind bindmethod=simple binddn="cn=proxyuser,dc=example,dc=com" credentials="password"
idassert-authzfrom "dn.exact:cn=administrator,dc=local"
Can anyone explain the interaction between 'network-timeout' and 'timeout'? I'm tearing my hair out with this problem and the timeout options are the only straws I have to clutch at.
On 15/08/2012 04:30, Liam Gretton wrote:
...but the LDAP server still doesn't attempt to contact the failover hosts. I've also verified this with tcpdump.
I'm trying to get my head round the source code now to see if this is a bug.
One thing that looks odd to me in the debug output:
ldap_url_parse_ext(ldap://host1:3268)
ldap_url_parse_ext(ldap://host2:3268)
ldap_url_parse_ext(ldap://host3:3268)
502e5f7e conn=1000 op=1: meta_back_getconn[0]
502e5f7e conn=1000 op=1 meta_back_getconn: candidates=1 conn=ROOTDN inserted
Shouldn't this be 'candidates=3' in the last line above?
If anyone familiar with the source could let me know I'd be grateful.