Re:Slapd-meta stop at the first unreachable candidate - openldap-technical

9 Sep 2011

I performed many new tests and give several new comments below. As a reminder, all of them concern running an ldapsearch on the meta suffix while one URI is unreachable.
1/ When network-timeout is not set for the unreachable URI, no entry is returned. The only exception is when the URI becomes unreachable after the meta startup, and then for the first request only.
2/ When network-timeout is set, the meta always returns entries, but only from the URIs listed above the unreachable URI in slapd.conf.
3/ When conn-ttl is set, the workaround, which is to perform a search below the root node in order to open the channel to each URI, does not work. Channels are always lost after the ttl expires.
4/ I saw slightly different behaviour when the unreachable URI is the first one in slapd.conf. Most of my earlier tests were performed with the unreachable URI in second position.
5/ The behaviour above concerns the case in which the URI contains a valid (I mean pingable) IP address but the port is unreachable (the remote LDAP server is stopped or the port is wrong). In that case the connect error is returned very quickly by the network layers. There are other cases in which the connect error is returned more slowly (for instance when the IP address is not "pingable", or when it is a load balancer (LB) virtual IP address and the remote server is unreachable). In these latter cases the meta behaviour is very different: it is quite normal, and entries from all reachable URIs are returned ... once the various timers have elapsed. However, this latter case led me to several new problems, described below.
6/ The timer which is taken into account for the connection is not the "network-timeout" parameter but the "timeout" parameter. According to the slapd-meta man page, this timeout is intended for LDAP operations and not for the network connection step.
7/ Only the value of the "timeout" parameter is taken into account, but ... only if the "network-timeout" parameter is also set. Otherwise, a default (and longer) timeout seems to be applied. The value of network-timeout itself is meaningless; it only has to be set, to any value.
8/ The "timeout" value which is used is not the one configured for the unreachable URI. After many tests, I discovered that it is the maximum value among all the "timeout" parameters in the "database meta" section, whatever the URI is!
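The last observation above can be expressed as a small shell helper. This is only a sketch of the behaviour seen in these tests, not of anything documented in slapd-meta(5): it prints the largest uncommented timeout value found in a configuration file, which is the value that appeared to be applied to the unreachable URI. The function name max_meta_timeout is hypothetical, the file is assumed to hold a single "database meta" section, and only the plain numeric form "timeout <seconds>" is handled.

```shell
# max_meta_timeout: read a slapd.conf-style file on stdin and print the
# largest uncommented "timeout" value (0 if none is set).
# Hypothetical helper; it models the observed behaviour, not documented slapd logic.
max_meta_timeout() {
    awk '
        # "network-timeout" and "#timeout" lines do not match $1 == "timeout"
        $1 == "timeout" && $2 ~ /^[0-9]+$/ { if ($2 + 0 > max) max = $2 + 0 }
        END { print max + 0 }
    '
}

# Usage (path taken from the sample setup in this thread):
#   max_meta_timeout < /opt/openldap/etc/openldap/meta.conf
```

With the three timeout lines of the sample meta configuration uncommented (3, 4 and 4), this would print 4.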
Conclusions: all these parameters provide a really powerful way to customize the configuration, but few of them work as expected. Could anyone tell me whether I can submit all these details as a bug report? Or should I wait until someone else can reproduce the problem?
Thanks for all.
Michel
...
Message of 05/09/11 16:31
From: "Michel Gruau" 
To: "openldap-technical openldap org" 
Cc: 
Subject: Re:Slapd-meta stop at the first unreachable candidate
Below is a sample configuration that reproduces the problem:
Three OpenLDAP data instances are configured as follows:
include         /opt/openldap/etc/openldap/schema/core.schema
include         /opt/openldap/etc/openldap/schema/cosine.schema
include         /opt/openldap/etc/openldap/schema/inetorgperson.schema
include         /opt/openldap/etc/openldap/schema/nis.schema
include         /opt/openldap/etc/openldap/schema/dyngroup.schema
include         /opt/openldap/etc/openldap/schema/misc.schema
pidfile         /opt/openldap/var/run/server1.pid
argsfile        /opt/openldap/var/run/server1.args
loglevel        stats
database        bdb
suffix          ou=orgunit,o=gouv,c=fr
directory       /opt/openldap/var/server1
Note: server1 is replaced by server2 and server3 in the other instances.
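The substitution described in this note can be done mechanically. A minimal sketch, assuming the server1 configuration is the template and that the only differences between instances are the occurrences of "server1" (make_instance_conf is a hypothetical helper name):

```shell
# make_instance_conf N: read the server1 configuration on stdin and print
# the same file with every "server1" replaced by "serverN".
make_instance_conf() {
    sed "s/server1/server$1/g"
}

# Usage (paths taken from the sample setup in this thread):
#   make_instance_conf 2 < /opt/openldap/etc/openldap/server1.conf \
#       > /opt/openldap/etc/openldap/server2.conf
```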
Each instance contains the following data (only 4 entries):
dn: ou=orgunit,o=gouv,c=fr
objectClass: top
objectClass: organizationalUnit
ou: orgunit

dn: ou=dept1,ou=orgunit,o=gouv,c=fr
ou: dept1
objectClass: top
objectClass: organizationalUnit

dn: uid=user11,ou=dept1,ou=orgunit,o=gouv,c=fr
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
mail: user11@server1.com
cn: User 11
uid: user11
givenName: User
sn: 11

dn: uid=user12,ou=dept1,ou=orgunit,o=gouv,c=fr
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
mail: user12@server1.com
cn: User 12
uid: user12
givenName: User
sn: 12

Note: in instances 2 and 3, user1x and dept1 are replaced by user2x and dept2, and by user3x and dept3, respectively.
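For anyone wanting to reproduce the setup, the per-instance data can be generated rather than edited by hand. The following is a minimal sketch: make_instance_ldif is a hypothetical helper name, and the mail domain is assumed to follow the instance number (serverN.com), which the note above does not state.

```shell
# make_instance_ldif N: print the four test entries for data instance N.
# Hypothetical helper; the "serverN.com" mail domain is an assumption.
make_instance_ldif() {
    n=$1
    cat <<EOF
dn: ou=orgunit,o=gouv,c=fr
objectClass: top
objectClass: organizationalUnit
ou: orgunit

dn: ou=dept${n},ou=orgunit,o=gouv,c=fr
ou: dept${n}
objectClass: top
objectClass: organizationalUnit

EOF
    # Two users per instance: userN1 and userN2
    for i in 1 2; do
        cat <<EOF
dn: uid=user${n}${i},ou=dept${n},ou=orgunit,o=gouv,c=fr
objectClass: top
objectClass: person
objectClass: organizationalPerson
objectClass: inetOrgPerson
mail: user${n}${i}@server${n}.com
cn: User ${n}${i}
uid: user${n}${i}
givenName: User
sn: ${n}${i}

EOF
    done
}
```

For example, "make_instance_ldif 2 > server2.ldif" writes the data for the second instance, which could then be loaded with slapadd using that instance's configuration file.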
Data instances are launched using these commands:
/opt/openldap/libexec/slapd -n server1 -f /opt/openldap/etc/openldap/server1.conf -h ldap://0.0.0.0:1001/
/opt/openldap/libexec/slapd -n server2 -f /opt/openldap/etc/openldap/server2.conf -h ldap://0.0.0.0:1002/
/opt/openldap/libexec/slapd -n server3 -f /opt/openldap/etc/openldap/server3.conf -h ldap://0.0.0.0:1003/
The meta instance is configured as follows:
include         /opt/openldap/etc/openldap/schema/core.schema
include         /opt/openldap/etc/openldap/schema/cosine.schema
include         /opt/openldap/etc/openldap/schema/inetorgperson.schema
include         /opt/openldap/etc/openldap/schema/nis.schema
include         /opt/openldap/etc/openldap/schema/dyngroup.schema
include         /opt/openldap/etc/openldap/schema/anais.schema
include         /opt/openldap/etc/openldap/schema/misc.schema
pidfile         /opt/openldap/var/run/meta.pid
argsfile        /opt/openldap/var/run/meta.args
database        meta
suffix          ou=orgunit,o=gouv,c=fr
uri ldap://localhost:1001/ou=dept1,ou=orgunit,o=gouv,c=fr
#network-timeout 5
#timeout 3
uri ldap://localhost:1002/ou=dept2,ou=orgunit,o=gouv,c=fr
#network-timeout 5
#timeout 4
uri ldap://localhost:1003/ou=dept3,ou=orgunit,o=gouv,c=fr
#network-timeout 5
#timeout 4
and it is launched as follows:
/opt/openldap/libexec/slapd -n meta -f /opt/openldap/etc/openldap/meta.conf -h ldap://0.0.0.0:1000/ -d 256
# test with the 3 servers up
/opt/openldap/bin/ldapsearch -LLL -x -H ldap://pp-ae2-proxy1.alize:1000 -b ou=orgunit,o=gouv,c=fr  objectclass=person dn  |grep dn:
dn: uid=user11,ou=dept1,ou=orgunit,o=gouv,c=fr
dn: uid=user31,ou=dept3,ou=orgunit,o=gouv,c=fr
dn: uid=user12,ou=dept1,ou=orgunit,o=gouv,c=fr
dn: uid=user32,ou=dept3,ou=orgunit,o=gouv,c=fr
dn: uid=user21,ou=dept2,ou=orgunit,o=gouv,c=fr
dn: uid=user22,ou=dept2,ou=orgunit,o=gouv,c=fr
=> entries from the three servers are returned
# stop server 2 (kill -INT ...) and perform a new search:
/opt/openldap/bin/ldapsearch -LLL -x -H ldap://pp-ae2-proxy1.alize:1000 -b ou=orgunit,o=gouv,c=fr  objectclass=person dn  |grep dn:
dn: uid=user11,ou=dept1,ou=orgunit,o=gouv,c=fr
dn: uid=user12,ou=dept1,ou=orgunit,o=gouv,c=fr
dn: uid=user31,ou=dept3,ou=orgunit,o=gouv,c=fr
dn: uid=user32,ou=dept3,ou=orgunit,o=gouv,c=fr
=> looks good: entries from server1 and server3 are returned
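Since every test in this thread comes down to which targets contributed entries, the "dn:" lines can be summarised per subtree instead of being read by eye. A minimal sketch, assuming the ou=deptN naming used in this setup (count_by_dept is a hypothetical helper name):

```shell
# count_by_dept: read ldapsearch "dn:" lines on stdin and print, for each
# ou=deptN subtree, how many entries below it were returned ("deptN=count").
count_by_dept() {
    sed -n 's/^dn: .*,ou=\(dept[0-9][0-9]*\),ou=orgunit,o=gouv,c=fr$/\1/p' |
        sort | uniq -c | awk '{ print $2 "=" $1 }'
}

# Usage: replace the "| grep dn:" at the end of the searches with
#   | count_by_dept
```

For the search just above it would print dept1=2 and dept3=2, one per line; a missing deptN line means that target returned nothing.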
Below are the meta instance logs:
conn=1001 fd=9 ACCEPT from IP=172.30.8.13:55048 (IP=0.0.0.0:1000)
conn=1001 op=0 BIND dn="" method=128
conn=1001 op=0 RESULT tag=97 err=0 text=
conn=1001 op=1 SRCH base="ou=orgunit,o=gouv,c=fr" scope=2 deref=0 filter="(objectClass=person)"
conn=1001 op=1 SRCH attr=dn
conn=1001 op=1 meta_back_retry[1]: retrying URI="ldap://localhost:1002" DN="".
conn=1001 op=1 meta_back_retry[1]: meta_back_single_dobind=52
conn=1001 op=1 SEARCH RESULT tag=101 err=0 nentries=4 text=
conn=1001 op=2 UNBIND
conn=1001 fd=9 closed
=> looks good as nentries=4
# perform several more searches without changing anything:
[root@pp-ae2-proxy2 log]# /opt/openldap/bin/ldapsearch -LLL -x -H ldap://pp-ae2-proxy1.alize:1000 -b ou=orgunit,o=gouv,c=fr  objectclass=person dn  |grep dn:
=> nothing returned
Below are the corresponding logs:
conn=1002 fd=9 ACCEPT from IP=172.30.8.13:55049 (IP=0.0.0.0:1000)
conn=1002 op=0 BIND dn="" method=128
conn=1002 op=0 RESULT tag=97 err=0 text=
conn=1002 op=1 SRCH base="ou=orgunit,o=gouv,c=fr" scope=2 deref=0 filter="(objectClass=person)"
conn=1002 op=1 SRCH attr=dn
conn=1002 op=1 meta_search_dobind_init[1]: retrying URI="ldap://localhost:1002" DN="".
conn=1002 op=1 SEARCH RESULT tag=101 err=0 nentries=0 text=
conn=1002 op=2 UNBIND
conn=1002 fd=9 closed
=> looks bad as nentries=0
=> Only the first search after server2 is stopped is successful.
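The same comparison can be made directly from the meta instance logs. A minimal sketch that pulls the entry count of each search out of "stats"-level log lines like the ones shown above (nentries_per_conn is a hypothetical helper name, and the log is assumed to be available as text on stdin):

```shell
# nentries_per_conn: read slapd log lines on stdin and print
# "conn=<id> nentries=<n>" for every SEARCH RESULT line.
nentries_per_conn() {
    awk '/SEARCH RESULT/ {
        conn = ""; n = ""
        # Scan the fields so that any syslog prefix before conn= is ignored
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^conn=/)     conn = $i
            if ($i ~ /^nentries=/) n = $i
        }
        if (conn != "" && n != "") print conn, n
    }'
}
```

Fed the two log extracts above, it prints "conn=1001 nentries=4" followed by "conn=1002 nentries=0", which makes the change between the first and the following searches easy to spot.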
# new search but using server1 ou:
/opt/openldap/bin/ldapsearch -LLL -x -H ldap://pp-ae2-proxy1.alize:1000 -b ou=dept1,ou=orgunit,o=gouv,c=fr  objectclass=person dn  |grep dn:
dn: uid=user11,ou=dept1,ou=orgunit,o=gouv,c=fr
dn: uid=user12,ou=dept1,ou=orgunit,o=gouv,c=fr
=> looks good
# same search as earlier i.e. using root node:
/opt/openldap/bin/ldapsearch -LLL -x -H ldap://pp-ae2-proxy1.alize:1000 -b ou=orgunit,o=gouv,c=fr  objectclass=person dn  |grep dn:
dn: uid=user11,ou=dept1,ou=orgunit,o=gouv,c=fr
dn: uid=user12,ou=dept1,ou=orgunit,o=gouv,c=fr
=> Looks good too. It looks like everything is OK, but only once a channel to server1 has been opened by other means
# new search but using server3 base object:
/opt/openldap/bin/ldapsearch -LLL -x -H ldap://pp-ae2-proxy1.alize:1000 -b ou=dept3,ou=orgunit,o=gouv,c=fr  objectclass=person dn  |grep dn:
dn: uid=user31,ou=dept3,ou=orgunit,o=gouv,c=fr
dn: uid=user32,ou=dept3,ou=orgunit,o=gouv,c=fr
=> looks good
# new search but using slapd-meta base object:
/opt/openldap/bin/ldapsearch -LLL -x -H ldap://pp-ae2-proxy1.alize:1000 -b ou=orgunit,o=gouv,c=fr  objectclass=person dn  |grep dn:
dn: uid=user11,ou=dept1,ou=orgunit,o=gouv,c=fr
dn: uid=user12,ou=dept1,ou=orgunit,o=gouv,c=fr
dn: uid=user31,ou=dept3,ou=orgunit,o=gouv,c=fr
dn: uid=user32,ou=dept3,ou=orgunit,o=gouv,c=fr
=> entries from server1 and server3 are returned
=> this confirms that lookups in server1 and server3 are not performed until a channel has been opened to each of them using its respective base object
=> another strange behaviour: if the search using the server3 ou is performed before the search using the server1 ou, then the next search using the root node retrieves entries from both server1 and server3 ...
# new search after server2 restart: 
/opt/openldap/bin/ldapsearch -LLL -x -H ldap://pp-ae2-proxy1.alize:1000 -b ou=orgunit,o=gouv,c=fr  objectclass=person dn  |grep dn:
dn: uid=user11,ou=dept1,ou=orgunit,o=gouv,c=fr
dn: uid=user12,ou=dept1,ou=orgunit,o=gouv,c=fr
dn: uid=user21,ou=dept2,ou=orgunit,o=gouv,c=fr
dn: uid=user31,ou=dept3,ou=orgunit,o=gouv,c=fr
dn: uid=user22,ou=dept2,ou=orgunit,o=gouv,c=fr
dn: uid=user32,ou=dept3,ou=orgunit,o=gouv,c=fr
=> good, all entries are returned
# new search after meta instance restart while server2 is already stopped 
/opt/openldap/bin/ldapsearch -LLL -x -H ldap://pp-ae2-proxy1.alize:1000 -b ou=orgunit,o=gouv,c=fr  objectclass=person dn  |grep dn:
=> unlike the previous test case (server2 stopped while the meta instance is already running), we do not see the single successful search.
After that, the behaviour is the same, i.e. a search on the root node works again, but only once a search has been performed using ou=dept1 and ou=dept3.
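The workaround described here (one search per reachable subtree before searching the root node) can be scripted. The sketch below only prints the commands, using the same ldapsearch options as the tests above, so that they can be reviewed before being run. warmup_commands is a hypothetical helper name, and the list of reachable instances (1 and 3) is hard-coded for this test case.

```shell
# warmup_commands URI: print one ldapsearch command per reachable subtree.
# Running them opens a channel from the meta instance to each target.
warmup_commands() {
    uri=$1
    for n in 1 3; do   # server2 is the stopped instance in these tests
        printf '%s\n' "/opt/openldap/bin/ldapsearch -LLL -x -H $uri -b ou=dept$n,ou=orgunit,o=gouv,c=fr objectclass=person dn"
    done
}

# Usage: print the commands, or pipe them to sh to execute them
#   warmup_commands ldap://pp-ae2-proxy1.alize:1000
#   warmup_commands ldap://pp-ae2-proxy1.alize:1000 | sh
```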
In addition, the behaviour is slightly different when the "conn-ttl" parameter is added and set to 3 (3 seconds). I could describe it in a new post.
Thanks to anyone who can help identify whether this is a misconfiguration or a bug.
Michel Gruau
...
Message of 19/08/11 13:13
From: "Michel Gruau" 
To: "openldap-technical openldap org" 
Cc: 
Subject: Slapd-meta stop at the first unreachable candidate
Hello,
I have a slapd-meta configuration as follows:
database meta
suffix dc=com
uri ldap://server1:389/dc=suffix1,dc=com
uri ldap://server2:389/dc=suffix2,dc=com
uri ldap://server3:389/dc=suffix3,dc=com
I performed numerous tests using "base=com" and changing the order of the above list of uri (in slapd.conf), and I see that as soon as a candidate directory is unreachable, all the directories listed below the failing one are not queried by the proxy. For instance, in the example above:

if server2 is down, then server3 is not queried
if server1 is down, then none of the directories is queried.

I have the feeling this is a bug ... could you confirm?
FYI, I also tried the "onerr continue" setting, but it did not change anything.
Thanks in advance.
Michel
...