On Wed, 18 Jul 2007, Aaron Richton wrote:
> As of 2.3.28, libldap's connections use TCP keepalives. You should be able to configure your networking stack to get the desired behavior.
But the connection has yet to be made, so keepalives don't enter into it.
We're a homogeneous FreeBSD shop, so I'd like to eliminate either FreeBSD or OpenLDAP as a possibility before filing this as a bug with one or the other. Could someone please try the following:
On client.example.net, set up ldap.conf with
URI ldap://server1.example.net ldap://server2.example.net
Server1, although it resolves, does not run an LDAP server (and may not physically exist). If it's on the same subnet as the client, so much the better, as that eliminates any router issues.
What I am seeing is a timeout of a minute before switching to Server2.
LDAP debugging:
ldap_create
ldap_url_parse_ext(ldap://server2.example.net)
ldap_url_parse_ext(ldap://server1.example.net)
ldap_search
put_filter: "(uid=daveh)"
put_filter: simple
put_simple_filter: "uid=daveh"
ldap_send_initial_request
ldap_new_connection
ldap_int_open_connection
ldap_connect_to_host: TCP server1.example.net:389
ldap_new_socket: 3
ldap_prepare_socket: 3
ldap_connect_to_host: Trying 192.168.1.9:389
ldap_connect_timeout: fd: 3 tm: -1 async: 0
ldap_ndelay_on: 3
Delay occurs here...
ldap_is_sock_ready: 3
ldap_is_socket_ready: error on socket 3: errno: 60 (Operation timed out)
ldap_close_socket: 3
ldap_int_open_connection
ldap_connect_to_host: TCP server2.example.net:389
ldap_new_socket: 3
ldap_prepare_socket: 3
ldap_connect_to_host: Trying 192.XX.XX.XX:389
ldap_connect_timeout: fd: 3 tm: -1 async: 0
ldap_ndelay_on: 3
ldap_is_sock_ready: 3
ldap_ndelay_off: 3
ldap_open_defconn: successful
Etc.
Kernel trace around then:
1184800925.257583 CALL socket(0x2,0x1,0)
1184800925.257602 RET socket 3
1184800925.257624 CALL setsockopt(0x3,0x6,0x1,0xbfbfd8dc,0x4)
1184800925.257637 RET setsockopt 0
1184800925.257677 CALL fcntl(0x3,0x3,0x2804e58d)
1184800925.257689 RET fcntl 2
1184800925.257701 CALL fcntl(0x3,0x4,0x6)
1184800925.257712 RET fcntl 0
1184800925.257731 CALL connect(0x3,0x804f1a0,0x10)
1184800925.257793 RET connect -1 errno 36 Operation now in progress
1184800925.257826 CALL select(0x400,0,0xbfbfd850,0,0)
Delay here.
1184801000.246370 RET select 1
1184801000.246438 CALL getpeername(0x3,0xbfbfd790,0xbfbfd78c)
1184801000.246450 RET getpeername -1 errno 57 Socket is not connected
1184801000.246505 CALL read(0x3,0xbfbfd78b,0x1)
1184801000.246519 RET read -1 errno 60 Operation timed out
1184801000.246543 CALL shutdown(0x3,0x2)
1184801000.246556 RET shutdown -1 errno 22 Invalid argument
1184801000.246576 CALL close(0x3)
1184801000.246593 RET close 0
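For anyone checking the arithmetic, the length of the stall can be read straight off the two select() timestamps in the trace above. A small Python sketch (the variable names are mine, purely for illustration):

```python
from decimal import Decimal

# Timestamps copied from the kernel trace above.
select_call = Decimal("1184800925.257826")  # CALL select(0x400,0,0xbfbfd850,0,0)
select_ret = Decimal("1184801000.246370")   # RET select 1

# Time spent blocked in select() waiting for the connect to resolve.
delay = select_ret - select_call
print(delay)  # 74.988544
```

So the wait is closer to 75 seconds than to a minute. The final 0 argument to select() is a NULL timeout, which agrees with the "tm: -1" in the LDAP debug output: as far as I can tell nothing in the library bounds the wait, and the error only surfaces once the kernel itself gives up on the connect.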
Revealingly, if the client tries to contact itself first (where no LDAP server is running), the switch-over happens right away. That would suggest packets to Server1 are being silently dropped somewhere, but the network guru swears up and down that there are no packet filters in the way.
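To make the difference between the two cases concrete, here is a minimal sketch of the same non-blocking connect() plus select() pattern that the trace shows, written in Python rather than C and with an explicit timeout added. This is not libldap's code; the function name try_connect and its parameters are made up for illustration.

```python
import errno
import select
import socket


def try_connect(ip, port, timeout):
    """Non-blocking TCP connect that waits at most `timeout` seconds.

    Returns 0 on success, otherwise an errno value
    (errno.ETIMEDOUT if our own timeout expires first).
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        s.setblocking(False)           # same effect as fcntl(F_SETFL, O_NONBLOCK)
        rc = s.connect_ex((ip, port))  # normally EINPROGRESS, as in the trace
        if rc == 0:
            return 0
        if rc not in (errno.EINPROGRESS, errno.EWOULDBLOCK):
            return rc                  # failed outright, e.g. ECONNREFUSED
        # The trace passes a NULL timeout to select(); here the wait is bounded.
        _, writable, _ = select.select([], [s], [], timeout)
        if not writable:
            return errno.ETIMEDOUT     # nothing came back within `timeout`
        # The socket is writable: SO_ERROR says whether the connect worked.
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    finally:
        s.close()
```

Against a host that is up but has nothing listening on the port, this should come back almost at once with ECONNREFUSED, because the peer answers the SYN with a reset. That is the "contact itself first" case. Against an address that never answers at all, nothing wakes select() until a timeout fires, whether it is the one passed in here or, with no timeout given, the kernel's own.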