Real idletimeout more than configured idletimeout - openldap-technical

6 Jul 2008


      Dear openldap gurus,
I am hitting some strange behavior with the idle sessions timeout
feature. In my configuration this timeout is set to 60 seconds on 4
slaves that are behind a load balancer. This load balancer times-out
idle sessions after 90 seconds, which should be fine. Openldap version
is the stable one from Debian Etch r3.
I however encounter random connection issues that have been traced to
the load balancer timeouting and idle session *before* the ldap slave.
I have straced the slapd process and I found out the applyed idletimeout
was way above the  configured one, please check the two following strace
output :
Output 1
futex(0x603428, FUTEX_WAKE, 1) = 1 
select(16, [4 6 7], NULL, NULL, {300, 0}) = 1 (in [6], left {238,
132000}) 
accept(6, {sa_family=AF_INET, sin_port=htons(34103),
sin_addr=inet_addr("192.168.1.1")}, [4647729933731233808]) = 12 
setsockopt(12, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0 
setsockopt(12, SOL_TCP, TCP_NODELAY, [1], 4) = 0 
open("/etc/hosts.allow", O_RDONLY) = 15 
fstat(15, {st_mode=S_IFREG|0644, st_size=677, ...}) = 0 
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x2aaaaccb3000 
read(15, "# /etc/hosts.allow: list of host"..., 4096) = 677 
read(15, "", 4096) = 0 
close(15) = 0 
munmap(0x2aaaaccb3000, 4096) = 0 
open("/etc/hosts.deny", O_RDONLY) = 15 
fstat(15, {st_mode=S_IFREG|0644, st_size=901, ...}) = 0 
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x2aaaaccb3000 
read(15, "# /etc/hosts.deny: list of hosts"..., 4096) = 901 
read(15, "", 4096) = 0 
close(15) = 0 
munmap(0x2aaaaccb3000, 4096) = 0 
fcntl(12, F_GETFL) = 0x2 (flags O_RDWR) 
fcntl(12, F_SETFL, O_RDWR|O_NONBLOCK) = 0 
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 
sendto(3, "<167>Jul 1 09:14:14 slapd[2765]"..., 102, MSG_NOSIGNAL, NULL,
0) = 102 
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 1 (in [12], left {15, 0}) 
read(12, "0b\2\1\1`]\2", 8) = 8
[ some uninteresting ldap stuff ]
futex(0x603428, FUTEX_WAKE, 1) = 1 
read(12, 0x6f30ff, 8) = -1 EAGAIN (Resource temporarily unavailable) 
futex(0x2b0db3b35dc8, FUTEX_WAKE, 1) = 1 
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout) 
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout) 
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout) 
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout) 
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout) 
write(5, "0", 1) = 1 
shutdown(12, 2 /* send and receive */) = 0
close(12) = 0
Here, we can see 5 select system calls for a real idletimeout is 75
seconds instead of 60.
Output 2
futex(0x603428, FUTEX_WAKE, 1) = 1 
select(16, [4 6 7], NULL, NULL, {300, 0}) = 1 (in [6], left {230,
828000}) 
accept(6, {sa_family=AF_INET, sin_port=htons(51692),
sin_addr=inet_addr("192.168.1.1")}, [4647729933731233808]) = 12 
setsockopt(12, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0 
setsockopt(12, SOL_TCP, TCP_NODELAY, [1], 4) = 0 
open("/etc/hosts.allow", O_RDONLY) = 15 
fstat(15, {st_mode=S_IFREG|0644, st_size=677, ...}) = 0 
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x2aaaaccb3000 
read(15, "# /etc/hosts.allow: list of host"..., 4096) = 677 
read(15, "", 4096) = 0 
close(15) = 0 
munmap(0x2aaaaccb3000, 4096) = 0 
open("/etc/hosts.deny", O_RDONLY) = 15 
fstat(15, {st_mode=S_IFREG|0644, st_size=901, ...}) = 0 
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0)
= 0x2aaaaccb3000 
read(15, "# /etc/hosts.deny: list of hosts"..., 4096) = 901 
read(15, "", 4096) = 0 
close(15) = 0 
munmap(0x2aaaaccb3000, 4096) = 0 
fcntl(12, F_GETFL) = 0x2 (flags O_RDWR) 
fcntl(12, F_SETFL, O_RDWR|O_NONBLOCK) = 0 
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 
sendto(3, "<167>Jul 1 09:19:21 slapd[2765]"..., 102, MSG_NOSIGNAL, NULL,
0) = 102 
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 1 (in [12], left {15, 0}) 
read(12, "0b\2\1\1`]\2", 8) = 8
[ some uninteresting ldap stuff ]
futex(0x2b0db3b35dc8, FUTEX_WAKE, 1) = 1 
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout) 
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout) 
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout) 
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout) 
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout) 
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout) 
write(5, "0", 1) = 1 
shutdown(12, 2 /* send and receive */) = 0 
close(12) = 0
Here, we can see 5 select system call for a real idletimeout is 75
Here we have 6 select system calls for a real idletimeout of 90 seconds
which is enough for the session to expire on the load balancer.
I have checked the source code and the logic that choose either to
idletimeout the session or go into a "SLAP_EVENT_WAIT" (select) call is
the following :
from server/slap/daemon.c
now = slap_get_time();
if ( ( global_idletimeout > 0 ) &&
        difftime( last_idle_check +
        global_idletimeout/SLAPD_IDLE_CHECK_LIMIT, now ) < 0 )
        {
        	connections_timeout_idle( now );
        	last_idle_check = now;
        }
As I understand this, no connection should be tested against the
idletimeout before any "event wait loop" takes more time than the
idletimeout parameter / 4.
In my case, I need the "event wait loop" to last more than 15 seconds
for connections to be checked against aging.
If I am not mistaken, as the difftime call compares seconds, I need the
loop to last a least for 16 seconds for the connections_timeout_idle
procedure to be called.
Am I understanding everything the right way ?
If it is the case, shouldn't the difftime call be tested <= 0 to help
idle sessions to be cleaned sonner ?
Many thanks.
-- 
Eric Déchaux

Ingénieur Kébabiste

Sun Microsystems Services France