Dear openldap gurus,
I am hitting some strange behavior with the idle sessions timeout feature. In my configuration this timeout is set to 60 seconds on 4 slaves that are behind a load balancer. This load balancer times-out idle sessions after 90 seconds, which should be fine. Openldap version is the stable one from Debian Etch r3.
I however encounter random connection issues that have been traced to the load balancer timeouting and idle session *before* the ldap slave.
I have straced the slapd process and I found out the applyed idletimeout was way above the configured one, please check the two following strace output :
Output 1
futex(0x603428, FUTEX_WAKE, 1) = 1 select(16, [4 6 7], NULL, NULL, {300, 0}) = 1 (in [6], left {238, 132000}) accept(6, {sa_family=AF_INET, sin_port=htons(34103), sin_addr=inet_addr("192.168.1.1")}, [4647729933731233808]) = 12 setsockopt(12, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0 setsockopt(12, SOL_TCP, TCP_NODELAY, [1], 4) = 0 open("/etc/hosts.allow", O_RDONLY) = 15 fstat(15, {st_mode=S_IFREG|0644, st_size=677, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaaccb3000 read(15, "# /etc/hosts.allow: list of host"..., 4096) = 677 read(15, "", 4096) = 0 close(15) = 0 munmap(0x2aaaaccb3000, 4096) = 0 open("/etc/hosts.deny", O_RDONLY) = 15 fstat(15, {st_mode=S_IFREG|0644, st_size=901, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaaccb3000 read(15, "# /etc/hosts.deny: list of hosts"..., 4096) = 901 read(15, "", 4096) = 0 close(15) = 0 munmap(0x2aaaaccb3000, 4096) = 0 fcntl(12, F_GETFL) = 0x2 (flags O_RDWR) fcntl(12, F_SETFL, O_RDWR|O_NONBLOCK) = 0 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 sendto(3, "<167>Jul 1 09:14:14 slapd[2765]"..., 102, MSG_NOSIGNAL, NULL, 0) = 102 select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 1 (in [12], left {15, 0}) read(12, "0b\2\1\1`]\2", 8) = 8
[ some uninteresting ldap stuff ]
futex(0x603428, FUTEX_WAKE, 1) = 1 read(12, 0x6f30ff, 8) = -1 EAGAIN (Resource temporarily unavailable) futex(0x2b0db3b35dc8, FUTEX_WAKE, 1) = 1 select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout) select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout) select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout) select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout) select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout) write(5, "0", 1) = 1 shutdown(12, 2 /* send and receive */) = 0 close(12) = 0
Here, we can see 5 select system calls for a real idletimeout is 75 seconds instead of 60.
Output 2
futex(0x603428, FUTEX_WAKE, 1) = 1 select(16, [4 6 7], NULL, NULL, {300, 0}) = 1 (in [6], left {230, 828000}) accept(6, {sa_family=AF_INET, sin_port=htons(51692), sin_addr=inet_addr("192.168.1.1")}, [4647729933731233808]) = 12 setsockopt(12, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0 setsockopt(12, SOL_TCP, TCP_NODELAY, [1], 4) = 0 open("/etc/hosts.allow", O_RDONLY) = 15 fstat(15, {st_mode=S_IFREG|0644, st_size=677, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaaccb3000 read(15, "# /etc/hosts.allow: list of host"..., 4096) = 677 read(15, "", 4096) = 0 close(15) = 0 munmap(0x2aaaaccb3000, 4096) = 0 open("/etc/hosts.deny", O_RDONLY) = 15 fstat(15, {st_mode=S_IFREG|0644, st_size=901, ...}) = 0 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2aaaaccb3000 read(15, "# /etc/hosts.deny: list of hosts"..., 4096) = 901 read(15, "", 4096) = 0 close(15) = 0 munmap(0x2aaaaccb3000, 4096) = 0 fcntl(12, F_GETFL) = 0x2 (flags O_RDWR) fcntl(12, F_SETFL, O_RDWR|O_NONBLOCK) = 0 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=2945, ...}) = 0 sendto(3, "<167>Jul 1 09:19:21 slapd[2765]"..., 102, MSG_NOSIGNAL, NULL, 0) = 102 select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 1 (in [12], left {15, 0}) read(12, "0b\2\1\1`]\2", 8) = 8
[ some uninteresting ldap stuff ]
futex(0x2b0db3b35dc8, FUTEX_WAKE, 1) = 1 select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout) select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout) select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout) select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout) select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout) select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout) write(5, "0", 1) = 1 shutdown(12, 2 /* send and receive */) = 0 close(12) = 0
Here, we can see 5 select system call for a real idletimeout is 75 Here we have 6 select system calls for a real idletimeout of 90 seconds which is enough for the session to expire on the load balancer.
I have checked the source code and the logic that choose either to idletimeout the session or go into a "SLAP_EVENT_WAIT" (select) call is the following :
from server/slap/daemon.c
now = slap_get_time();
if ( ( global_idletimeout > 0 ) && difftime( last_idle_check + global_idletimeout/SLAPD_IDLE_CHECK_LIMIT, now ) < 0 ) { connections_timeout_idle( now ); last_idle_check = now; }
As I understand this, no connection should be tested against the idletimeout before any "event wait loop" takes more time than the idletimeout parameter / 4.
In my case, I need the "event wait loop" to last more than 15 seconds for connections to be checked against aging.
If I am not mistaken, as the difftime call compares seconds, I need the loop to last a least for 16 seconds for the connections_timeout_idle procedure to be called.
Am I understanding everything the right way ? If it is the case, shouldn't the difftime call be tested <= 0 to help idle sessions to be cleaned sonner ?
Many thanks.