Eric Déchaux wrote:
Dear openldap gurus,
I am hitting some strange behavior with the idle session timeout feature. In my configuration this timeout is set to 60 seconds on 4 slaves that sit behind a load balancer. The load balancer times out idle sessions after 90 seconds, which should be fine. The OpenLDAP version is the stable one from Debian Etch r3.
I have no idea what Debian or any other distro packages. You should quote specific version numbers for all relevant pieces of software.
However, I encounter random connection issues that have been traced to the load balancer timing out an idle session *before* the LDAP slave does.
I have straced the slapd process and found that the applied idletimeout was well above the configured one. Please check the following two strace outputs:
Output 1
[ some uninteresting ldap stuff ]
futex(0x603428, FUTEX_WAKE, 1) = 1
read(12, 0x6f30ff, 8) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x2b0db3b35dc8, FUTEX_WAKE, 1) = 1
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
write(5, "0", 1) = 1
shutdown(12, 2 /* send and receive */) = 0
close(12) = 0
Here we can see 5 select system calls, for an actual idle timeout of 75 seconds instead of 60.
This doesn't really surprise me.
Output 2
[ some uninteresting ldap stuff ]
futex(0x2b0db3b35dc8, FUTEX_WAKE, 1) = 1
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
select(16, [4 6 7 12], NULL, NULL, {15, 0}) = 0 (Timeout)
write(5, "0", 1) = 1
shutdown(12, 2 /* send and receive */) = 0
close(12) = 0
Here we have 6 select system calls for a real idletimeout of 90 seconds which is enough for the session to expire on the load balancer.
This is rather surprising.
I have checked the source code, and the logic that chooses whether to time out idle sessions or go into a "SLAP_EVENT_WAIT" (select) call is the following:
from servers/slapd/daemon.c
    now = slap_get_time();
    if ( ( global_idletimeout > 0 ) &&
        difftime( last_idle_check + global_idletimeout/SLAPD_IDLE_CHECK_LIMIT, now ) < 0 )
    {
        connections_timeout_idle( now );
        last_idle_check = now;
    }
As I understand this, no connection should be tested against the idletimeout before any "event wait loop" takes more time than the idletimeout parameter / 4.
Right, on an otherwise idle server, we don't want to wake up too frequently to check for idle connections. It's OK to check a little late, but we don't want to wake up much too late, which would often occur if the IDLE_CHECK_LIMIT was smaller.
In my case, I need the "event wait loop" to last more than 15 seconds for connections to be checked against aging.
Basically, yes.
If I am not mistaken, as the difftime call compares whole seconds, I need the loop to last at least 16 seconds for the connections_timeout_idle procedure to be called.
Am I understanding everything the right way?
Sounds like it.
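As a rough illustration of the 15 versus 16 second point, here is a minimal sketch of the gate in isolation. It is not the slapd code: IDLE_CHECK_LIMIT, check_due_strict and check_due_inclusive are names made up for this example, the limit of 4 is taken from the discussion in this thread, and the timeout is assumed to be a plain int so the division truncates.

```c
#include <time.h>

/* Hypothetical stand-in for SLAPD_IDLE_CHECK_LIMIT (4, per this thread). */
#define IDLE_CHECK_LIMIT 4

/* Gate as quoted from daemon.c: strict "< 0" comparison.
 * Returns 1 when the idle scan would run, 0 otherwise. */
static int check_due_strict(time_t last_check, time_t now, int idletimeout)
{
    return idletimeout > 0 &&
           difftime(last_check + idletimeout / IDLE_CHECK_LIMIT, now) < 0;
}

/* Variant asked about in the question: "<= 0" instead of "< 0". */
static int check_due_inclusive(time_t last_check, time_t now, int idletimeout)
{
    return idletimeout > 0 &&
           difftime(last_check + idletimeout / IDLE_CHECK_LIMIT, now) <= 0;
}
```

With idletimeout set to 60, calling check_due_strict with now exactly 15 seconds after last_check returns 0, and returns 1 at 16 seconds. check_due_inclusive already returns 1 at 15 seconds.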
If that is the case, shouldn't the difftime result be tested with <= 0 so that idle sessions are cleaned up sooner?
I don't think it makes much difference in the long run. Whenever you choose an idletimeout that is not evenly divisible by 4 (IDLE_CHECK_LIMIT) it's going to have extra slop anyway. And none of this explains how your 60 second idletimeout allowed an idle connection to continue for 90 seconds. Frankly I have no idea why that would be.
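To see where the slop for non-divisible values comes from, consider a small sketch of the interval arithmetic. It assumes the timeout is an integer, so that dividing by the check limit truncates. The helper names and the IDLE_CHECK_LIMIT macro are invented for this example and are not slapd identifiers.

```c
/* Hypothetical stand-in for SLAPD_IDLE_CHECK_LIMIT (4, per this thread). */
#define IDLE_CHECK_LIMIT 4

/* Seconds that must be exceeded between idle scans, using integer division. */
static int idle_check_interval(int idletimeout)
{
    return idletimeout / IDLE_CHECK_LIMIT;
}

/* Seconds lost to truncation: nonzero when the timeout is not a multiple of 4. */
static int idle_check_remainder(int idletimeout)
{
    return idletimeout % IDLE_CHECK_LIMIT;
}
```

For a timeout of 60 the interval is 15 with no remainder. For 50 it is 12 with 2 seconds left over, so the scan times do not line up with the configured timeout.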
In the meantime, on an idle server, I don't see any urgency in closing idle connections, because in this case there's no danger of resource starvation. On the other hand, for an active server, the event loop is going to be waking up more frequently anyway due to real activity, in which case the idle checks will happen more frequently. So as the server gets busier, the actual idletimeouts will get much closer to the configured value.
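For anyone who wants to experiment with how wakeup frequency affects the effective timeout, here is a toy simulation. It is a sketch under stated assumptions, not a model of what slapd really does: the event loop is assumed to wake at a fixed interval counted from the connection's last activity, the scan is assumed to close a connection only once it has been idle for strictly longer than the timeout, and simulate_close_time and its parameters are invented for this example.

```c
/* Hypothetical stand-in for SLAPD_IDLE_CHECK_LIMIT (4, per this thread). */
#define IDLE_CHECK_LIMIT 4

/*
 * Returns the number of seconds after the connection's last activity at
 * which the simulated server closes it, or -1 for invalid arguments.
 *
 * idletimeout   configured idle timeout in seconds
 * wake_interval seconds between event-loop wakeups (15 on an idle server
 *               in the traces above; smaller values model a busier server)
 * check_age     seconds between the previous idle scan and the
 *               connection's last activity (assumed phase)
 */
static long simulate_close_time(int idletimeout, int wake_interval, int check_age)
{
    long last_activity = 0;
    long last_idle_check = -(long)check_age;
    long now = 0;

    if (idletimeout <= 0 || wake_interval <= 0 || check_age < 0)
        return -1;

    for (;;) {
        now += wake_interval;   /* the event loop wakes up */

        /* Same shape as the gate quoted from daemon.c (strict "<"). */
        if (last_idle_check + idletimeout / IDLE_CHECK_LIMIT - now < 0) {
            /* Assumed scan rule: close when idle strictly longer than the timeout. */
            if (last_activity + idletimeout - now < 0)
                return now;
            last_idle_check = now;
        }
    }
}
```

Under these assumptions, simulate_close_time(60, 15, 0) returns 90 and simulate_close_time(60, 15, 15) returns 75, while a one-second wakeup interval, simulate_close_time(60, 1, 0), returns 64. Whether this matches what happened in the traces above is not established here. It only shows that the phase of the last scan can matter when wakeups are as far apart as the check interval.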