Strange hang scenario, resumes after idletimeout, but plenty of FDs available

1 Jun 2011


I'm running into the following scenario. Shortly after slapd gets
bombarded by a burst of operations (from several different clients) on
existing connections (about 3000 out of 16384, so well under the maximum
number of connections), it suddenly hangs. It does not respond to new
connections and does not process operations on existing connections.
Load average is near zero during this time, so it is not doing any work.
After 20 minutes (the idletimeout), slapd frees a number of connections
(roughly 1000) and resumes working as if nothing had happened.

The load pattern that gets it into this state happens every hour, almost
on the hour (most likely associated with nslcd and cron jobs, which
we're looking to mitigate by other means). Another strange thing is that
slapd will survive one such burst without hanging, but will then hang
during the *next* hour's burst.

Are there any resources other than file descriptors that are freed up
during the idletimeout processing? Are there any other parameters that
can be tuned besides idletimeout here? Could it possibly be a case of
deadlock somewhere, with something grabbing all the locks? Would settings
such as set_lk_max_locks be relevant to investigate here? Are there any
log level settings that might reveal more of what is happening?
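
For concreteness, here is a rough sketch of where the settings mentioned
above live. It assumes a back-bdb or back-hdb database with slapd.conf-style
configuration. Only the 20-minute idletimeout reflects what I described;
the loglevel keywords and all of the lock-table numbers are illustrative
assumptions, not my actual values.

```
# slapd.conf (global section)
idletimeout 1200        # 20 minutes, in seconds
loglevel stats conns    # assumed choice: log operations and connection management

# DB_CONFIG (in the database directory; Berkeley DB lock table sizing)
# Placeholder numbers only, not a recommendation.
set_lk_max_locks   3000
set_lk_max_lockers 1500
set_lk_max_objects 1500
```

If lock exhaustion were the cause, I would expect the Berkeley DB lock
statistics to show the maximum in use approaching these limits during a
burst, which is part of what I am trying to find out.
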
Thanks for any suggestions on things to look at and try.
-Kartik
