New subject: slapd crashing "randomly?"

6 Feb 2007


On Feb 6, 2007, at 1:39 PM, mikee wrote:
...
On Tue, 06 Feb 2007, matthew sporleder might have said:
...
On 2/6/07, daniel@ncsu.edu daniel@ncsu.edu wrote:
...
Hi folks,

I want to start this message by saying that what I'm about to describe is completely vague, and I don't expect to get a solution in response. ;) Basically, I'm out of ideas and am looking for some suggestions on how to debug the issue I'm running into.

Starting about half a year ago, slapd started "just dying" out of the blue. Not a thing shows up in the logs to indicate what might have caused it. The last query I see in the logs before a crash always seems to be nothing special. I don't even see a core dump being generated yet, but that may just be because I don't have the proper setup to get a core dump at this time. We were running the last 2.2 release and upgraded to the latest release of 2.3 to make sure it wasn't an "old version" issue.

Unfortunately, slapd still dies a fair amount on us, and it appears to be fairly unpredictable. I've seen it crash within one minute of starting up slapd (and then a subsequent startup "takes" just fine). I've seen it crash when there were a number of network issues going on. I've seen it crash out of the blue when nothing appeared to be going on. I don't really have the drive space to turn on max debug logging 24/7 until the problem occurs.

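On the missing core dump: a common cause is a core-file size limit of 0 in the environment that starts the daemon. A minimal sketch, assuming a Unix host and Python 3.9+, that raises the soft RLIMIT_CORE limit to the hard limit and then execs the given command, which inherits the limit. The script name and the slapd path/flags in the comment are placeholders; per slapd(8), `-d` keeps slapd in the foreground, so the exec'd process is slapd itself. Kernel settings (core pattern, and on Linux the suid-dump setting for daemons that switch user with `-u`) may still suppress dumps.

```python
import os
import resource
import sys


def enable_core_dumps() -> int:
    """Raise the soft core-file size limit to the hard limit; return the new soft limit."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to, but not beyond, the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)[0]


def exec_with_core_dumps(argv: list[str]) -> None:
    """Lift the core limit, then replace this process with argv (limits are inherited)."""
    limit = enable_core_dumps()
    shown = "unlimited" if limit == resource.RLIM_INFINITY else str(limit)
    print(f"core file size limit: {shown}", file=sys.stderr)
    # argv[0] must be a full path: os.execv does not search $PATH.
    os.execv(argv[0], argv)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 corewrap.py /path/to/slapd -d 0 -f /path/to/slapd.conf
    exec_with_core_dumps(sys.argv[1:])
```

If the hard limit itself is 0, only root (or the init/service configuration) can raise it.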
We're thinking about setting up something to watch all of the network traffic going to one of the boxes until it dies (assuming we can find something with the resources to do that).

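Disk space for a long-running capture can be bounded with a ring buffer. A minimal sketch, assuming tcpdump is installed: its `-C` option starts a new file every N million bytes and `-W` caps the number of files, overwriting the oldest, so only roughly `files * file_mb` MB of the most recent traffic is kept. The interface name, output prefix, sizes, and port 389 below are assumptions to adapt.

```python
import subprocess
import sys


def ring_capture_cmd(iface: str, out_prefix: str, file_mb: int = 100,
                     files: int = 10, port: int = 389) -> list[str]:
    """Build a tcpdump command that keeps only the newest traffic on `port`."""
    return [
        "tcpdump",
        "-i", iface,
        "-s", "0",           # capture whole packets, not just headers
        "-C", str(file_mb),  # rotate to a new file every file_mb million bytes
        "-W", str(files),    # keep at most `files` files, overwriting the oldest
        "-w", out_prefix,
        "port", str(port),
    ]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage (usually as root): python3 ldapring.py eth0 /var/tmp/ldap.pcap
    subprocess.run(ring_capture_cmd(sys.argv[1], sys.argv[2]), check=True)
```

When slapd dies, stopping the capture leaves the last few hundred MB of LDAP traffic leading up to the crash.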
That all said, since I have nothing solid to present, do you all have any suggestions for the best way to track down what's going on? I'm literally out of ideas, unless my Berkeley DB config is somehow causing the problem or something like that.

I apologize for the vagueness. =/ Any ideas/suggestions?

After the crash, is your BDB environment clean, or does it need a db_recover?

Depending on your OS, you could watch the PID all the time and trap the last signals received, last files accessed, etc., and that wouldn't take tons of resources.

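One low-cost way to learn which signal killed slapd is to make it a child of a small supervisor, since a parent is told how its child ended. A minimal sketch, assuming Python 3.9+ and that slapd runs in the foreground (for example with `-d 0`) so the supervisor's child is slapd itself. Python's subprocess reports "killed by signal N" as a return code of -N.

```python
import signal
import subprocess
import sys
import time


def describe_exit(returncode: int) -> str:
    """Translate a Popen return code into a human-readable reason."""
    if returncode < 0:
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def supervise(argv: list[str]) -> str:
    """Run argv as a child, wait for it to end, and log why it ended."""
    started = time.time()
    proc = subprocess.Popen(argv)
    reason = describe_exit(proc.wait())
    elapsed = time.time() - started
    print(f"{time.ctime()}: pid {proc.pid} {reason} after {elapsed:.0f}s", file=sys.stderr)
    return reason


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python3 supervise.py /path/to/slapd -d 0 -f /path/to/slapd.conf
    supervise(sys.argv[1:])
```

SIGSEGV or SIGABRT points at a real crash worth a core file; on Linux, an unexplained SIGKILL often means the OOM killer.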
You could try turning on max debugging and simply rotating a lot more often (every n minutes or even seconds). This way you could definitely keep the -last- transactions and just not worry about the old ones.

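Rotating by size with a fixed number of backups caps disk use at roughly `(backups + 1) * max_bytes` while always keeping the newest output. A minimal sketch using Python's standard `RotatingFileHandler`, assuming slapd's debug output is piped in on stdin (with `-d`, slapd writes debug output to stderr rather than syslog); the path, 10 MB size, and 20 backups are arbitrary choices.

```python
import logging
import logging.handlers
import sys
from typing import Iterable


def make_rotating_logger(path: str, max_bytes: int, backups: int) -> logging.Logger:
    """Logger whose files never exceed roughly (backups + 1) * max_bytes on disk."""
    handler = logging.handlers.RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter("%(message)s"))
    logger = logging.getLogger(f"slapd-debug:{path}")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False  # don't also send lines to the root logger
    logger.addHandler(handler)
    return logger


def copy_lines(lines: Iterable[str], logger: logging.Logger) -> None:
    """Write each input line to the rotating log; oldest files are discarded."""
    for line in lines:
        logger.debug(line.rstrip("\n"))


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: slapd -d -1 -f /path/to/slapd.conf 2>&1 | python3 rotlog.py /var/tmp/slapd-debug.log
    copy_lines(sys.stdin, make_rotating_logger(sys.argv[1], 10 * 1024 * 1024, 20))
```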
What about running a continuous 'strace -p 99999' on slapd and waiting for it to die again? The strace window should show the last call to the kernel.

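A continuous strace produces a lot of output, but only the tail matters. A minimal sketch, assuming strace is installed and you may ptrace slapd (typically root): it attaches with `-f` (with `-p`, this attaches to every thread of the multithreaded slapd, not just one) and `-tt` timestamps, and keeps only the last N lines in memory using a bounded deque. The script name and the 5000-line default are arbitrary.

```python
import collections
import subprocess
import sys
from typing import Deque, Iterable


def last_lines(stream: Iterable[str], n: int) -> Deque[str]:
    """Consume a stream, keeping only its final n lines in memory."""
    return collections.deque(stream, maxlen=n)


def trace_until_exit(pid: int, keep: int = 5000) -> Deque[str]:
    """Attach strace to all threads of pid and return its final `keep` lines.

    strace writes its trace to stderr and exits once the traced process is gone.
    """
    proc = subprocess.Popen(
        ["strace", "-f", "-tt", "-p", str(pid)],
        stderr=subprocess.PIPE,
        text=True,
    )
    if proc.stderr is None:
        raise RuntimeError("no stderr pipe from strace")
    tail = last_lines(proc.stderr, keep)
    proc.wait()
    return tail


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python3 tracetail.py $(pidof slapd) > slapd-last-calls.txt
    sys.stdout.writelines(trace_until_exit(int(sys.argv[1])))
```

With `-f`, lines are prefixed by thread ID, so the final call of the thread that actually crashed may not be the very last line. Expect slapd to run noticeably slower while traced.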
Hrm. Good point, I could probably run that inside of screen or something. I'll give that a whirl, thanks!
Daniel
...
Mike
