On Feb 6, 2007, at 1:39 PM, mikee wrote:
On Tue, 06 Feb 2007, matthew sporleder might have said:
On 2/6/07, daniel@ncsu.edu wrote:
Hi folk,
I want to start this message by saying, what I'm about to describe is completely vague and I don't expect to get a solution response. ;) Basically, I'm out of ideas and am looking for some suggestions as to how to debug the issue I'm running into.
Starting about half a year ago, slapd started "just dying" out of the blue. Not a thing shows up in the logs to indicate what might have caused it. The last query that I see in the logs before a crash always seems to be nothing special. I don't even see a core dump being generated yet, but that may just be because I don't have the proper setup to get a core dump at this time. We were running the last 2.2 and upgraded to the latest release of 2.3 to make sure it wasn't an "old version" issue. Unfortunately, slapd still dies a fair amount on us. It appears to be fairly unpredictable. I've seen it crash within 1 minute of starting up slapd (then a subsequent startup 'takes' just fine). I've seen it crash when there were a number of network issues going on. I've seen it crash out of the blue when nothing appeared to be going on. I don't really have the drive space to turn on max debug logging 24/7 until the problem occurs.
We're thinking about setting up something to watch all of the network traffic going to one of the boxes until it dies. (assuming we can find something with the resources to do that)
That all said... since I have nothing solid to present, do you all have any suggestions of what would be the best way to track down what's going on? I'm literally out of ideas unless my Berkeley DB config is somehow causing the problem or something like that.
I apologize for the vagueness. =/ Any ideas/suggestions?
After the crash, is your bdb environment clean, or is it needing a db_recover? Depending on your OS, you could watch the pid all the time and trap the last signals received, last files accessed, etc, and that wouldn't take tons of resources.
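One low-cost way to "trap the last signal" is to start slapd from a small wrapper that waits for it and reports how it ended. The following is only a sketch under assumptions: it expects slapd to stay in the foreground (the slapd man page says any `-d` level, even `-d 0`, keeps it from detaching), and the helper names `describe_exit`, `allow_core_dumps` and `run_and_report` are made up for illustration. It also raises the core-file size limit for the child, which may help with the missing core dumps, although the OS can still suppress cores for other reasons.

```python
import resource
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a readable cause of death."""
    if returncode < 0:
        # On POSIX, a negative return code means "killed by signal -returncode".
        signum = -returncode
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"killed by {name} ({signum})"
    return f"exited with status {returncode}"


def allow_core_dumps():
    """Raise the soft core-size limit to the hard limit (runs in the child)."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


def run_and_report(argv):
    """Run argv in the foreground, wait for it, and describe how it ended."""
    proc = subprocess.Popen(argv, preexec_fn=allow_core_dumps)
    return describe_exit(proc.wait())
```

A hypothetical call would be `print(run_and_report(["/path/to/slapd", "-d", "0"]))`, with the path and any extra slapd options adjusted to the local install. A result such as "killed by SIGSEGV (11)" would at least narrow down what kind of death it was.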
You could try turning on max debugging and simply rotate a lot more often. (every n minutes or even seconds) This way you could definitely keep the -last- transactions and just not worry about the old ones.
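A variation on frequent rotation, sketched here as an assumption rather than a tested recipe: pipe the debug output through a small filter that only ever holds the most recent lines in memory and writes them out once slapd goes away. The names `keep_tail` and `main`, the output file name, and the line count are all hypothetical.

```python
import sys
from collections import deque


def keep_tail(lines, max_lines):
    """Return the last max_lines items of an iterable, using bounded memory."""
    # A deque with maxlen silently drops the oldest entry when it is full.
    return list(deque(lines, maxlen=max_lines))


def main(out_path="slapd-tail.log", max_lines=50000):
    """Read debug output on stdin until EOF, then save only the tail."""
    tail = keep_tail(sys.stdin, max_lines)
    with open(out_path, "w") as out:
        out.writelines(tail)
```

With a call to `main()` added at the bottom of a script, this might be used as `slapd -d -1 2>&1 | python keep_tail.py`, since slapd writes `-d` output to stderr. Nothing reaches disk until the pipe closes, so the drive space needed is bounded by `max_lines`, at the cost of losing the buffer if the filter itself is killed.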
What about running a continuous 'strace -p 99999' on slapd and waiting for it to die again? The strace window should show the last call to the kernel.
Hrm. Good point, I could probably run that inside of screen or something. I'll give that a whirl, thanks!
Daniel
Mike
Daniel Henninger wrote:
On Feb 6, 2007, at 1:39 PM, mikee wrote:
What about running a continuous 'strace -p 99999' on slapd and waiting for it to die again? The strace window should show the last call to the kernel.
Hrm. Good point, I could probably run that inside of screen or something. I'll give that a whirl, thanks!
Just use gdb and attach to the slapd process once it's started up, then let it run.
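For leaving either tool running unattended (inside screen, say), the command lines could be assembled along these lines. This is a sketch, not something from the thread: the helper names and the pidfile path are hypothetical, and ignoring SIGPIPE in gdb is an assumption about what would otherwise stop the session early. The flags themselves are standard (`strace -f -tt -o FILE -p PID`; `gdb -p PID -batch -ex CMD`).

```python
def read_pid(pidfile_path):
    """Read a process id from a pidfile (the path is site-specific)."""
    with open(pidfile_path) as fh:
        return int(fh.read().strip())


def strace_argv(pid, out_path):
    """strace all threads of pid with timestamps, logging to out_path."""
    return ["strace", "-f", "-tt", "-o", out_path, "-p", str(pid)]


def gdb_argv(pid):
    """Attach gdb, let the process run, and dump backtraces when it stops."""
    return [
        "gdb", "-p", str(pid), "-batch",
        # Assumption: SIGPIPE is routine for a network server; don't stop on it.
        "-ex", "handle SIGPIPE nostop noprint pass",
        "-ex", "continue",
        # Reached only once the process stops again, e.g. on a fatal signal.
        "-ex", "thread apply all bt",
    ]
```

Either list can be handed to `subprocess.run`, for example `subprocess.run(gdb_argv(read_pid("/path/to/slapd.pid")))`. The gdb variant has the advantage that the backtrace of every thread is printed at the moment of the crash, provided slapd was built with debugging symbols.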
openldap-software@openldap.org