RE: Help with server hang

13 Mar 2008


Looks like the problem was a corrupted database; at least, the problem
wasn't reproducible on a known working database.  BTW, I am using BDB
4.6.21.
Thanks for the help.
Roy
-----Original Message-----
From: Aaron Richton [mailto:richton@nbcs.rutgers.edu] 
Sent: Tuesday, March 11, 2008 6:39 PM
To: Marantz, Roy
Cc: openldap-software@openldap.org
Subject: Re: Help with server hang
On Tue, 11 Mar 2008, Marantz, Roy wrote:
...
I'm running OpenLDAP 2.4.8 with Berkeley DB 2.4.6 on Solaris-10
compiled with Sun's compiler.
Are you sure you haven't typoed the bdb version number here? I mean, the
most significant digit should be a "4", if nothing else...
Look for the slapd log message from bdb_open (-d trace) to find your 
Sleepycat version.
...
If I start and stop slapd in short succession it will hang after the
2nd or 3rd time.  Following are the syslog messages, a pstack (backtrace),
and a sanitized copy of slapd.conf.  Any help in debugging this would be
appreciated.
Mar 11 16:00:33 master.nyc.deshaw.com slapd[3995]: [ID 100111 local4.debug] slapd starting
Mar 11 16:00:39 master.nyc.deshaw.com slapd[3995]: [ID 543694 local4.debug] daemon: shutdown requested and initiated.
Mar 11 16:00:39 master.nyc.deshaw.com slapd[3995]: [ID 542995 local4.debug] slapd shutdown: waiting for 0 threads to terminate
Your point is that you're hanging on a shutdown that you initiated,
right? Or is it that slapd refuses to fully start and/or is exiting of
its own volition shortly after startup, or...?
...
13476:  /usr/local/openldap/libexec/slapd -u ldap -g ldap
-----------------  lwp# 1 / thread# 1  --------------------
fe001117 lwp_wait (2, 80476f8)
fdffd326 _thrp_join (2, 0, 0, 1) + 5a
fdffd4a5 pthread_join (2, 0, 80871e0, 0) + 2b
08088b02 slapd_daemon () + 7a
-----------------  lwp# 2 / thread# 2  --------------------
fe00040b lwp_park (0, 0, 0)
fdffac7a cond_wait_queue (83114ac, 8311494, 0, 0) + 68
fdffb146 _cond_wait (83114ac, 8311494) + 66
fdffb188 cond_wait (83114ac, 8311494) + 21
fdffb1c1 pthread_cond_wait (83114ac, 8311494, a7, 821603c) + 1b
08196c6c ldap_pvt_thread_pool_destroy (fe000542, 0, 83114cc, fdee8ec0, fdc70000, 0) + e4
083114c8 ???????? ()
Well, easy enough: figure out which lock it's waiting for ;)
I think the "shutdown you asked for" interpretation is right, so under
that theory: seeing as you're at shutdown, my guess would be it's trying
to close the bdb database in
...
directory       /var/openldap/nyc.example.com/data
so go there, run db_stat -CA, and see what locks are still held.
Assuming that you've initiated a shutdown, there really shouldn't be
anything left...although there might be more going on. For example,
syncrepl might still be running even if you haven't done anything
"by hand." (Although your stack trace and syslogs belie that...)
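If you want to script that lock check, here's a minimal Python sketch of the same step. It assumes the Berkeley DB db_stat utility is on PATH (some installs ship it under a versioned name) and that the environment home is the "directory" from slapd.conf. The path below is just the sanitized one from this thread, and db_stat_lock_command / dump_held_locks are hypothetical helper names. Passing -h points db_stat at the environment instead of cd-ing into it, and -CA asks for all locking-subsystem statistics.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical path: the sanitized "directory" line from slapd.conf above.
# Substitute your own database directory.
ENV_HOME = "/var/openldap/nyc.example.com/data"


def db_stat_lock_command(env_home: str) -> list[str]:
    """Build the db_stat invocation for full lock-subsystem statistics.

    -h selects the Berkeley DB environment home (same effect as cd-ing there);
    -CA prints all locking statistics.
    """
    return ["db_stat", "-h", env_home, "-CA"]


def dump_held_locks(env_home: str) -> str:
    """Run db_stat against the environment and return its text output."""
    if shutil.which("db_stat") is None:
        # Assumption: the binary is named plain "db_stat"; adjust if your
        # install uses a versioned name.
        raise FileNotFoundError("db_stat not found on PATH")
    if not Path(env_home).is_dir():
        raise NotADirectoryError(env_home)
    result = subprocess.run(
        db_stat_lock_command(env_home),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if db_stat exits non-zero
    )
    return result.stdout
```

Run print(dump_held_locks(ENV_HOME)) while slapd is stuck and look at which lockers still hold locks. After a clean, requested shutdown you'd expect little or nothing to be listed.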
But hey, why guess? Turn up debugging: do you see the database being
closed properly?
