quanah@zimbra.com wrote:
--On Tuesday, October 02, 2007 3:28 AM +0000 quanah@zimbra.com wrote:
--On Tuesday, October 02, 2007 2:35 AM +0000 hyc@symas.com wrote:
quanah@zimbra.com wrote:
--On October 1, 2007 11:22:11 PM +0000 quanah@zimbra.com wrote:
The following files will be uploaded to the ftp site, where # will be the assigned ITS number.
Specifically, the URLs are:
ftp://ftp.openldap.org/incoming/5161-pstak.out.2007-10-01
ftp://ftp.openldap.org/incoming/5161-dbstat.delta.out.2007-10-01
ftp://ftp.openldap.org/incoming/5161-db_stat.out.2007-10-01
The pstack output is a bit odd. Is this a regular debug build? With frame pointers, etc.? Can you get a stack trace in gdb?
It is a regular build, and they killed and restarted it before getting any gdb information. We've asked them to please get the gdb information in the future. Since it has happened twice now for this particular group in about a month, I'm hopeful it'll happen again before too long. ;)
And here is the last logged operation:
Oct 1 17:48:21 ldap01 slapd.bin[16121]: conn=62333 op=1 MOD dn="uid=XXXXXXX,ou=people,dc=YYYYYY,dc=com"
Oct 1 17:48:21 ldap01 slapd.bin[16121]: conn=62333 op=1 MOD attr=zimbraLastLogonTimestamp
Based on the (unreliable) pstack output it appears that all of the threads are waiting for the same mutex. This of course shouldn't be possible since one of those threads must already own it. We really need to have gdb access here to inspect the state of the mutex and see which thread is the owner, then figure out why it's trying to lock it again. In OpenLDAP 2.3 this pretty much means that some operation locked the mutex and somehow completed without unlocking it, i.e. completed without going thru the accesslog response callback.
This has nothing to do with BDB so db_stat isn't relevant here. It's about the accesslog overlay and any other overlays that may be manipulating the callback stack, so your slapd.conf is more relevant here.
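To make the suspected failure mode concrete, here is a rough sketch in Python rather than slapd's C. It is not the accesslog code: `op_mutex`, `unlock_callback`, and `run_operation` are made-up names for illustration, and the timeout exists only so the demo terminates instead of hanging the way slapd did. It assumes an operation takes a mutex on the way in and relies on a response callback to release it. If something removes that callback from the stack, the operation completes with the mutex still held and every later operation waits on it.

```python
import threading


def unlock_callback(op_mutex):
    """Hypothetical response callback: releases the per-operation mutex."""
    op_mutex.release()


def run_operation(op_mutex, callback_stack, timeout=0.2):
    """Lock the mutex, then run whatever callbacks remain on the stack.

    Returns True if the mutex was obtained, False if the wait timed out.
    A real server would block forever instead of timing out.
    """
    if not op_mutex.acquire(timeout=timeout):
        return False
    for callback in callback_stack:
        callback(op_mutex)
    return True


def demo():
    op_mutex = threading.Lock()
    results = []

    # Normal operation: the callback runs and releases the mutex.
    results.append(run_operation(op_mutex, [unlock_callback]))

    # Faulty operation: the callback was dropped from the stack,
    # so the operation finishes with the mutex still locked.
    results.append(run_operation(op_mutex, []))

    # Any later operation, on any thread, now waits on the leaked mutex.
    worker = threading.Thread(
        target=lambda: results.append(run_operation(op_mutex, [unlock_callback]))
    )
    worker.start()
    worker.join()

    return results, op_mutex.locked()


if __name__ == "__main__":
    print(demo())
```

In this sketch the third operation never gets the mutex, which matches the picture of every thread waiting on the same lock with no live owner making progress.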