Hi all,
Having just upgraded our internal LDAP server from Debian Lenny (2.4.16 internal build) to Debian Squeeze (2.4.23), we have started to see instances where the slapd process hangs and stops responding to all requests until we kill -9 and restart the process.
Bizarrely enough, we can reproduce this pretty much every time we try to create a new LDAP group using the GOsa web administration tool. Is this a known issue at all? Next time it happens, I'm happy to post a backtrace if you let me know what output you need from gdb to debug this.
Many thanks,
Mark.
Mark Cave-Ayland wrote:
> Having just upgraded our internal LDAP server from Debian Lenny (2.4.16 internal build) to Debian Squeeze (2.4.23), we have started to see instances where the slapd process hangs and stops responding to all requests until we kill -9 and restart the process.
It would be more useful if you can reproduce this on 2.4.24.
--On Thursday, March 17, 2011 8:27 AM -0700 Howard Chu hyc@symas.com wrote:
> It would be more useful if you can reproduce this on 2.4.24.
Debian's squeeze build of OpenLDAP also contains a patch known to corrupt the database. The first thing you want to do is abandon their build.
--Quanah
--
Quanah Gibson-Mount Sr. Member of Technical Staff Zimbra, Inc A Division of VMware, Inc. -------------------- Zimbra :: the leader in open source messaging and collaboration
On 17/03/11 16:10, Quanah Gibson-Mount wrote:
> Debian's squeeze build of OpenLDAP also contains a patch known to corrupt the database. The first thing you want to do is abandon their build.
Really? Ugh. Thanks for the heads up - has anyone reported this upstream to Debian yet?
ATB,
Mark.
--On Thursday, March 17, 2011 4:32 PM +0000 Mark Cave-Ayland mark.cave-ayland@siriusit.co.uk wrote:
> Really? Ugh. Thanks for the heads up - has anyone reported this upstream to Debian yet?
Yes, I reported it to them on 2/28. Canonical fixed the build within 4 hours. Debian's been absolutely silent.
--Quanah
On 17/03/11 15:27, Howard Chu wrote:
> It would be more useful if you can reproduce this on 2.4.24.
Okay. In the meantime, I've just set up a development environment for testing and got the following backtrace from the hung process using gdb:
(gdb) bt full
#0  0x00007fa50aca8be5 in pthread_join (threadid=140346751547136, thread_return=0x0) at pthread_join.c:89
        __ignore = <value optimized out>
        _tid = 10340
        _buffer = {__routine = 0x7fa50aca8ab0 <cleanup>, __arg = 0x7fa506457d28, __canceltype = 105216464, __prev = 0x0}
        oldtype = 0
        result = <value optimized out>
#1  0x000000000042d72c in slapd_daemon () at /home/devel/openldap/trunk/servers/slapd/daemon.c:2842
        listener_tid = 140346751547136
        rc = 0
#2  0x000000000041ae6a in main (argc=9, argv=0x7fffd2f2e5b0) at /home/devel/openldap/trunk/servers/slapd/main.c:961
        i = 9
        no_detach = 0
        rc = -12
        urls = 0x7df0c0 "ldap:/// ldapi:///"
        username = 0x7df100 "root"
        groupname = 0x7df0e0 "ldap"
        sandbox = 0x0
        syslogUser = 160
        configfile = 0x7df120 "/etc/ldap/slapd.conf"
        configdir = 0x0
        serverName = <value optimized out>
        scp = <value optimized out>
        scp_entry = <value optimized out>
        debug_unknowns = 0x0
        syslog_unknowns = 0x0
        slapd_pid_file_unlink = 1
        slapd_args_file_unlink = 1
        firstopt = <value optimized out>
        __PRETTY_FUNCTION__ = "main"
(gdb)
Maybe not entirely helpful, but now that the test environment is set up, I'll have a go with a source build of 2.4.24 with full debug enabled and see if it is still reproducible there.
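The trace above covers only the main thread, which sits in pthread_join() inside slapd_daemon() waiting for the listener thread (listener_tid matches the threadid being joined). On its own it therefore probably says little about where the hang actually is; a backtrace of every thread (gdb's `thread apply all bt full`) is usually more useful. Below is a minimal Python sketch for collecting one from a hung process. It assumes gdb is installed and that the script runs with permission to attach to slapd (typically as root). The helper names gdb_all_threads_cmd and capture_backtrace are hypothetical and are not part of OpenLDAP or gdb.

```python
import subprocess
import sys
from typing import List


def gdb_all_threads_cmd(pid: int) -> List[str]:
    """Build a gdb command line that attaches to `pid` in batch mode,
    prints a full backtrace (including locals) for every thread, and
    then detaches so the process is left running as before."""
    return [
        "gdb",
        "-p", str(pid),                     # attach to the running (hung) process
        "-batch",                           # run the -ex commands, then exit
        "-ex", "thread apply all bt full",  # every thread, not just main
        "-ex", "detach",                    # release the process before gdb exits
    ]


def capture_backtrace(pid: int, timeout: float = 120.0) -> str:
    """Run gdb against the process and return its combined stdout/stderr.

    Assumption: gdb is on PATH and the caller is allowed to ptrace
    the target process (typically root)."""
    result = subprocess.run(
        gdb_all_threads_cmd(pid),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        text=True,
        timeout=timeout,
    )
    return result.stdout


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (script name is hypothetical): python3 slapd_bt.py "$(pidof slapd)"
    print(capture_backtrace(int(sys.argv[1])))
```

Run it as root against the hung slapd. With a debug build, the output should then cover every slapd thread rather than only the main thread shown above, which is where any lock contention would be visible.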
ATB,
Mark.
On 17/03/11 15:27, Howard Chu wrote:
> It would be more useful if you can reproduce this on 2.4.24.
Okay - I've just completed two builds from vanilla source, one for 2.4.23 and another for 2.4.24. Under 2.4.23, I see exactly the same hang in pthread_join() and I have to kill -9 the slapd process. Fortunately the 2.4.24 build seems to work fine and doesn't exhibit the problem.
Since this looks like a pthread/locking issue, do you have an ITS reference I can chase with upstream Debian? This is an absolute showstopper IMO, as we're seeing multiple hangs a day even on our local, minimally loaded LDAP server running 2.4.23.
ATB,
Mark.
openldap-technical@openldap.org