Re: (ITS#5926) slapd proxying AD with back-meta locks up - openldap-bugs

3 Mar 2009


mhardin@symas.com wrote:
...
Full_Name: Matthew Hardin
Version: 2.4.12
OS: Red Hat Enterprise Linux 4 i686
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (74.38.114.185)
Hi All,
We are using a pair of OpenLDAP 2.4.12 servers with back-meta to proxy an Active
Directory domain. The clients are all current versions of PADL's nss_ldap
libraries.
Every once in a while (sometimes twice a day, sometimes once every two weeks)
one of the slapd servers will peg CPU use at 100% and stop answering requests.
The only way to stop slapd is with a kill -9.
There doesn't seem to be anything to explain the lockup or allow us to reproduce
it. We are using redundant AD servers and they are not going offline. A third
slapd server running as a test server using the same AD servers and configured
identically but serving a much lighter nss_ldap load does not fail at all. We
have ruled out hardware, OS, and connectivity as possible causes.
We are unfortunately unable to attach gdb to the running processes, as these are
production servers and need to be restarted immediately. Our smaller test system
does not exhibit the same behavior, and there is nothing unusual in the server
logs. We do have core files generated from kill -6 commands, and
they are all eerily similar to the back-trace below in that they have one or
more threads waiting for a search or a bind response from AD.
I am also enclosing relevant portions of slapd.conf for these systems. Please
let me know if any additional information would be useful.
Thanks,
-Matt

(gdb) thr apply all bt
...
Thread 1 (process 29769):
#0  0x005fa410 in __kernel_vsyscall ()
#1  0x004ddd10 in raise () from /lib/libc.so.6
#2  0x004df621 in abort () from /lib/libc.so.6
#3  0x004d715b in __assert_fail () from /lib/libc.so.6
#4  0x0806eec8 in slap_listener (sl=0x9583108)
     at /home/build/sol-2_4_12-1-nonopt/sol24/ldap24/servers/slapd/daemon.c:1803
#5  0x0806f643 in slap_listener_thread (ctx=0x4e92220, ptr=0x9583108)
     at /home/build/sol-2_4_12-1-nonopt/sol24/ldap24/servers/slapd/daemon.c:1997
#6  0x00a10783 in ldap_int_thread_pool_wrapper (xpool=0x959a010)
     at /home/build/sol-2_4_12-1-nonopt/sol24/ldap24/libraries/libldap_r/tpool.c:663
#7  0x0038a45b in start_thread () from /lib/libpthread.so.0
#8  0x00585c4e in clone () from /lib/libc.so.6
(gdb)

It seems you sent the wrong backtrace; this one doesn't show any signs of
looping or anything that would indicate heavy CPU usage. It shows an assert
which would kill the process, leading to 0% CPU usage. This assert was most
likely fixed in 2.4.14.

...
slapd.conf
...
#######################################################################
# bdb database definitions
#######################################################################
database        bdb
suffix          "ou=nisdata"
...
#######################################################################
# Definitions for proxy and cache to AD
#######################################################################
database        meta
suffix          "dc=my-customer,dc=com"
...
# The link to AD:
uri             ldaps://ldap-prd-dc01.my-customer.com/dc=ad,dc=my-customer,dc=com
                ldaps://ldap-prd-dc02.my-customer.com/
...
# The link to the NIS data directory (yes, we could chain/glue, that's
# for later)
uri             ldapi://%2fvar%2fsymas%2frun%2fldapi/dc=nis,dc=my-customer,dc=com

Pointing back-meta at its own slapd will inevitably exhaust the thread pool,
since each incoming operation then needs two threads from the same pool: one
for the original request and one for the request back-meta sends back to the
same server.
This ITS will be closed.
-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/