mhardin@symas.com wrote:
Full_Name: Matthew Hardin
Version: 2.4.12
OS: Red Hat Enterprise Linux 4 i686
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (74.38.114.185)
Hi All,
We are using a pair of OpenLDAP 2.4.12 servers with back-meta to proxy an Active Directory domain. The clients are all current versions of PADL's nss_ldap libraries.
Every once in a while (sometimes twice a day, sometimes once every two weeks) one of the slapd servers will peg CPU use at 100% and stop answering requests. The only way to stop slapd is with a kill -9.
There doesn't seem to be anything to explain the lockup or allow us to reproduce it. We are using redundant AD servers and they are not going offline. A third slapd server running as a test server using the same AD servers and configured identically but serving a much lighter nss_ldap load does not fail at all. We have ruled out hardware, OS, and connectivity as possible causes.
We are unfortunately unable to attach gdb to the running processes, as these are production servers and need to be restarted immediately. Our smaller test system does not exhibit the same behavior, and there is nothing unusual in the server logs. We do have core files generated from kill -6 commands, and they are all eerily similar to the back-trace below in that they have one or more threads waiting for a search or a bind response from AD.
I am also enclosing relevant portions of slapd.conf for these systems. Please let me know if any additional information would be useful.
Thanks,
-Matt
(gdb) thr apply all bt
Thread 1 (process 29769):
#0 0x005fa410 in __kernel_vsyscall ()
#1 0x004ddd10 in raise () from /lib/libc.so.6
#2 0x004df621 in abort () from /lib/libc.so.6
#3 0x004d715b in __assert_fail () from /lib/libc.so.6
#4 0x0806eec8 in slap_listener (sl=0x9583108) at /home/build/sol-2_4_12-1-nonopt/sol24/ldap24/servers/slapd/daemon.c:1803
#5 0x0806f643 in slap_listener_thread (ctx=0x4e92220, ptr=0x9583108) at /home/build/sol-2_4_12-1-nonopt/sol24/ldap24/servers/slapd/daemon.c:1997
#6 0x00a10783 in ldap_int_thread_pool_wrapper (xpool=0x959a010) at /home/build/sol-2_4_12-1-nonopt/sol24/ldap24/libraries/libldap_r/tpool.c:663
#7 0x0038a45b in start_thread () from /lib/libpthread.so.0
#8 0x00585c4e in clone () from /lib/libc.so.6
(gdb)
It seems you sent the wrong backtrace; this one doesn't show any signs of looping or anything that would indicate heavy CPU usage. It shows an assert which would kill the process, leading to 0% CPU usage. This assert was most likely fixed in 2.4.14.
slapd.conf
#######################################################################
# bdb database definitions
#######################################################################
database bdb
suffix "ou=nisdata"

#######################################################################
# Definitions for proxy and cache to AD
#######################################################################
database meta
suffix "dc=my-customer,dc=com"

# The link to AD:
uri ldaps://ldap-prd-dc01.my-customer.com/dc=ad,dc=my-customer,dc=com ldaps://ldap-prd-dc02.my-customer.com/

# The link to the NIS data directory (yes, we could chain/glue, that's
# for later)
uri ldapi://%2fvar%2fsymas%2frun%2fldapi/dc=nis,dc=my-customer,dc=com
Pointing back-meta at its own slapd will inevitably exhaust the thread pool, since each incoming operation needs two threads from the same pool: one for the original request and another for the request that back-meta sends back to the same server.
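To illustrate the thread-pool point, here is a minimal sketch in Python. It is only an analogy and not slapd's actual thread pool code: the names `simulate`, `frontend_op`, and `backend_op` are hypothetical, and a timeout is added so the script reports starvation instead of hanging. Each "frontend" task holds a worker while it waits for a "backend" task submitted to the same pool, much as a proxied operation waits on a request sent back to its own server.

```python
import threading
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def simulate(pool_size, n_requests, wait=0.5):
    """Return how many of n_requests simultaneous requests were starved.

    Hypothetical model: every request occupies one worker and then needs
    a second worker from the same pool to finish.
    """
    pool = ThreadPoolExecutor(max_workers=pool_size)
    # Barriers make the run deterministic: all requests are in flight
    # before any backend call is issued, and no worker is released until
    # every request has seen its outcome.
    all_started = threading.Barrier(n_requests)
    all_decided = threading.Barrier(n_requests)

    def backend_op():
        # Stands in for the operation served by the local database.
        return "result"

    def frontend_op():
        # Stands in for the proxy operation that calls back into the pool.
        all_started.wait(timeout=5)
        inner = pool.submit(backend_op)
        try:
            inner.result(timeout=wait)
            outcome = "ok"
        except FutureTimeout:
            # No free worker ever picked up the backend call.
            inner.cancel()
            outcome = "starved"
        all_decided.wait(timeout=5)
        return outcome

    futures = [pool.submit(frontend_op) for _ in range(n_requests)]
    outcomes = [f.result() for f in futures]
    pool.shutdown()
    return outcomes.count("starved")


if __name__ == "__main__":
    print(simulate(pool_size=4, n_requests=2))  # spare workers exist
    print(simulate(pool_size=4, n_requests=4))  # every worker is waiting
```

With a pool of 4, two simultaneous requests leave spare workers for the backend calls, so none are starved. Four simultaneous requests occupy every worker, the backend calls never get a thread, and all four requests time out. Without the timeout, that second case would simply block forever.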
This ITS will be closed.