Re: (ITS#6853) slapadd/slapindex -q hang - openldap-bugs

4 Mar 2011


--On Friday, March 04, 2011 9:08 PM +0000 hyc@symas.com wrote:
...
dhawes@vt.edu wrote:
...
On 03/03/2011 02:38 PM, Quanah Gibson-Mount wrote:
...
--On Thursday, March 03, 2011 7:34 PM +0000 dhawes@vt.edu wrote:
...
Full_Name: David Hawes
Version: 2.4.24
OS: Ubuntu 10.04
URL:
Submission from: (NULL) (128.173.39.26)
When using slapadd or slapindex with the -q option, the message
"Closing DB..." is printed and then the application hangs
indefinitely. Removing the -q option allows the application to
complete without issue.
This occurs with Berkeley DB 4.7.25 (with patches) and 5.1.25.
I would ask that you provide a full backtrace of the slapadd process
after it has hung. Otherwise, this report isn't of much use.
Also, if you are using the Ubuntu patches for OpenLDAP with your
OpenLDAP build, you are including a known database-corrupting patch.
Since you don't say how you built OpenLDAP, it is impossible for us to
know if you did this or not.
Both OpenLDAP and Berkeley DB are compiled from source. No Ubuntu
packages or code are used.
Backtraces (I may need to recompile without optimization):
(gdb) thread apply all bt
Thread 2 (Thread 0x7ffee9003700 (LWP 29225)):
#0  0x00007ffff763c85c in pthread_cond_wait@@GLIBC_2.3.2 ()
     from /lib/libpthread.so.0
#1  0x00000000004b4150 in bdb_tool_trickle_task (ctx=<value optimized out>,
      ptr=<value optimized out>) at tools.c:1253
#2  0x00000000005066b0 in ldap_int_thread_pool_wrapper (
      xpool=<value optimized out>) at tpool.c:685
#3  0x00007ffff76379ca in start_thread () from /lib/libpthread.so.0
#4  0x00007ffff677970d in clone () from /lib/libc.so.6
#5  0x0000000000000000 in ?? ()
This indicates that the trickle task is still waiting for a signal on its
condition variable, which is a bit odd since bdb_tool_entry_close()
already signals it before slap_tool_destroy() is called.
It might be illuminating to run slapadd under gdb with a breakpoint on
bdb_tool_entry_close(), single-step through the first few lines of
that function where it issues the signal, and see whether the trickle
task actually reacts.
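For anyone following along, here is a minimal sketch of the general pattern
in question: a worker parked on a condition variable, and a closer that
signals it before the worker is joined. It is not the back-bdb code, and all
names in it (worker_state, trickle_worker, request_stop, run_shutdown_demo)
are hypothetical. It only illustrates why the waiter is normally paired with
a flag checked under the mutex: a signal sent while nobody is waiting is
simply dropped, so without the flag the joiner can block forever. Whether
that is what happens in this report is not established.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical names throughout; this is not the back-bdb code. */
struct worker_state {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    bool            stop_requested; /* predicate, guarded by mutex */
    bool            exited;         /* set by the worker on its way out */
};

/* Worker: sleep until asked to stop. The loop re-checks the predicate,
 * so a signal sent before the worker starts waiting is not lost. */
static void *trickle_worker(void *arg)
{
    struct worker_state *st = arg;

    pthread_mutex_lock(&st->mutex);
    while (!st->stop_requested)
        pthread_cond_wait(&st->cond, &st->mutex);
    st->exited = true;
    pthread_mutex_unlock(&st->mutex);
    return NULL;
}

/* Closer: record the request under the mutex, then signal. */
static void request_stop(struct worker_state *st)
{
    pthread_mutex_lock(&st->mutex);
    st->stop_requested = true;
    pthread_cond_signal(&st->cond);
    pthread_mutex_unlock(&st->mutex);
}

/* Returns 1 if the worker exited after the stop request, else 0. */
int run_shutdown_demo(void)
{
    struct worker_state st;
    pthread_t tid;
    int ok;

    pthread_mutex_init(&st.mutex, NULL);
    pthread_cond_init(&st.cond, NULL);
    st.stop_requested = false;
    st.exited = false;

    ok = pthread_create(&tid, NULL, trickle_worker, &st);
    assert(ok == 0);

    request_stop(&st);        /* may run before the worker ever waits */
    pthread_join(tid, NULL);  /* would hang here if the wakeup were lost */

    ok = st.exited ? 1 : 0;
    pthread_cond_destroy(&st.cond);
    pthread_mutex_destroy(&st.mutex);
    return ok;
}
```

Calling run_shutdown_demo() from a small main() and linking with -pthread
should return promptly. Dropping the stop_requested check from the worker
would make the outcome depend on thread timing, which is the kind of
behavior single-stepping in gdb could confirm or rule out.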
...
Thread 1 (Thread 0x7ffff7fd9700 (LWP 29220)):
#0  0x00007ffff763c85c in pthread_cond_wait@@GLIBC_2.3.2 ()
     from /lib/libpthread.so.0
#1  0x0000000000506223 in ldap_pvt_thread_pool_destroy (tpool=0x7ffffffed658,
      run_pending=<value optimized out>) at tpool.c:582
#2  0x0000000000506a0a in ldap_int_thread_pool_shutdown () at tpool.c:181
#3  0x00000000005050a9 in ldap_pvt_thread_destroy () at threads.c:70
#4  0x0000000000466059 in slap_destroy () at init.c:273
#5  0x00000000004a5ade in slap_tool_destroy () at slapcommon.c:932
#6  0x00000000004a46e7 in slapadd (argc=0, argv=<value optimized out>)
      at slapadd.c:606
#7  0x000000000041edc0 in main (argc=4, argv=0x7fffffffe048) at main.c:407
We are seeing numerous reports of this occurring with Zimbra after using
OpenLDAP 2.4.23 + the multi-core fix (ITS#6660).
--Quanah
--
Quanah Gibson-Mount
Sr. Member of Technical Staff
Zimbra, Inc
A Division of VMware, Inc.