requate@univention.de wrote:
Full_Name: Arvid Requate
Version: 2.4.23
OS: Debian Lenny
URL: http://apt.univention.de/download/temp/openldap/trace_openldap_2.4.23_db_4.7...
Submission from: (NULL) (82.198.197.8)
With OpenLDAP 2.4.23 and Berkeley DB 4.7.25 we seem to hit something like a race condition that can be triggered by concurrent ldapdelete and search_s operations. Although it is somewhat similar, this condition does not quite match the details of ITS#5707. The URL provides a tar archive containing three gdb traces and the corresponding slapd log output (loglevel: trace args stats) for three separate lockups. In each case slapd hangs at 100% CPU after a number of modifications made with the shell script included in the tar archive, and it remains unresponsive until restarted. The number of successful operations before the hang varies between test runs.
Berkeley DB 4.7.25 (May 15, 2008) was built with Oracle patches for Bugs #16415 and #16541 and configure options "--enable-posixmutexes --with-mutex=POSIX/pthreads".
The test machine is a single-processor, single-core 686 VM running Linux 2.6.32 (686-bigmem). The concurrent searches are performed by a separate process that is notified of LDAP modifications (via a file) by a slapd overlay module called 'translog'. The traces do not seem to indicate a problem in the overlay code; for example, the overlay's on_response function "translog_response" does not appear in any of them.
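The shell script from the tar archive is not reproduced here. As a rough illustration of how such a lockup can be detected from the client side, the following Python sketch runs a delete loop and a concurrent search loop and reports how many deletes completed before progress stopped. It is a hypothetical harness, not the script used to produce the traces: `run_stress`, `delete_op`, `search_op` and the timeout values are illustrative assumptions, and the separate, file-notified search process is simplified to a second thread.

```python
import threading
import time
from typing import Callable, Tuple


def run_stress(delete_op: Callable[[int], None],
               search_op: Callable[[], None],
               n_ops: int,
               stall_timeout: float = 5.0,
               poll: float = 0.05) -> Tuple[int, bool]:
    """Hypothetical harness: call delete_op(0..n_ops-1) in one thread while
    another thread calls search_op() in a loop. Returns
    (deletes_completed, stalled); stalled is True when no delete finished
    for stall_timeout seconds. If delete_op raises, the run ends early
    with stalled=False and a count below n_ops."""
    completed = 0
    done = threading.Event()  # set once the delete loop exits
    stop = threading.Event()  # asks the search loop to finish

    def deleter() -> None:
        nonlocal completed
        try:
            for i in range(n_ops):
                delete_op(i)
                completed += 1
        finally:
            done.set()

    def searcher() -> None:
        while not stop.is_set():
            search_op()

    # Daemon threads, so a hung operation cannot keep the harness alive.
    for target in (deleter, searcher):
        threading.Thread(target=target, daemon=True).start()

    last_seen = -1
    last_progress = time.monotonic()
    while not done.wait(poll):
        if completed != last_seen:
            last_seen, last_progress = completed, time.monotonic()
        elif time.monotonic() - last_progress > stall_timeout:
            stop.set()
            return completed, True  # no progress: treat as a lockup
    stop.set()
    return completed, False
```

Against a live server, `delete_op` could wrap python-ldap's `LDAPObject.delete_s(dn)` and `search_op` could wrap `LDAPObject.search_s(base, ldap.SCOPE_SUBTREE, filterstr)`, ideally each on its own connection; the DNs and filters would have to come from the actual test data. A `stalled` result only shows that the client saw no progress; the gdb traces remain what identifies where slapd is spinning.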
Is there something obvious we are missing? We can provide more debugging details if necessary.
Try again using 2.4.24. A bug in back-bdb delete was fixed recently (ITS#6577), so the relevant code has changed since 2.4.23.
Also try a newer Berkeley DB. We've had other deadlocks with 4.7 that no longer occur in 4.8.