christianclinton@gmail.com wrote:
Full_Name: Christian Clinton Version: 2.4.36 OS: CentOS 6.4 URL: ftp://ftp.openldap.org/incoming/christian-clinton-140925.zip Submission from: (NULL) (216.148.0.72)
Description: Slapd hangs during LDAP modification operation. Subsequent read or write operations hang from all clients until slapd restart.
2.4.36 is over a year old. Bugs reported against old releases will not be investigated.
slapd logs show: slapd[11213]: => bdb_idl_insert_key: c_put id failed: BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30993) slapd[11213]: conn=12787 op=25: attribute "entryCSN" index add failure slapd[11213]: => bdb_idl_insert_key: c_put id failed: BDB0068 DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30993) slapd[11213]: conn=13382 op=3: attribute "entryCSN" index add failure
This is a normal occurrence in BerkeleyDB and does not indicate an actual error. The bdb/hdb backend code automatically retries when a BDB deadlock is reported.
This issue has occurred sporadically for several months across different OpenLDAP versions. I managed to capture of a backtrace for two occurrences of the crash (excerpt below, full backtrace loloaded to FTP). In both instances, a couple of threads were acquiring locks which is where I suspect the issue lies.
BT: ftp://ftp.openldap.org/incoming/christian-clinton-140925.zip
When talking about BDB deadlock issues you should always obtain the output from db_stat -CA on the db directory, in addition to the full stack trace.
You should consider testing with different BDB versions as well, as issues are known to slip into various updates.
Environment: OS: CentOS 6.4 OpenLDAP 2.4.36 1 Master w/ many syncrepl slaves
Thread 129 (Thread 0x7f10ddffb700 (LWP 16586)): #0 0x00000037e560b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f11a076e623 in __db_hybrid_mutex_suspend () from /usr/local/openldap/lib/libdb-5.3.so #2 0x00007f11a076d6eb in __db_tas_mutex_lock () from /usr/local/openldap/lib/libdb-5.3.so #3 0x00007f11a08205e6 in __lock_get_internal () from /usr/local/openldap/lib/libdb-5.3.so #4 0x00007f11a08208ab in __lock_get () from /usr/local/openldap/lib/libdb-5.3.so #5 0x00007f11a0848468 in __db_lget () from /usr/local/openldap/lib/libdb-5.3.so #6 0x00007f11a078dff6 in __bam_search () from /usr/local/openldap/lib/libdb-5.3.so #7 0x00007f11a0778ed6 in __bamc_search () from /usr/local/openldap/lib/libdb-5.3.so #8 0x00007f11a077c637 in __bamc_get () from /usr/local/openldap/lib/libdb-5.3.so #9 0x00007f11a083507d in __dbc_iget () from /usr/local/openldap/lib/libdb-5.3.so #10 0x00007f11a0845d00 in __dbc_get_pp () from /usr/local/openldap/lib/libdb-5.3.so #11 0x00000000005489ed in hdb_idl_delete_key (be=0x7f10ddff9480, db=0x7f111a3f9610, tid=0x7f10603015b0, key=0x7f10ddff8ec0, id=250418) at idl.c:947 #12 0x000000000054b455 in hdb_key_change (be=0x7f10ddff9480, db=0x7f111a3f9610, txn=0x7f10603015b0, k=0x7f1060001ab0, id=250418, op=2) at key.c:97 #13 0x000000000054a4ee in indexer (op=0x7f10603c9f90, txn=0x7f10603015b0, ad=0x1b6d5e0, atname=0x1b6d4b8, vals=0x7f1060001a88, id=250418, opid=2, mask=4) at index.c:214 #14 0x000000000054a978 in index_at_values (op=0x7f10603c9f90, txn=0x7f10603015b0, ad=0x1b6d5e0, type=0x1b6d450, tags=0x1b6d600, vals=0x7f1060001a88, id=250418, opid=2) at index.c:337 #15 0x000000000054aaf6 in hdb_index_values (op=0x7f10603c9f90, txn=0x7f10603015b0, desc=0x1b6d5e0, vals=0x7f1060001a88, id=250418, opid=2) at index.c:386 #16 0x00000000004dbb44 in hdb_modify_internal (op=0x7f10603c9f90, tid=0x7f10603015b0, modlist=0x7f10603f9930, e=0x7f10ddff91d0, text=0x7f10ddffa990, textbuf=0x7f10ddff9260 "\300\032\070\220\020\177", textlen=256) at modify.c:358 #17 0x00000000004dcc09 in hdb_modify (op=0x7f10603c9f90, rs=0x7f10ddffa970) at modify.c:642 #18 0x00000000004bbc11 in overlay_op_walk (op=0x7f10603c9f90, rs=0x7f10ddffa970, which=op_modify, oi=0x1c9a500, on=0x0) at backover.c:671 #19 0x00000000004bbe28 in over_op_func (op=0x7f10603c9f90, rs=0x7f10ddffa970, which=op_modify) at backover.c:723 #20 0x00000000004bbf64 in over_op_modify (op=0x7f10603c9f90, rs=0x7f10ddffa970) at backover.c:762 #21 0x000000000044d5c6 in fe_op_modify (op=0x7f10603c9f90, rs=0x7f10ddffa970) at modify.c:303 #22 0x000000000044ce98 in do_modify (op=0x7f10603c9f90, rs=0x7f10ddffa970) at modify.c:177 #23 0x000000000042d068 in connection_operation (ctx=0x7f10ddffaaa0, arg_v=0x7f10603c9f90) at connection.c:1155 #24 0x000000000042d609 in connection_read_thread (ctx=0x7f10ddffaaa0, argv=0xb7) at connection.c:1291 #25 0x0000000000596261 in ldap_int_thread_pool_wrapper (xpool=0x1b77350) at tpool.c:688 #26 0x00000037e5607851 in start_thread () from /lib64/libpthread.so.0 #27 0x00000037e52e890d in clone () from /lib64/libc.so.6
Thread 71 (Thread 0x7f0fedffb700 (LWP 16648)): #0 0x00000037e560b43c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0 #1 0x00007f11a076e623 in __db_hybrid_mutex_suspend () from /usr/local/openldap/lib/libdb-5.3.so #2 0x00007f11a076d6eb in __db_tas_mutex_lock () from /usr/local/openldap/lib/libdb-5.3.so #3 0x00007f11a08205e6 in __lock_get_internal () from /usr/local/openldap/lib/libdb-5.3.so #4 0x00007f11a08208ab in __lock_get () from /usr/local/openldap/lib/libdb-5.3.so #5 0x00007f11a0848468 in __db_lget () from /usr/local/openldap/lib/libdb-5.3.so #6 0x00007f11a078dff6 in __bam_search () from /usr/local/openldap/lib/libdb-5.3.so #7 0x00007f11a0778ed6 in __bamc_search () from /usr/local/openldap/lib/libdb-5.3.so #8 0x00007f11a077cb1f in __bamc_get () from /usr/local/openldap/lib/libdb-5.3.so #9 0x00007f11a083507d in __dbc_iget () from /usr/local/openldap/lib/libdb-5.3.so #10 0x00007f11a0845d00 in __dbc_get_pp () from /usr/local/openldap/lib/libdb-5.3.so #11 0x000000000053f414 in hdb_dn2id_delete (op=0x7f10883ba740, txn=0x7f10a82e7430, eip=0x7f10505a82b0, e=0x7f117b58b968) at dn2id.c:646 #12 0x000000000053d8f0 in hdb_delete (op=0x7f10883ba740, rs=0x7f0fedffa970) at dele.c.c:406 #13 0x00000000004bbc11 in overlay_op_walk (op=0x7f10883ba740, rs=0x7f0fedffa970, which=op_delete, oi=0x1c9a500, on=0x0) at backover.c:671 #14 0x00000000004bbe28 in over_op_func (op=0x7f10883ba740, rs=0x7f0fedffa970, which=op_delete) at backover.c:723 #15 0x00000000004bbfe2 in over_op_delete (op=0x7f10883ba740, rs=0x7f0fedffa970) at backover.c:780 #16 0x000000000044ff7c in fe_op_delete (op=0x7f10883ba740, rs=0x7f0fedffa970) at delete.c:174 #17 0x000000000044fbef in do_delete (op=0x7f10883ba740, rs=0x7f0fedffa970) at delete.c:95 #18 0x000000000042d068 in connection_operation (ctx=0x7f0fedffaaa0, arg_v=0x7f10883ba740) at connection.c:1155 #19 0x000000000042d609 in connection_read_thread (ctx=0x7f0fedffaaa0, argv=0x3f) at connection.cA1A1291 #20 0x0000000000596261 in ldap_int_thread_pool_wrapper (xpool=0x1b77350) at tpool.c:688 #21 0x00000037e5607851 in start_thread () from /lib64/libpthread.so.0 #22 0x00000037e52e890d in clone () from /lib64/libc.so.6