Full_Name: Željko Nejamić
Version: latest git pull of OPENLDAP_REL_ENG_2_4
OS: RedHat 6.3
URL:
Submission from: (NULL) (213.147.123.33)
While using the ldclt tool to stress test our OpenLDAP mirror sync setup, I encountered a SIGBUS. Note that the same issue also occurs on a single node, without sync. I tested with the same tool and the same arguments on both Red Hat 6.3 (2.6.32-279.el6.x86_64) and Ubuntu Server 12.04 (Linux 3.2.0-54-generic x86_64), with exactly the same outcome. In both cases OpenLDAP was compiled from source (origin/OPENLDAP_REL_ENG_2_4), configured with --disable-{hdb,bdb}, --prefix=/opt/openldap and --enable-local=yes, using mdb as the backend, and additionally tweaked with:

  * nometasync
  * writemap
Without the writemap tweak, the SIGBUS does not occur.
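As background on why writemap can change the failure mode: with that flag, LMDB modifies pages by storing directly into a writable memory map rather than going through write(). A store into a file-backed mapping is answered with SIGBUS when the kernel cannot back the page, for example when the page lies entirely beyond the end of the file. The sketch below only demonstrates that general mmap behaviour on a throwaway temporary file. It is not a diagnosis of this crash, it does not use LMDB at all, and the function name is made up for the illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf sigbus_jmp;

/* Jump back into the test function instead of letting SIGBUS kill us. */
static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(sigbus_jmp, 1);
}

/* Hypothetical helper: maps two pages over a one-page file and stores
 * into the second page. Returns 1 if that store raised SIGBUS, 0 if it
 * succeeded, -1 if the setup itself failed. */
int store_past_eof_raises_sigbus(void)
{
    char path[] = "/tmp/sigbus-demo-XXXXXX";
    long page = sysconf(_SC_PAGESIZE);
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                       /* file disappears when fd is closed */

    if (ftruncate(fd, page) != 0) {     /* file is exactly one page long */
        close(fd);
        return -1;
    }

    /* The mapping is twice as long as the file. */
    char *map = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    int result;
    if (sigsetjmp(sigbus_jmp, 1) == 0) {
        ((volatile char *)map)[0] = 1;      /* inside the file: succeeds */
        ((volatile char *)map)[page] = 1;   /* beyond end of file: SIGBUS */
        result = 0;
    } else {
        result = 1;                         /* arrived here via the handler */
    }

    sigaction(SIGBUS, &old, NULL);
    munmap(map, 2 * page);
    close(fd);
    return result;
}
```

Calling store_past_eof_raises_sigbus() from a small main() and printing the return value is enough to try it. With a write()-based path the equivalent failure would surface as an error return instead of a signal, which is one reason a problem can appear only with writemap enabled.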
The command used was:

  ldclt -h 172.17.101.150 -p 389 -D "cn=xxx,dc=xxx" -w "xxx" -b "ds=USERS,o=STANDARD,dc=xxx" \
      -e object=xxx.txt,rdn='uid:[A=INCRNNOLOOP(200000;999999;6)]' -e add,commoncounter -I 68
...where xxx.txt has the following content:

  objectclass: xxxUser
The ldclt command uses 10 threads to perform add operations with an incrementing uid under the base DN ds=USERS,o=STANDARD,dc=xxx.
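To make the workload concrete, here is a minimal sketch of the RDN sequence that the rdn='uid:[A=INCRNNOLOOP(200000;999999;6)]' option is understood to produce: six-digit values counting up from 200000 to 999999, stopping at the end instead of wrapping around. This is an illustration, not ldclt's code. The type and function names are hypothetical, and the sketch is single-threaded, whereas with commoncounter the real tool shares one counter across its threads.

```c
#include <stdio.h>

/* Hypothetical stand-in for the shared counter behind INCRNNOLOOP. */
typedef struct {
    long next;  /* next value to hand out */
    long high;  /* last value, inclusive */
} rdn_counter;

/* Returns the next value, or -1 once the range is exhausted (no loop). */
long next_value(rdn_counter *c)
{
    if (c->next > c->high)
        return -1;
    return c->next++;
}

/* Formats one RDN such as "uid=200000", zero-padded to `width` digits.
 * Returns the snprintf result (length of the formatted string). */
int format_rdn(char *buf, size_t len, int width, long value)
{
    return snprintf(buf, len, "uid=%0*ld", width, value);
}
```

Starting from a counter initialised to { 200000, 999999 } and calling next_value() followed by format_rdn(buf, sizeof buf, 6, value) in a loop yields one RDN per add, each of which is placed under ds=USERS,o=STANDARD,dc=xxx.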
Ulimits are:

  ulimit -a
  core file size          (blocks, -c) unlimited
  data seg size           (kbytes, -d) unlimited
  scheduling priority             (-e) 0
  file size               (blocks, -f) unlimited
  pending signals                 (-i) 2066206
  max locked memory       (kbytes, -l) unlimited
  max memory size         (kbytes, -m) unlimited
  open files                      (-n) 4096
  pipe size            (512 bytes, -p) 8
  POSIX message queues     (bytes, -q) unlimited
  real-time priority              (-r) 0
  stack size              (kbytes, -s) unlimited
  cpu time               (seconds, -t) unlimited
  max user processes              (-u) 1024
  virtual memory          (kbytes, -v) unlimited
  file locks                      (-x) unlimited
At first sight, gdb seems to point to mdb_page_alloc:

  Starting program: /opt/openldap/libexec/slapd -h ldap:///\ ldapi:/// -F /opt/openldap/etc/openldap/slapd.d -g openldap -u openldap -d 0
  [Thread debugging using libthread_db enabled]
  [New Thread 0x2aaaac764700 (LWP 20415)]
  [Thread 0x2aaaac764700 (LWP 20415) exited]
  [New Thread 0x2aaaac764700 (LWP 20416)]
  [New Thread 0x2ab3ad168700 (LWP 20417)]
  [New Thread 0x2ab3ad969700 (LWP 20418)]
  [New Thread 0x2ab3ae16a700 (LWP 20419)]
  [New Thread 0x2ab3ae96b700 (LWP 20420)]
  [New Thread 0x2ab3b8800700 (LWP 20421)]
  [New Thread 0x2ab3d4800700 (LWP 20422)]
  Program received signal SIGBUS, Bus error.
  [Switching to Thread 0x2ab3b8800700 (LWP 20421)]
  mdb_page_alloc (mc=<value optimized out>, num=1, mp=0x2ab3b87fd8b8) at ./../../../libraries/liblmdb/mdb.c:1759
  warning: Source file is more recent than executable.
  1759            np->mp_pgno = pgno;
And the backtrace is:

  #0  mdb_page_alloc (mc=<value optimized out>, num=1, mp=0x2ab3b87fd8b8)
      at ./../../../libraries/liblmdb/mdb.c:1759
  #1  0x00000000004afb19 in mdb_page_touch (mc=0x2ab3bc1103f0)
      at ./../../../libraries/liblmdb/mdb.c:1889
  #2  0x00000000004b1c8c in mdb_cursor_touch (mc=0x2ab3bc1103f0)
      at ./../../../libraries/liblmdb/mdb.c:5597
  #3  0x00000000004b3a85 in mdb_cursor_put (mc=0x2ab3bc1103f0, key=0x2ab3b87ff000, data=0x2ab3b87feff0, flags=32)
      at ./../../../libraries/liblmdb/mdb.c:5727
  #4  0x00000000004f8586 in mdb_idl_insert_keys (be=<value optimized out>, cursor=0x2ab3bc1103f0, keys=<value optimized out>, id=13)
      at idl.c:534
  #5  0x00000000004f9116 in indexer (op=0x2ab3bc10dbd0, txn=<value optimized out>, ai=<value optimized out>, ad=0x88e0e0, atname=0x88dfb8, vals=0x2ab3bc110120, id=13, opid=1, mask=4)
      at index.c:219
  #6  0x00000000004f95d1 in index_at_values (op=0x2ab3bc10dbd0, txn=0x2ab3bc10e2f0, ad=<value optimized out>, type=0x88df50, tags=0x88e100, vals=0x2ab3bc110120, id=13, opid=1)
      at index.c:337
  #7  0x00000000004f9627 in mdb_index_values (op=<value optimized out>, txn=<value optimized out>, desc=<value optimized out>, vals=<value optimized out>, id=<value optimized out>, opid=<value optimized out>)
      at index.c:386
  #8  0x00000000004f96f9 in mdb_index_entry (op=0x2ab3bc10dbd0, txn=0x2ab3bc10e2f0, opid=1, e=0x8c18d8)
      at index.c:558
  #9  0x00000000004ed77e in mdb_add (op=0x2ab3bc10dbd0, rs=0x2ab3b87ff950)
      at add.c:359
  #10 0x0000000000487ac7 in overlay_op_walk (op=0x2ab3bc10dbd0, rs=0x2ab3b87ff950, which=op_add, oi=0x932280, on=0x0)
      at backover.c:671
  #11 0x00000000004884a7 in over_op_func (op=0x2ab3bc10dbd0, rs=<value optimized out>, which=<value optimized out>)
      at backover.c:723
  #12 0x00000000004281c0 in fe_op_add (op=0x2ab3bc10dbd0, rs=0x2ab3b87ff950)
      at add.c:334
  #13 0x0000000000428a16 in do_add (op=0x2ab3bc10dbd0, rs=0x2ab3b87ff950)
      at add.c:194
  #14 0x0000000000421259 in connection_operation (ctx=0x2ab3b87ffab0, arg_v=0x2ab3bc10dbd0)
      at connection.c:1155
  #15 0x0000000000421a35 in connection_read_thread (ctx=0x2ab3b87ffab0, argv=<value optimized out>)
      at connection.c:1291
  #16 0x0000000000516380 in ldap_int_thread_pool_wrapper (xpool=0x898160)
      at tpool.c:688
  #17 0x000000384cc07851 in start_thread () from /lib64/libpthread.so.0
  #18 0x000000384c8e767d in clone () from /lib64/libc.so.6
For context, the assembly around the offending pointer dereference looks like this:

  0x4af949 <mdb_page_alloc+665>   movslq %r13d,%rax
  0x4af94c <mdb_page_alloc+668>   lea    (%rcx,%rax,1),%rax
  0x4af950 <mdb_page_alloc+672>   mov    %rax,0x10(%r14)
  0x4af954 <mdb_page_alloc+676>   mov    %rcx,0x0(%rbp)
The hardware underneath all of that is:

  1) HP ProLiant BL460c Gen8, dual Xeon E5-2658 with attached storage blade D2200sb with SSD RAID, 256GB RAM -- RedHat 6.3 tests
  2) Intel server blade S2400BB, dual Xeon E5-2403, 48GB RAM -- Ubuntu 12.04 tests
If anything more is required to assist you in troubleshooting, please let me know.
Zeljko