Full_Name: Željko Nejamić
Version: latest git pull of OPENLDAP_REL_ENG_2_4
OS: RedHat 6.3
URL:
Submission from: (NULL) (213.147.123.33)
While using the ldclt tool to stress-test our OpenLDAP mirror sync setup, I
encountered a SIGBUS. Note that the same issue also occurs on a single node,
without sync.
I've tested with the same tool and the same arguments on both Red Hat
6.3 (2.6.32-279.el6.x86_64) and Ubuntu Server 12.04 (Linux 3.2.0-54-generic
x86_64), with exactly the same outcome.
In both cases OpenLDAP was compiled from source
(origin/OPENLDAP_REL_ENG_2_4), configured with --disable-{hdb,bdb},
--prefix=/opt/openldap and --enable-local=yes, and uses mdb as the backend,
with these additional settings:
* nometasync
* writemap
Without the writemap setting, the SIGBUS does not occur.
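Since slapd is started with -F (a cn=config directory), the two settings above would correspond to values of the back-mdb olcDbEnvFlags attribute. A minimal sketch of the LDIF, assuming the mdb database entry is olcDatabase={1}mdb,cn=config (the index {1} is a placeholder, not taken from the actual setup):

```ldif
# Assumption: the DN below is illustrative; use the real mdb database entry.
dn: olcDatabase={1}mdb,cn=config
changetype: modify
add: olcDbEnvFlags
olcDbEnvFlags: nometasync
olcDbEnvFlags: writemap
```

Leaving out the writemap value gives the configuration in which the crash was not observed.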
The command used was:
ldclt -h 172.17.101.150 -p 389 -D "cn=xxx,dc=xxx" -w "xxx" \
-b "ds=USERS,o=STANDARD,dc=xxx" \
-e object=xxx.txt,rdn='uid:[A=INCRNNOLOOP(200000;999999;6)]' \
-e add,commoncounter -I 68
where xxx.txt has the following content:
objectclass: xxxUser
This ldclt command runs the add operation from 10 threads, with an incrementing
uid, under the base DN ds=USERS,o=STANDARD,dc=xxx.
Ulimits are:
ulimit -a
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 2066206
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 4096
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) unlimited
real-time priority (-r) 0
stack size (kbytes, -s) unlimited
cpu time (seconds, -t) unlimited
max user processes (-u) 1024
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
At first sight, gdb seems to point to mdb_page_alloc:
Starting program: /opt/openldap/libexec/slapd -h ldap:///\ ldapi:/// -F
/opt/openldap/etc/openldap/slapd.d -g openldap -u openldap -d 0
[Thread debugging using libthread_db enabled]
[New Thread 0x2aaaac764700 (LWP 20415)]
[Thread 0x2aaaac764700 (LWP 20415) exited]
[New Thread 0x2aaaac764700 (LWP 20416)]
[New Thread 0x2ab3ad168700 (LWP 20417)]
[New Thread 0x2ab3ad969700 (LWP 20418)]
[New Thread 0x2ab3ae16a700 (LWP 20419)]
[New Thread 0x2ab3ae96b700 (LWP 20420)]
[New Thread 0x2ab3b8800700 (LWP 20421)]
[New Thread 0x2ab3d4800700 (LWP 20422)]
Program received signal SIGBUS, Bus error.
[Switching to Thread 0x2ab3b8800700 (LWP 20421)]
mdb_page_alloc (mc=<value optimized out>, num=1, mp=0x2ab3b87fd8b8)
at ./../../../libraries/liblmdb/mdb.c:1759
warning: Source file is more recent than executable.
1759 np->mp_pgno = pgno;
And the backtrace is:
#0 mdb_page_alloc (mc=<value optimized out>, num=1, mp=0x2ab3b87fd8b8)
at ./../../../libraries/liblmdb/mdb.c:1759
#1 0x00000000004afb19 in mdb_page_touch (mc=0x2ab3bc1103f0)
at ./../../../libraries/liblmdb/mdb.c:1889
#2 0x00000000004b1c8c in mdb_cursor_touch (mc=0x2ab3bc1103f0)
at ./../../../libraries/liblmdb/mdb.c:5597
#3 0x00000000004b3a85 in mdb_cursor_put (mc=0x2ab3bc1103f0,
key=0x2ab3b87ff000,
data=0x2ab3b87feff0, flags=32) at
./../../../libraries/liblmdb/mdb.c:5727
#4 0x00000000004f8586 in mdb_idl_insert_keys (be=<value optimized out>,
cursor=0x2ab3bc1103f0,
keys=<value optimized out>, id=13) at idl.c:534
#5 0x00000000004f9116 in indexer (op=0x2ab3bc10dbd0, txn=<value optimized out>,
ai=<value optimized out>, ad=0x88e0e0, atname=0x88dfb8,
vals=0x2ab3bc110120, id=13, opid=1,
mask=4) at index.c:219
#6 0x00000000004f95d1 in index_at_values (op=0x2ab3bc10dbd0,
txn=0x2ab3bc10e2f0,
ad=<value optimized out>, type=0x88df50, tags=0x88e100,
vals=0x2ab3bc110120, id=13, opid=1)
at index.c:337
#7 0x00000000004f9627 in mdb_index_values (op=<value optimized out>,
txn=<value optimized out>,
desc=<value optimized out>, vals=<value optimized out>,
id=<value optimized out>, opid=<value optimized out>) at index.c:386
#8 0x00000000004f96f9 in mdb_index_entry (op=0x2ab3bc10dbd0,
txn=0x2ab3bc10e2f0, opid=1,
e=0x8c18d8) at index.c:558
#9 0x00000000004ed77e in mdb_add (op=0x2ab3bc10dbd0, rs=0x2ab3b87ff950) at
add.c:359
#10 0x0000000000487ac7 in overlay_op_walk (op=0x2ab3bc10dbd0,
rs=0x2ab3b87ff950, which=op_add,
oi=0x932280, on=0x0) at backover.c:671
#11 0x00000000004884a7 in over_op_func (op=0x2ab3bc10dbd0,
rs=<value optimized out>, which=<value optimized out>) at backover.c:723
#12 0x00000000004281c0 in fe_op_add (op=0x2ab3bc10dbd0, rs=0x2ab3b87ff950)
at add.c:334
#13 0x0000000000428a16 in do_add (op=0x2ab3bc10dbd0, rs=0x2ab3b87ff950) at
add.c:194
#14 0x0000000000421259 in connection_operation (ctx=0x2ab3b87ffab0,
arg_v=0x2ab3bc10dbd0)
at connection.c:1155
#15 0x0000000000421a35 in connection_read_thread (ctx=0x2ab3b87ffab0,
argv=<value optimized out>)
at connection.c:1291
#16 0x0000000000516380 in ldap_int_thread_pool_wrapper (xpool=0x898160) at
tpool.c:688
#17 0x000000384cc07851 in start_thread () from /lib64/libpthread.so.0
#18 0x000000384c8e767d in clone () from /lib64/libc.so.6
For context, the assembly around the offending pointer dereference looks
like:
0x4af949 <mdb_page_alloc+665> movslq %r13d,%rax
0x4af94c <mdb_page_alloc+668> lea (%rcx,%rax,1),%rax
0x4af950 <mdb_page_alloc+672> mov %rax,0x10(%r14)
0x4af954 <mdb_page_alloc+676> mov %rcx,0x0(%rbp)
The underlying hardware is:
1) HP ProLiant BL460c Gen8, dual Xeon E5-2658 with attached storage blade
D2200sb with SSD RAID, 256GB RAM -- Red Hat 6.3 tests
2) Intel server blade S2400BB, dual Xeon E5-2403, 48GB RAM -- Ubuntu 12.04
tests
If anything more is required to assist you in troubleshooting, please let me
know.
Zeljko