On Tue, 18 Aug 2009, masarati(a)aero.polimi.it wrote:
>> Sorry about the stripped trace. I forgot that the install procedure
>> always strips the binaries...
>>
>> Okay, with our stress profile it takes ~36 hours to fail. I always
>> start with a clean db rebuild before each run. Each failure produces
>> the same traceback:
>>
>> (gdb) where
>> #0 0x00b97410 in __kernel_vsyscall ()
>> #1 0x00471d80 in raise () from /lib/libc.so.6
>> #2 0x00473691 in abort () from /lib/libc.so.6
>> #3 0x0046b1fb in __assert_fail () from /lib/libc.so.6
>> #4 0x0808d532 in ch_malloc (size=4436335) at ch_malloc.c:57
>
> ^^^ this really looks like memory exhaustion while trying to malloc a
> large chunk (>4MB). Can you tell, by printing e->e_name, whether it's
> correct that the server was modifying a large entry?

int entry_encode(Entry *e, struct berval *bv)
{
	ber_len_t len, dnlen, ndnlen, i;
	int nattrs, nvals;
	Attribute *a;
	unsigned char *ptr;

	Debug( LDAP_DEBUG_TRACE, "=> entry_encode(0x%08lx): %s\n",
		(long) e->e_id, e->e_dn, 0 );
	dnlen = e->e_name.bv_len;
	ndnlen = e->e_nname.bv_len;

	entry_partsize( e, &len, &nattrs, &nvals, 1 );

	bv->bv_len = len;
	bv->bv_val = ch_malloc(len);

(gdb) p *e
$8 = {e_id = 343637, e_name = {bv_len = 0, bv_val = 0x822169c ""},
  e_nname = {bv_len = 0, bv_val = 0x822169c ""}, e_attrs = 0x208d270c,
  e_ocflags = 256, e_bv = {bv_len = 6598587, bv_val = 0x1ddb5008 "\021"},
  e_private = 0x9ad5ce0}
(gdb) p *bv
$2 = {bv_len = 4436293, bv_val = 0x1 <Address 0x1 out of bounds>}
(gdb) p nattrs
$6 = 16
(gdb) p nvals
$7 = 270294

Well, according to e->e_name, e_dn (e_name.bv_val) is empty, so this
entry is likely bogus. We do have some large member entries (>100K) in
our test profile, but none of them come anywhere near 4M.
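
As a rough sanity check on the gdb values above, the arithmetic can be
sketched in C. This is only an illustration: both helper names are
hypothetical and not part of slapd, and the 1 MiB ceiling used in the
usage note is an assumed upper bound for the test profile, not a
measured one.

```c
#include <stddef.h>

/* Hypothetical helper: average encoded bytes per attribute value,
 * computed from the total length and value count that gdb reported.
 * Returns 0.0 when nvals is 0 to avoid dividing by zero. */
double avg_bytes_per_value(size_t total_len, size_t nvals)
{
	return nvals ? (double)total_len / (double)nvals : 0.0;
}

/* Hypothetical helper: returns 1 when an encoded entry is larger than
 * the biggest entry the test profile is expected to contain. */
int exceeds_expected(size_t total_len, size_t expected_max)
{
	return total_len > expected_max;
}
```

Calling avg_bytes_per_value(4436293, 270294) gives a little over 16
bytes per value, and exceeds_expected(4436293, 1024 * 1024) is true, so
both the value count and the total size are well beyond anything the
>100K member entries would produce.
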
Tracy
>
> p.
>
>> #5 0x08079ad2 in entry_encode (e=0x3a3dac0, bv=0x3a3d9b0) at entry.c:742
>> #6 0x0815240e in bdb_id2entry_put (be=0x3a3dca0, tid=0xbc6f7378,
>> e=0x3a3dac0, flag=0) at id2entry.c:54
>> #7 0x08152508 in hdb_id2entry_update (be=0x3a3dca0, tid=0xbc6f7378,
>> e=0x3a3dac0) at id2entry.c:90
>> #8 0x08106374 in hdb_modify (op=0xdabbc28, rs=0x3a3f0e4) at modify.c:611
>> #9 0x080ea38e in overlay_op_walk (op=0xdabbc28, rs=0x3a3f0e4,
>> which=op_modify, oi=0x8be2788, on=0x0) at backover.c:669
>> #10 0x080ea543 in over_op_func (op=0xdabbc28, rs=0x3a3f0e4,
>> which=op_modify) at backover.c:721
>> #11 0x080ea60b in over_op_modify (op=0xdabbc28, rs=0x3a3f0e4) at
>> backover.c:755
>> #12 0x08089151 in fe_op_modify (op=0xdabbc28, rs=0x3a3f0e4) at
>> modify.c:301
>> #13 0x08088b90 in do_modify (op=0xdabbc28, rs=0x3a3f0e4) at modify.c:175
>> #14 0x0806be8f in connection_operation (ctx=0x3a3f1d0, arg_v=0xdabbc28) at
>> connection.c:1115
>> #15 0x0806c3cf in connection_read_thread (ctx=0x3a3f1d0, argv=0x1a) at
>> connection.c:1251
>> #16 0x081d8fa9 in ldap_int_thread_pool_wrapper (xpool=0x8b941b0) at
>> tpool.c:685
>> #17 0x0043749b in start_thread () from /lib/libpthread.so.0
>> #18 0x0051a42e in clone () from /lib/libc.so.6
>
>
>