toby@inf.ed.ac.uk wrote:
Hi there, I've been collecting a bit more information on these crashes...
Since we started running 2.4.15 we have collected 26 core files; 15 of these crashed in memory.c:152.
It looks like slapd was removing stale entries from the cache in the majority of these crashes, which wasn't the case with the original ticket I submitted. Also, in quite a few of these cases the last thing slapd logged was along the lines of ...
Mar 30 20:50:16 albany slapd[14239]: DELETING ENTRY TEMPLATE=db78c810-0c2a-49de-96af-f5796bbe8ca3
Mar 30 20:50:17 albany last message repeated 14 times
... I don't know whether this is useful.
Here are the details of the latest crash, with the slapd log and backtrace:
Program terminated with signal 6, Aborted.
#0  0x00185402 in __kernel_vsyscall ()
(gdb) bt
When a program aborts it will print an error message on stderr. It would be useful to have that message.
Can you please update to RE24, the 2.4.16 candidate, and see if the behavior has changed? Also, can you please be sure to compile without optimization (or with frame pointers intact; gcc -fno-omit-frame-pointer); there seem to be a few functions missing in these stack traces.
Also grab the latest overlays/pcache.c (1.168). It probably won't fix things, but it should make the crashes a little more obvious.
#0  0x00185402 in __kernel_vsyscall ()
#1  0x00476d20 in raise () from /lib/libc.so.6
#2  0x00478631 in abort () from /lib/libc.so.6
#3  0x004aee6b in __libc_message () from /lib/libc.so.6
#4  0x004b6b16 in _int_free () from /lib/libc.so.6
#5  0x004ba070 in free () from /lib/libc.so.6
#6  0x081d5746 in ber_memfree_x (p=0x841e688, ctx=0x0) at memory.c:152
#7  0x081d630c in ber_bvarray_free_x (a=0x843af58, ctx=0x0) at memory.c:731
#8  0x081d6343 in ber_bvarray_free (a=0x843af58) at memory.c:741
#9  0x0807979c in attr_clean (a=0xb62934ec) at attr.c:146
#10 0x0807983b in attrs_free (a=0xb62934ec) at attr.c:196
#11 0x0807c059 in entry_clean (e=0xb6727584) at entry.c:504
#12 0x0807c080 in entry_free (e=0xb6727584) at entry.c:514
#13 0x08140030 in bdb_entry_return (e=0xb6727584) at id2entry.c:229
#14 0x08134c0c in bdb_cache_delete_cleanup (cache=0x838956c, ei=0xb332a060) at cache.c:1316
#15 0x0813a824 in bdb_delete (op=0xb32fed5c, rs=0xb32febec) at delete.c:575
#16 0x08177be1 in remove_query_data (op=0xb32fed5c, rs=0xb32fecf8, query_uuid=0x83ff288) at pcache.c:1460
#17 0x0817af8f in consistency_check (ctx=0xb32ff1d0, arg=0x83f57e8) at pcache.c:2611
#18 0x0819ecad in ldap_int_thread_pool_wrapper (xpool=0x8366f90) at tpool.c:663
#19 0x005c746b in start_thread () from /lib/libpthread.so.0
#20 0x0051edbe in clone () from /lib/libc.so.6
(gdb)
Also, at the risk of overloading this ticket, I'll include details of another type of crash that I've seen 3 times in the lab since running 2.4.15. I'm including it here because slapd's behaviour at the time of the crash was similar to that noted above, i.e. it was removing stale entries from the cache:
Yeah, based on the debug output it's probably related.