I've seen this now with two different masters, using BDB 4.2.52 + patches, which we long considered stable:
Thread 3 (Thread 1115703648 (LWP 31507)):
#0  0x0000003fca90bd9c in __pread_nocancel () from /lib64/tls/libpthread.so.0
#1  0x0000002a97348edc in __os_io (dbenv=0xdc3100, op=1, fhp=0xf7d2d0, pgno=71531, pagesize=4096, buf=0x2cb08f8cf0 "_3", niop=0x42681b70) at ../dist/../os/os_rw.c:55
#2  0x0000002a973405ea in __memp_pgread (dbmfp=0xeb5780, mutexp=0x2cbba53150, bhp=0x2cb08f8c58, can_create=0) at ../dist/../mp/mp_bh.c:219
#3  0x0000002a973413f7 in __memp_fget (dbmfp=0xeb5780, pgnoaddr=0x42681cc4, flags=0, addrp=0x42681cc8) at ../dist/../mp/mp_fget.c:580
#4  0x0000002a972d18b3 in __bam_search (dbc=0xdb1dc40, root_pgno=Variable "root_pgno" is not available. ) at ../dist/../btree/bt_search.c:307
#5  0x0000002a972c6e27 in __bam_c_search (dbc=0xdb1dc40, root_pgno=0, key=0x42681f40, flags=28, exactp=0x42681df4) at ../dist/../btree/bt_cursor.c:2547
#6  0x0000002a972c7b7e in __bam_c_get (dbc=0xdb1dc40, key=0x42681f40, data=0x42681f20, flags=28, pgnop=0x42681e64) at ../dist/../btree/bt_cursor.c:962
#7  0x0000002a9730f51b in __db_c_get (dbc_arg=0x28ea700, key=0x42681f40, data=0x42681f20, flags=28) at ../dist/../db/db_cam.c:643
#8  0x0000002a973177d7 in __db_c_get_pp (dbc=0x28ea700, key=0x42681f40, data=0x42681f20, flags=28) at ../dist/../db/db_iface.c:1836
#9  0x0000002a97183dc2 in bdb_dn2id (op=0x428025e0, dn=0x42682028, ei=0x42682010, locker=44, lock=0x42681fd0) at dn2id.c:315
#10 0x0000002a971896fd in bdb_cache_find_ndn (op=0x428025e0, locker=44, ndn=0x291c370, res=0x42682310) at cache.c:341
#11 0x0000002a97189e39 in bdb_cache_find_id (op=0x428025e0, tid=0x0, id=25145, eip=0x42682310, islocked=0, locker=44, lock=0x42682250) at cache.c:716
#12 0x0000002a9717b1c7 in bdb_search (op=0x428025e0, rs=0x42802570) at search.c:696
#13 0x0000002a975921eb in syncprov_findcsn (op=0x5ccd900, mode=FIND_PRESENT) at syncprov.c:681
#14 0x0000002a9759690a in syncprov_op_search (op=0x5ccd900, rs=0x42803d60) at syncprov.c:2074
#15 0x000000000049b935 in overlay_op_walk (op=0x5ccd900, rs=0x42803d60, which=op_search, oi=0xded8c0, on=0xded700) at backover.c:640
#16 0x000000000049bb91 in over_op_func (op=0x5ccd900, rs=0x42803d60, which=op_search) at backover.c:702
#17 0x000000000049bc27 in over_op_search (op=0x5ccd900, rs=0x42803d60) at backover.c:724
#18 0x000000000042f1d8 in fe_op_search (op=0x5ccd900, rs=0x42803d60) at search.c:355
#19 0x000000000042ecac in do_search (op=0x5ccd900, rs=0x42803d60) at search.c:217
#20 0x000000000042bd9c in connection_operation (ctx=0x42803e90, arg_v=0x5ccd900) at connection.c:1133
#21 0x000000000042c28c in connection_read_thread (ctx=0x42803e90, argv=0x1e) at connection.c:1261
#22 0x0000002a956c7c77 in ldap_int_thread_pool_wrapper (xpool=0x8a1f00) at tpool.c:478
#23 0x0000003fca90610a in start_thread () from /lib64/tls/libpthread.so.0
#24 0x0000003fca0c68c3 in clone () from /lib64/tls/libc.so.6
#25 0x0000000000000000 in ?? ()
Has anyone else seen anything like this? There may be a bug inside BDB 4.2.52 that gets hit occasionally.
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration