Full_Name: Aaron Richton
Version: 2.3.40
OS: Solaris 9
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (128.6.31.135)
One hdb backend on one slave died ~21:58 yesterday...
current thread: t@5
[1] _libc_poll(0xffffffff4f3ff430, 0x0, 0x3e8, 0x0, 0x0, 0x0), at
0xffffffff7f0a741c
[2] _select(0x3e8, 0xffffffff7f1bc728, 0xffffffff7f1bc728, 0x0,
0xffffffff7f1bc728, 0x0), at 0xffffffff7f05a74c
[3] select(0x0, 0x0, 0x0, 0x0, 0xffffffff4f3ff5b0, 0x0), at
0xffffffff7e0108e8
=>[4] __os_sleep(dbenv = 0x1005b2610, secs = 1U, usecs = 0), line 84 in
"os_sleep.c"
[5] __memp_sync_int(dbenv = 0x1005b2610, dbmfp = (nil), trickle_max = 0, op =
DB_SYNC_CACHE, wrotep = (nil)), line 362 in "mp_sync.c"
[6] __memp_sync(dbenv = 0x1005b2610, lsnp = (nil)), line 99 in "mp_sync.c"
[7] __txn_checkpoint(dbenv = 0x1005b2610, kbytes = 100000U, minutes = 10U,
flags = 0), line 1389 in "txn.c"
[8] __txn_checkpoint_pp(dbenv = 0x1005b2610, kbytes = 100000U, minutes = 10U,
flags = 0), line 1288 in "txn.c"
[9] hdb_checkpoint(ctx = 0xffffffff4f3ffc30, arg = 0x1004b4c60), line 165 in
"config.c"
[10] ldap_int_thread_pool_wrapper(xpool = 0x10041e500), line 478 in
"tpool.c"
(dbx) where
current thread: t@16
[1] _libc_poll(0xffffffff46ffe3e0, 0x0, 0x3e8, 0x0, 0x0, 0x0), at
0xffffffff7f0a741c
[2] _select(0x3e8, 0xffffffff7f1bc728, 0xffffffff7f1bc728, 0x0,
0xffffffff7f1bc728, 0x0), at 0xffffffff7f05a74c
[3] select(0x0, 0x0, 0x0, 0x0, 0xffffffff46ffe560, 0x0), at
0xffffffff7e0108e8
=>[4] __os_sleep(dbenv = 0x1005b2610, secs = 1U, usecs = 0), line 84 in
"os_sleep.c"
[5] __memp_sync_int(dbenv = 0x1005b2610, dbmfp = (nil), trickle_max = 0, op =
DB_SYNC_CACHE, wrotep = (nil)), line 439 in "mp_sync.c"
[6] __memp_sync(dbenv = 0x1005b2610, lsnp = (nil)), line 99 in "mp_sync.c"
[7] __txn_checkpoint(dbenv = 0x1005b2610, kbytes = 100000U, minutes = 10U,
flags = 0), line 1389 in "txn.c"
[8] __txn_checkpoint_pp(dbenv = 0x1005b2610, kbytes = 100000U, minutes = 10U,
flags = 0), line 1288 in "txn.c"
[9] hdb_delete(op = 0xffffffff46fff618, rs = 0xffffffff46fff088), line 537 in
"delete.c"
[10] syncrepl_entry(si = 0x1004b4e50, op = 0xffffffff46fff618, entry = (nil),
modlist = 0xffffffff46fff320, syncstate = 3, syncUUID = 0xffffffff46fff3c0,
syncCookie_req = 0xffffffff46fff360, syncCSN =
0xffffffff46fff390), line 2006 in "syncrepl.c"
[11] do_syncrep2(op = 0xffffffff46fff618, si = 0x1004b4e50), line 731 in
"syncrepl.c"
[12] do_syncrepl(ctx = 0xffffffff46fffc30, arg = 0x1004b5030), line 1095 in
"syncrepl.c"
[13] ldap_int_thread_pool_wrapper(xpool = 0x10041e500), line 478 in
"tpool.c"
I can't get db_stat to join the environment. If there's anything else that can
be gleaned from slapd itself, I'd be glad to poke around the core; otherwise,
I'm off to rm/slapadd...
"This makes sense and shouldn't happen in 2.3.41" would be fine too, but none of
the changes (to my eye) looked locking-related.
Unfortunately no, nothing familiar here. There's nothing in the BDB
documentation that says two threads are not allowed to call txn_checkpoint
concurrently, but I suppose it may be excessive to make multiple calls in
rapid succession.
One thing that I've started doing recently in my configs is to skip the #bytes
option (leave it zero), so that only time-based checkpoints occur. Since
they're done in a dedicated task, only one thread at a time can trigger a
checkpoint.
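As a rough illustration of the time-only setup described above, the database
section of slapd.conf might look like the fragment below. The kbytes and
minutes values in the traces (100000 and 10) suggest the reporter's setting was
"checkpoint 100000 10"; that is an inference from the backtrace, and the suffix
and directory shown here are placeholders, not taken from the report.

```
# Sketch only: suffix and directory are hypothetical placeholders.
database    hdb
suffix      "dc=example,dc=com"
directory   /var/openldap-data

# checkpoint <kbyte> <min>
# kbyte = 0 leaves the size-based trigger unused, so only the
# internal periodic task (every 10 minutes here) requests a checkpoint.
checkpoint  0 10
```

With the kbyte argument left at zero, write operations such as the hdb_delete
in the second trace should no longer request a checkpoint themselves, leaving
the periodic task as the only caller.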
--
-- Howard Chu
Chief Architect, Symas Corp.