Please test RE24 as we prepare for 2.4.17. Thanks!
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration
Quanah Gibson-Mount wrote:
Please test RE24 as we prepare for 2.4.17. Thanks!
All regression tests reported 'completed OK' on FreeBSD/amd64. One oddity: test058-syncrepl-asymmetric printed "No race errors found after 10 iterations" but also "Found 2 errors".
Cheers,
--
Xin LI delphij@delphij.net http://www.delphij.net/
FreeBSD - The Power to Serve!
On Tue, 9 Jun 2009, Aaron Richton wrote:
Got at least one SEGV, in meta-concurrency. Give me some time to tar it up...
Misspoke...not a SEGV, but rather threw an assertion:
in bind.c:
Current function is ldap_back_conn_delete
  157   assert( !LDAP_BACK_CONN_TAINTED( lc ) );
testrun directory, plus full backtrace, in: https://www.nbcs.rutgers.edu/~richton/testfailed.2009061614.tgz
test050 assertion...
Current function is syncrepl_updateCookie
syncrepl.c:3084 assert( !syn->ssyn_validate( syn, si->si_cookieState->cs_vals+i ));
backtrace and testrun directory: https://www.nbcs.rutgers.edu/~richton/testfailed.2009060519.tgz
Different assertion.
Current function is ldap_pvt_runqueue_resched
current thread: t@7
  [1] __lwp_kill(0x0, 0x6, 0x0, 0xff33c000, 0x0, 0x0), at 0xff31feb0
  [2] raise(0x6, 0x0, 0xfbbff5d8, 0x0, 0x0, 0x0), at 0xff2d0b28
  [3] abort(0x32, 0xfbbff668, 0x32, 0x7efefeff, 0x81010100, 0xff00), at 0xff2b6e70
  [4] __assert(0xff1e7c00, 0xff1e7c0c, 0xa5, 0x2b2368, 0x4, 0x0), at 0xff2b7110
=>[5] ldap_pvt_runqueue_resched(rq = 0x39f298, entry = 0x501c80, defer = 1), line 165 in "rq.c"
  [6] do_syncrepl(ctx = 0xfbbffe0c, arg = 0x501c80), line 1424 in "syncrepl.c"
  [7] ldap_int_thread_pool_wrapper(xpool = 0x3eb0c0), line 663 in "tpool.c"

rq.c:165 assert ( e == entry );

(dbx) print e
e = (nil)
(dbx) print *entry
*entry = {
    next_sched = { tv_sec = 0 tv_usec = 0 }
    interval = { tv_sec = 60 tv_usec = 0 }
    tnext = { stqe_next = 0x448c70 }
    rnext = { stqe_next = (nil) }
    routine = (nil)
    arg = 0x4483b8
    tname = 0x501c78 ""
    tspec = 0x4483cc "rid=004"
}
backtrace and testrun: https://www.nbcs.rutgers.edu/~richton/testfailed.200906101238.tgz
Aaron Richton wrote:
Different assertion.
rq.c:165 assert ( e == entry );
In frame 6, can you print *si...
Frame 6...
*si = {
    si_next = (nil)
    si_be = 0x41ab28
    si_wbe = 0x41ab28
    si_re = 0x453ce0
    si_rid = 4
    si_ridtxt = "rid=004"
    si_bindconf = {
        sb_uri = { bv_len = 22U bv_val = 0x41d438 "ldap://localhost:9014/" }
        sb_version = 3
        sb_tls = 0
        sb_method = 128
        sb_timeout_api = 3
        sb_timeout_net = 0
        sb_binddn = { bv_len = 9U bv_val = 0x4456d8 "cn=config" }
        sb_cred = { bv_len = 8U bv_val = 0x4456f0 "oIr.7AeJ" }
        sb_saslmech = { bv_len = 0 bv_val = (nil) }
        sb_secprops = (nil)
        sb_realm = { bv_len = 0 bv_val = (nil) }
        sb_authcId = { bv_len = 0 bv_val = (nil) }
        sb_authzId = { bv_len = 0 bv_val = (nil) }
        sb_tls_ctx = (nil)
        sb_tls_cert = (nil)
        sb_tls_key = (nil)
        sb_tls_cacert = (nil)
        sb_tls_cacertdir = (nil)
        sb_tls_reqcert = (nil)
        sb_tls_cipher_suite = (nil)
        sb_tls_protocol_min = (nil)
        sb_tls_crlcheck = (nil)
        sb_tls_do_init = 0
    }
    si_base = { bv_len = 9U bv_val = 0x4456a8 "cn=config" }
    si_logbase = { bv_len = 0 bv_val = (nil) }
    si_filterstr = { bv_len = 15U bv_val = 0x44d338 "(objectclass=*)" }
    si_logfilterstr = { bv_len = 0 bv_val = (nil) }
    si_scope = 2
    si_attrsonly = 0
    si_anfile = (nil)
    si_anlist = 0x41d598
    si_exanlist = 0x443938
    si_attrs = 0x449d68
    si_exattrs = (nil)
    si_allattrs = 0
    si_allopattrs = 0
    si_schemachecking = 0
    si_type = 3
    si_ctype = 3
    si_interval = 60
    si_retryinterval = 0x4456c0
    si_retrynum_init = 0x449d80
    si_retrynum = 0x449cd8
    si_syncCookie = {
        ctxcsn = 0x506400
        octet_str = { bv_len = 60U bv_val = 0x445348 "rid=004,sid=001,csn=20090610163816.141261Z#000000#001#000000" }
        rid = 4
        sid = 1
        numcsns = 1
        sids = 0x445d90
        sc_next = { stqe_next = (nil) }
    }
    si_cookieState = 0x41e878
    si_cookieAge = 0
    si_manageDSAit = 0
    si_slimit = 0
    si_tlimit = 0
    si_refreshDelete = 0
    si_refreshPresent = 0
    si_refreshDone = 0
    si_syncdata = 0
    si_logstate = 0
    si_got = 263443
    si_msgid = 2
    si_presentlist = (nil)
    si_ld = 0x44c288
    si_conn = 0x42d1a8
    si_nonpresentlist = { lh_first = (nil) }
    si_mutex = {
        __pthread_mutex_flags = {
            __pthread_mutex_flag1 = 4U
            __pthread_mutex_flag2 = '\0'
            __pthread_mutex_ceiling = '\0'
            __pthread_mutex_type = 0
            __pthread_mutex_magic = 19800U
        }
        __pthread_mutex_lock = {
            __pthread_mutex_lock64 = { __pthread_mutex_pad = "" }
            __pthread_mutex_lock32 = { __pthread_ownerpid = 0 __pthread_lockword = 4278190080U }
            __pthread_mutex_owner64 = 4278190080ULL
        }
        __pthread_mutex_data = 4279503872ULL
    }
}
Aaron Richton wrote:
Different assertion.
rq.c:165 assert ( e == entry );
Interesting. This assert fires because it tried to reschedule a task that wasn't already on the task list. And the *si info in your other email indicates that this consumer's task pointer should have been (si->si_re), which is a completely different value. Of course, this code only ever gets triggered one of two ways: directly by the runqueue, or as a connection_client callback set up by a task that was running on the runqueue. Since this is a refreshAndPersist consumer, it's most likely running due to connection activity.
The *si pointer comes out of the rtask pointer. There's no way for the *si data to be valid while the rtask is invalid. (And *si is definitely valid.) And there's no way for the rtask to be valid without existing on the runqueue. Very strange.
print si
print *si->si_re
Also strange is that entry->routine is nil; the runqueue could not have invoked the do_syncrepl function without a value here. And the tname is empty, when it should be "do_syncrepl". Yet entry->tspec is valid.
Seems like a race with syncinfo_free()...
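For readers following along, here is a minimal sketch of the invariant that the rq.c:165 assert protects: a task may only be rescheduled if a walk of the queue's task list actually finds it. Everything here (struct task, struct runqueue, and the rq_* helpers) is a hypothetical simplification for illustration, not the real struct re_s or ldap_pvt_runqueue_* code. If something unlinks the task (for example a free racing with a running callback) before the reschedule, the walk ends with e == NULL, which matches the "e = (nil)" seen in dbx.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified task entry; NOT the real struct re_s. */
struct task {
    struct task *next;   /* link in the queue's task list */
    const char  *name;   /* label for illustration only */
};

/* Hypothetical, simplified run queue holding a singly linked task list. */
struct runqueue {
    struct task *head;
};

/* Push a task onto the front of the task list. */
static void rq_insert(struct runqueue *rq, struct task *t)
{
    t->next = rq->head;
    rq->head = t;
}

/* Unlink a task from the list; returns 1 if it was found, 0 otherwise. */
static int rq_remove(struct runqueue *rq, struct task *t)
{
    struct task **pp;

    for (pp = &rq->head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == t) {
            *pp = t->next;
            t->next = NULL;
            return 1;
        }
    }
    return 0;
}

/* Walk the list looking for entry; the result is NULL if it is not queued. */
static struct task *rq_find(const struct runqueue *rq, const struct task *entry)
{
    struct task *e;

    for (e = rq->head; e != NULL; e = e->next) {
        if (e == entry)
            break;
    }
    return e;
}

/* Rescheduling is only legal when the walk above ends on the entry itself. */
static int rq_can_resched(const struct runqueue *rq, const struct task *entry)
{
    return entry != NULL && rq_find(rq, entry) == entry;
}
```

In this sketch, calling rq_can_resched() on a task that has already been passed to rq_remove() returns 0; the real code asserts at that point instead of returning.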
Aaron Richton wrote:
Different assertion.
Please try updated HEAD.
On Wed, 10 Jun 2009, Howard Chu wrote:
Please try updated HEAD.
HEAD was updated as of Friday. In test054, I got the following. I'm not entirely sure about this -- I thought that (under the Solaris implementation) this can only happen if the mutex in question is uninitialized/previously destroyed.
   t@1 a l@1 ?() LWP suspended in __lwp_wait()
   t@2 a l@2 slapd_daemon_task() LWP suspended in _poll()
   t@3 a l@3 ldap_int_thread_pool_wrapper() LWP suspended in _private_close()
o> t@4 a l@4 ldap_int_thread_pool_wrapper() signal SIGABRT in __lwp_kill()
   t@5 a l@5 ldap_int_thread_pool_wrapper() sleep on 0x3fe4a8 in __lwp_park()
   t@6 a l@6 ldap_int_thread_pool_wrapper() sleep on 0x3fe4a8 in __lwp_park()

Current function is ldap_pvt_thread_join
  197   return ERRVAL( pthread_join( thread, thread_return ) );
t@1 (l@1) stopped in __lwp_wait at 0xff31ff68
0xff31ff68: __lwp_wait+0x0008: bgeu,a __lwp_wait+0x1c ! 0xff31ff7c
current thread: t@1
  [1] __lwp_wait(0x4, 0xffbff554, 0xff18fb04, 0xff1424fc, 0x1, 0xffbff51c), at 0xff31ff68
  [2] lwp_wait(0x2, 0xffbff554, 0x2d388, 0xff1849e8, 0x5, 0xffbff54c), at 0xff14d1cc
  [3] _thrp_join(0x2, 0x0, 0x0, 0x1, 0x81010100, 0xff00), at 0xff1490c4
=>[4] ldap_pvt_thread_join(thread = 2U, thread_return = (nil)), line 197 in "thr_posix.c"
  [5] slapd_daemon(), line 2700 in "daemon.c"
  [6] main(argc = 8, argv = 0xffbff774), line 950 in "main.c"

Current function is slapd_daemon_task
 2325   SLAP_EVENT_WAIT( tvp, &ns );
t@2 (l@2) stopped in _poll at 0xff31e23c
0xff31e23c: _poll+0x0008: bgeu _poll+0x30 ! 0xff31e264
current thread: t@2
  [1] _poll(0x4, 0x4, 0xffffffff, 0xfffffff8, 0x0, 0xfe3fbd99), at 0xff31e23c
  [2] select_large_fdset(0x12, 0x20, 0xfe3fbd98, 0x0, 0xfe3fbd98, 0xfe3fbd98), at 0xff2d2b6c
=>[3] slapd_daemon_task(ptr = (nil)), line 2325 in "daemon.c"

Current function is sb_stream_close
  544   tcp_close( sbiod->sbiod_sb->sb_fd );
t@3 (l@3) stopped in _private_close at 0xff31d318
0xff31d318: _private_close+0x0008: bgeu _private_close+0x30 ! 0xff31d340
current thread: t@3
  [1] _private_close(0x0, 0xff14d8b8, 0x20b28, 0x0, 0x0, 0x0), at 0xff31d318
  [2] _ti_close(0x10, 0x2, 0x1, 0xff33c000, 0x14, 0x0), at 0xff14d8c0
=>[3] sb_stream_close(sbiod = 0x521c58), line 544 in "sockbuf.c"
  [4] ber_int_sb_close(sb = 0x5236e8), line 383 in "sockbuf.c"
  [5] ber_sockbuf_free(sb = 0x5236e8), line 74 in "sockbuf.c"
  [6] slapd_remove(s = 16, sb = 0x5236e8, wasactive = 1, wake = 0, locked = 0), line 905 in "daemon.c"
  [7] connection_destroy(c = 0x4612f8), line 690 in "connection.c"
  [8] connection_close(c = 0x4612f8), line 828 in "connection.c"
  [9] connection_resched(conn = 0x4612f8), line 1668 in "connection.c"
  [10] connection_operation(ctx = 0xfdbffe0c, arg_v = 0x525868), line 1167 in "connection.c"
  [11] connection_read_thread(ctx = 0xfdbffe0c, argv = 0x10), line 1248 in "connection.c"
  [12] ldap_int_thread_pool_wrapper(xpool = 0x3fe488), line 698 in "tpool.c"

t@4 (l@4) stopped in __lwp_kill at 0xff31feb0
0xff31feb0: __lwp_kill+0x0008: bgeu,a __lwp_kill+0x1c ! 0xff31fec4
current thread: t@4
  [1] __lwp_kill(0x0, 0x6, 0x0, 0xfffffff8, 0x0, 0xfd3ff241), at 0xff31feb0
  [2] Abort(0xfd3ff298, 0xfd3ff298, 0x3a, 0x39, 0x81010100, 0xff00), at 0xff146f9c
  [3] panic(0xff155e34, 0x0, 0x0, 0x1, 0x0, 0xfd3ff4b9), at 0xff147094
  [4] _ceil_prio_inherit(0xf0, 0x0, 0x549780, 0xff168000, 0xff1513bc, 0x0), at 0xff14ff10
  [5] mutex_lock_internal(0x6d, 0x0, 0xff070600, 0xfffffff8, 0x0, 0xfc3351), at 0xff1514bc
=>[6] ldap_pvt_thread_mutex_lock(mutex = 0x549780), line 296 in "thr_posix.c"
  [7] syncprov_op_mod(op = 0x52bd00, rs = 0xfd3ffcac), line 1965 in "syncprov.c"
  [8] overlay_op_walk(op = 0x52bd00, rs = 0xfd3ffcac, which = op_add, oi = 0x4326c8, on = 0x4327d0), line 659 in "backover.c"
  [9] over_op_func(op = 0x52bd00, rs = 0xfd3ffcac, which = op_add), line 721 in "backover.c"
  [10] over_op_add(op = 0x52bd00, rs = 0xfd3ffcac), line 767 in "backover.c"
  [11] fe_op_add(op = 0x52bd00, rs = 0xfd3ffcac), line 334 in "add.c"
  [12] do_add(op = 0x52bd00, rs = 0xfd3ffcac), line 194 in "add.c"
  [13] connection_operation(ctx = 0xfd3ffe0c, arg_v = 0x52bd00), line 1115 in "connection.c"
  [14] connection_read_thread(ctx = 0xfd3ffe0c, argv = 0x11), line 1248 in "connection.c"
  [15] ldap_int_thread_pool_wrapper(xpool = 0x3fe488), line 698 in "tpool.c"

Current function is ldap_pvt_thread_cond_wait
  277   return ERRVAL( pthread_cond_wait( cond, mutex ) );
t@5 (l@5) stopped in __lwp_park at 0xff1554b4
0xff1554b4: __lwp_park+0x0014: bgeu,a __lwp_park+0x28 ! 0xff1554c8
current thread: t@5
  [1] __lwp_park(0x4, 0x0, 0x0, 0x1, 0xff168000, 0x0), at 0xff1554b4
  [2] cond_wait_queue(0x3fe4a8, 0xff168c08, 0x0, 0x0, 0xff070800, 0xff168000), at 0xff1526b8
  [3] _cond_wait_cancel(0x3fe4a8, 0x3fe490, 0x5256c8, 0xfcbff978, 0x1, 0xfcbff978), at 0xff152e74
  [4] _pthread_cond_wait(0x3fe4a8, 0x3fe490, 0xfcbffe0c, 0x1, 0x0, 0xfcbffd81), at 0xff152eb0
=>[5] ldap_pvt_thread_cond_wait(cond = 0x3fe4a8, mutex = 0x3fe490), line 277 in "thr_posix.c"
  [6] ldap_int_thread_pool_wrapper(xpool = 0x3fe488), line 689 in "tpool.c"

Current function is ldap_pvt_thread_cond_wait
  277   return ERRVAL( pthread_cond_wait( cond, mutex ) );
t@6 (l@6) stopped in __lwp_park at 0xff1554b4
0xff1554b4: __lwp_park+0x0014: bgeu,a __lwp_park+0x28 ! 0xff1554c8
current thread: t@6
  [1] __lwp_park(0x4, 0x0, 0x0, 0x1, 0xff168000, 0x0), at 0xff1554b4
  [2] cond_wait_queue(0x3fe4a8, 0xff168c08, 0x0, 0x0, 0xff070a00, 0xff168000), at 0xff1526b8
  [3] _cond_wait_cancel(0x3fe4a8, 0x3fe490, 0xfc3ffc98, 0x1, 0x3, 0x0), at 0xff152e74
  [4] _pthread_cond_wait(0x3fe4a8, 0x3fe490, 0xfc3ffe0c, 0x1, 0x0, 0xfc3ffd81), at 0xff152eb0
=>[5] ldap_pvt_thread_cond_wait(cond = 0x3fe4a8, mutex = 0x3fe490), line 277 in "thr_posix.c"
  [6] ldap_int_thread_pool_wrapper(xpool = 0x3fe488), line 689 in "tpool.c"
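The guess above (a lock attempt on a mutex that was never initialized or was already destroyed) is the kind of bug a small debugging wrapper can surface earlier. The following is only a sketch under that assumption: struct checked_mutex and the cm_* functions are hypothetical names, not part of OpenLDAP or of any pthread library, and the live flag is itself unsynchronized, so this is a diagnostic aid rather than a fix.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical debugging wrapper; NOT an OpenLDAP or libpthread type. */
struct checked_mutex {
    pthread_mutex_t mtx;
    int live;            /* 1 between init and destroy, 0 otherwise */
};

/* Initialize the mutex and mark it usable on success. */
static int cm_init(struct checked_mutex *cm)
{
    int rc = pthread_mutex_init(&cm->mtx, NULL);

    cm->live = (rc == 0);
    return rc;
}

/* Destroy the mutex once; a second destroy is reported as EINVAL. */
static int cm_destroy(struct checked_mutex *cm)
{
    if (!cm->live)
        return EINVAL;
    cm->live = 0;
    return pthread_mutex_destroy(&cm->mtx);
}

/* Refuse to lock a mutex that is not currently live. */
static int cm_lock(struct checked_mutex *cm)
{
    if (!cm->live)
        return EINVAL;
    return pthread_mutex_lock(&cm->mtx);
}

/* Refuse to unlock a mutex that is not currently live. */
static int cm_unlock(struct checked_mutex *cm)
{
    if (!cm->live)
        return EINVAL;
    return pthread_mutex_unlock(&cm->mtx);
}
```

With such a wrapper, a lock after destroy comes back as EINVAL from cm_lock() instead of reaching the threading library, where (per the backtrace for t@4) it ended in a panic.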
On Wed, 10 Jun 2009, Howard Chu wrote:
Please try updated HEAD.
A test054 slapd livelocked:
t@1 a l@1 ?() running in __lwp_wait()
t@2 a l@2 slapd_daemon_task() running in _poll()
t@3 a l@3 ldap_int_thread_pool_wrapper() sleep on 0x3fe4a8 in __lwp_park()
t@4 a l@4 ldap_int_thread_pool_wrapper() sleep on 0x3fe4a8 in __lwp_park()
t@5 a l@5 ldap_int_thread_pool_wrapper() running in lwp_yield()
t@6 a l@6 ldap_int_thread_pool_wrapper() sleep on 0x3fe4a8 in __lwp_park()

t@1 (l@1) stopped in __lwp_wait at 0xff31ff64
0xff31ff64: __lwp_wait+0x0004: ta %icc,0x00000008
current thread: t@1
  [1] __lwp_wait(0x2, 0xffbff554, 0xff18fb04, 0xff1424fc, 0x1, 0xffbff51c), at 0xff31ff64
  [2] lwp_wait(0x2, 0xffbff554, 0x2d388, 0xff1849e8, 0x5, 0xffbff54c), at 0xff14d1cc
  [3] _thrp_join(0x2, 0x0, 0x0, 0x1, 0x81010100, 0xff00), at 0xff1490c4
=>[4] ldap_pvt_thread_join(thread = 2U, thread_return = (nil)), line 197 in "thr_posix.c"
  [5] slapd_daemon(), line 2700 in "daemon.c"
  [6] main(argc = 8, argv = 0xffbff774), line 950 in "main.c"

Current function is slapd_daemon_task
 2325   SLAP_EVENT_WAIT( tvp, &ns );
t@2 (l@2) stopped in _poll at 0xff31e238
0xff31e238: _poll+0x0004: ta %icc,0x00000008
current thread: t@2
  [1] _poll(0xfe3fbb90, 0x4, 0xffffffffffffffff, 0xfffffffffffffff8, 0x0, 0xfe3fbd99), at 0xff31e238
  [2] select_large_fdset(0x12, 0x20, 0xfe3fbd98, 0x0, 0xfe3fbd98, 0xfe3fbd98), at 0xff2d2b6c
=>[3] slapd_daemon_task(ptr = (nil)), line 2325 in "daemon.c"

Current function is ldap_pvt_thread_cond_wait
  277   return ERRVAL( pthread_cond_wait( cond, mutex ) );
t@3 (l@3) stopped in __lwp_park at 0xff1554b0
0xff1554b0: __lwp_park+0x0010: ta %icc,0x00000008
current thread: t@3
  [1] __lwp_park(0x0, 0x0, 0x0, 0x1, 0xff168000, 0x0), at 0xff1554b0
  [2] cond_wait_queue(0x3fe4a8, 0xff168c08, 0x0, 0x0, 0xff070400, 0xff168000), at 0xff1526b8
  [3] _cond_wait_cancel(0x3fe4a8, 0x3fe490, 0xfdbffc98, 0x1, 0x3, 0x0), at 0xff152e74
  [4] _pthread_cond_wait(0x3fe4a8, 0x3fe490, 0xfdbffe0c, 0x1, 0x0, 0xfdbffd81), at 0xff152eb0
=>[5] ldap_pvt_thread_cond_wait(cond = 0x3fe4a8, mutex = 0x3fe490), line 277 in "thr_posix.c"
  [6] ldap_int_thread_pool_wrapper(xpool = 0x3fe488), line 689 in "tpool.c"

Current function is ldap_pvt_thread_cond_wait
  277   return ERRVAL( pthread_cond_wait( cond, mutex ) );
t@4 (l@4) stopped in __lwp_park at 0xff1554b0
0xff1554b0: __lwp_park+0x0010: ta %icc,0x00000008
current thread: t@4
  [1] __lwp_park(0x0, 0x0, 0x0, 0x1, 0xff168000, 0x0), at 0xff1554b0
  [2] cond_wait_queue(0x3fe4a8, 0xff168c08, 0x0, 0x0, 0xff070600, 0xff168000), at 0xff1526b8
  [3] _cond_wait_cancel(0x3fe4a8, 0x3fe490, 0x0, 0xff168000, 0x3, 0x0), at 0xff152e74
  [4] _pthread_cond_wait(0x3fe4a8, 0x3fe490, 0xfd3ffe0c, 0x1, 0x0, 0xfd3ffd81), at 0xff152eb0
=>[5] ldap_pvt_thread_cond_wait(cond = 0x3fe4a8, mutex = 0x3fe490), line 277 in "thr_posix.c"
  [6] ldap_int_thread_pool_wrapper(xpool = 0x3fe488), line 689 in "tpool.c"

Current function is ldap_pvt_thread_yield
  228   thr_yield();
t@5 (l@5) stopped in lwp_yield at 0xff15556c
0xff15556c: lwp_yield+0x0008: retl
current thread: t@5
  [1] lwp_yield(0x0, 0x0, 0xff070800, 0x0, 0x0, 0x0), at 0xff15556c
=>[2] ldap_pvt_thread_yield(), line 228 in "thr_posix.c"
  [3] syncprov_op_mod(op = 0x52c508, rs = 0xfcbffcac), line 1964 in "syncprov.c"
  [4] overlay_op_walk(op = 0x52c508, rs = 0xfcbffcac, which = op_add, oi = 0x4326c8, on = 0x4327d0), line 659 in "backover.c"
  [5] over_op_func(op = 0x52c508, rs = 0xfcbffcac, which = op_add), line 721 in "backover.c"
  [6] over_op_add(op = 0x52c508, rs = 0xfcbffcac), line 767 in "backover.c"
  [7] fe_op_add(op = 0x52c508, rs = 0xfcbffcac), line 334 in "add.c"
  [8] do_add(op = 0x52c508, rs = 0xfcbffcac), line 194 in "add.c"
  [9] connection_operation(ctx = 0xfcbffe0c, arg_v = 0x52c508), line 1115 in "connection.c"
  [10] connection_read_thread(ctx = 0xfcbffe0c, argv = 0x11), line 1248 in "connection.c"
  [11] ldap_int_thread_pool_wrapper(xpool = 0x3fe488), line 698 in "tpool.c"

Current function is ldap_pvt_thread_cond_wait
  277   return ERRVAL( pthread_cond_wait( cond, mutex ) );
t@6 (l@6) stopped in __lwp_park at 0xff1554b0
0xff1554b0: __lwp_park+0x0010: ta %icc,0x00000008
current thread: t@6
  [1] __lwp_park(0x0, 0x0, 0x0, 0x1, 0xff168000, 0x0), at 0xff1554b0
  [2] cond_wait_queue(0x3fe4a8, 0xff168c08, 0x0, 0x0, 0xff070a00, 0xff168000), at 0xff1526b8
  [3] _cond_wait_cancel(0x3fe4a8, 0x3fe490, 0xfc3ffc98, 0x1, 0x3, 0x0), at 0xff152e74
  [4] _pthread_cond_wait(0x3fe4a8, 0x3fe490, 0xfc3ffe0c, 0x1, 0x0, 0xfc3ffd81), at 0xff152eb0
=>[5] ldap_pvt_thread_cond_wait(cond = 0x3fe4a8, mutex = 0x3fe490), line 277 in "thr_posix.c"
  [6] ldap_int_thread_pool_wrapper(xpool = 0x3fe488), line 689 in "tpool.c"
Today's HEAD, bdb segv in test039.
Backtrace: https://www.nbcs.rutgers.edu/~richton/test039-bt-200906151148.txt
Aaron Richton wrote:
test050 assertion...
Current function is syncrepl_updateCookie
syncrepl.c:3084 assert( !syn->ssyn_validate( syn, si->si_cookieState->cs_vals+i ));
backtrace and testrun directory: https://www.nbcs.rutgers.edu/~richton/testfailed.2009060519.tgz
In frame 5 print *si, *si->si_cookieState, i, si->si_cookieState->cs_vals[0..i]
On Wed, 10 Jun 2009, Howard Chu wrote:
In frame 5 print *si, *si->si_cookieState, i, si->si_cookieState->cs_vals[0..i]
*si = {
    si_next = 0xa4c380
    si_be = 0x45a100
    si_wbe = 0x45a100
    si_re = 0x502720
    si_rid = 11
    si_ridtxt = "rid=011"
    si_bindconf = {
        sb_uri = { bv_len = 22U bv_val = 0x45d838 "ldap://localhost:9011/" }
        sb_version = 3
        sb_tls = 0
        sb_method = 128
        sb_timeout_api = 3
        sb_timeout_net = 0
        sb_binddn = { bv_len = 28U bv_val = 0x45c528 "cn=manager,dc=example,dc=com" }
        sb_cred = { bv_len = 6U bv_val = 0x501848 "secret" }
        sb_saslmech = { bv_len = 0 bv_val = (nil) }
        sb_secprops = (nil)
        sb_realm = { bv_len = 0 bv_val = (nil) }
        sb_authcId = { bv_len = 0 bv_val = (nil) }
        sb_authzId = { bv_len = 0 bv_val = (nil) }
        sb_tls_ctx = (nil)
        sb_tls_cert = (nil)
        sb_tls_key = (nil)
        sb_tls_cacert = (nil)
        sb_tls_cacertdir = (nil)
        sb_tls_reqcert = (nil)
        sb_tls_cipher_suite = (nil)
        sb_tls_protocol_min = (nil)
        sb_tls_crlcheck = (nil)
        sb_tls_do_init = 0
    }
    si_base = { bv_len = 17U bv_val = 0x45d858 "dc=example,dc=com" }
    si_logbase = { bv_len = 0 bv_val = (nil) }
    si_filterstr = { bv_len = 15U bv_val = 0xa4baf8 "(objectclass=*)" }
    si_logfilterstr = { bv_len = 0 bv_val = (nil) }
    si_scope = 2
    si_attrsonly = 0
    si_anfile = (nil)
    si_anlist = 0x45d7f8
    si_exanlist = 0x45d818
    si_attrs = 0xa4bb40
    si_exattrs = (nil)
    si_allattrs = 0
    si_allopattrs = 0
    si_schemachecking = 0
    si_type = 3
    si_ctype = 3
    si_interval = 60
    si_retryinterval = 0xa4bb10
    si_retrynum_init = 0xa4bb28
    si_retrynum = 0xa4bae0
    si_syncCookie = {
        ctxcsn = 0xaa4f90
        octet_str = { bv_len = 60U bv_val = 0x455bf8 "rid=011,sid=001,csn=20090610091923.215407Z#000000#001#000000" }
        rid = 11
        sid = 1
        numcsns = 1
        sids = 0xa3f238
        sc_next = { stqe_next = (nil) }
    }
    si_cookieState = 0xa4c348
    si_cookieAge = 4
    si_manageDSAit = 0
    si_slimit = 0
    si_tlimit = 0
    si_refreshDelete = 1
    si_refreshPresent = 0
    si_refreshDone = 1
    si_syncdata = 0
    si_logstate = 0
    si_got = 263443
    si_msgid = 2
    si_presentlist = 0xa8b400
    si_ld = 0x5066b0
    si_conn = 0x42dd50
    si_nonpresentlist = { lh_first = (nil) }
    si_mutex = {
        __pthread_mutex_flags = {
            __pthread_mutex_flag1 = 4U
            __pthread_mutex_flag2 = '\0'
            __pthread_mutex_ceiling = '\0'
            __pthread_mutex_type = 0
            __pthread_mutex_magic = 19800U
        }
        __pthread_mutex_lock = {
            __pthread_mutex_lock64 = { __pthread_mutex_pad = "" }
            __pthread_mutex_lock32 = { __pthread_ownerpid = 0 __pthread_lockword = 4278190080U }
            __pthread_mutex_owner64 = 4278190080ULL
        }
        __pthread_mutex_data = 4279502848ULL
    }
}
*si->si_cookieState = {
    cs_mutex = {
        __pthread_mutex_flags = {
            __pthread_mutex_flag1 = 4U
            __pthread_mutex_flag2 = '\0'
            __pthread_mutex_ceiling = '\0'
            __pthread_mutex_type = 0
            __pthread_mutex_magic = 19800U
        }
        __pthread_mutex_lock = {
            __pthread_mutex_lock64 = { __pthread_mutex_pad = "" }
            __pthread_mutex_lock32 = { __pthread_ownerpid = 0 __pthread_lockword = 0 }
            __pthread_mutex_owner64 = 0
        }
        __pthread_mutex_data = 0
    }
    cs_num = 3
    cs_age = 5
    cs_ref = 4
    cs_vals = 0xaa34f0
    cs_sids = 0xa97a30
}
i = 0
si->si_cookieState->cs_vals[0] = { bv_len = 40U bv_val = 0xaa06a8 "20090610091923.215407Z#000000#001#000000" }
Aaron Richton wrote:
On Wed, 10 Jun 2009, Howard Chu wrote:
In frame 5 print *si, *si->si_cookieState, i, si->si_cookieState->cs_vals[0..i]
OK, no clue here. Your assert is from cs_vals[0] failing syntax validation, but the CSN looks perfectly valid. I have no idea why that would fail...
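As a quick sanity check on "the CSN looks perfectly valid", the value from the dump can be run through a simple shape test. This is a rough sketch only: csn_shape_ok() is a hypothetical helper, not slapd's actual CSN validator, and the layout it assumes (14-digit timestamp, '.', 6-digit microseconds, 'Z', then '#'-separated fields of 6, 3 and 6 hex digits, 40 characters in total) is inferred from the values printed in this thread.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Return 1 if the first n characters of s are all decimal digits. */
static int all_digits(const char *s, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (!isdigit((unsigned char)s[i]))
            return 0;
    }
    return 1;
}

/* Return 1 if the first n characters of s are all hexadecimal digits. */
static int all_hex(const char *s, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (!isxdigit((unsigned char)s[i]))
            return 0;
    }
    return 1;
}

/*
 * Hypothetical shape check, NOT the real slapd validator.
 * Assumed layout: YYYYmmddHHMMSS.uuuuuuZ#cccccc#sss#mmmmmm (40 chars).
 */
static int csn_shape_ok(const char *s, size_t len)
{
    if (len != 40)
        return 0;
    return all_digits(s, 14) && s[14] == '.'
        && all_digits(s + 15, 6) && s[21] == 'Z'
        && s[22] == '#' && all_hex(s + 23, 6)
        && s[29] == '#' && all_hex(s + 30, 3)
        && s[33] == '#' && all_hex(s + 34, 6);
}
```

The string "20090610091923.215407Z#000000#001#000000" with bv_len 40 passes this check, which is consistent with the observation that nothing is visibly wrong with the value at the time of the dump.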
Quanah Gibson-Mount wrote:
Please test RE24 as we prepare for 2.4.17. Thanks!
Testing now on AIX 5.3, with BDB 4.6 for 32-bit testing and BDB 4.7 for 64-bit. (There's no particular reason; my dev box just happens to be set up this way at the moment.)
Any particular tests you would like hammered?
Cheers, Bill
On Jun 8, 2009, at 15:29 , Quanah Gibson-Mount wrote:
Please test RE24 as we prepare for 2.4.17. Thanks!
On OS X 10.5.7/Intel I am getting the following:
Starting test056-monitor ...
running defines.sh
Starting slapd on TCP/IP port ...
Using ldapsearch to check that slapd is running...
Using ldapsearch to read connection monitor entries...
Filtering ldapsearch results...
Comparing filter output...
comparison failed - connection monitor output is not correct
./scripts/test056-monitor failed (exit 1)
However, in the testrun directory, the files "ldapsearch.flt" and "ldapsearch.out" show no differences. They contain the following:
dn: cn=Connection 1,cn=Connections,cn=Monitor
structuralObjectClass: monitorConnection
monitorConnectionProtocol: 3
monitorConnectionOpsReceived: 2
monitorConnectionOpsExecuting: 1
monitorConnectionOpsPending: 0
monitorConnectionOpsCompleted: 1
monitorConnectionGet: 2
monitorConnectionRead: 2
monitorConnectionWrite: 0
monitorConnectionMask: rx
monitorConnectionListener: ldap://localhost:9011/
monitorConnectionLocalAddress: IP=[::1]:9011
entryDN: cn=Connection 1,cn=Connections,cn=Monitor

dn: cn=Connections,cn=Monitor
structuralObjectClass: monitorContainer
entryDN: cn=Connections,cn=Monitor

dn: cn=Current,cn=Connections,cn=Monitor
structuralObjectClass: monitorCounterObject
entryDN: cn=Current,cn=Connections,cn=Monitor

dn: cn=Max File Descriptors,cn=Connections,cn=Monitor
structuralObjectClass: monitorCounterObject
entryDN: cn=Max File Descriptors,cn=Connections,cn=Monitor

dn: cn=Total,cn=Connections,cn=Monitor
structuralObjectClass: monitorCounterObject
entryDN: cn=Total,cn=Connections,cn=Monitor
jens
Quanah Gibson-Mount quanah@zimbra.com writes:
Please test RE24 as we prepare for 2.4.17. Thanks!
test054 still failed, ITS#6094 and dup #6126
-Dieter
Dieter Kluenter wrote:
Quanah Gibson-Mount quanah@zimbra.com writes:
Please test RE24 as we prepare for 2.4.17. Thanks!
test054 still failed, ITS#6094 and dup #6126
Can you reproduce this with SYNC debugging enabled?
Howard Chu hyc@symas.com writes:
Dieter Kluenter wrote:
Quanah Gibson-Mount quanah@zimbra.com writes:
Please test RE24 as we prepare for 2.4.17. Thanks!
test054 still failed, ITS#6094 and dup #6126
Can you reproduce this with SYNC debugging enabled?
test054-log.tgz attached
-Dieter