Final anticipated fix is in for RE2.3.
Please test.
--Quanah
Hold off, latest fix broke cascaded syncrepl.
--Quanah
-- Quanah Gibson-Mount Principal Software Engineer Zimbra, Inc
Zimbra :: the leader in open source messaging and collaboration
Quanah Gibson-Mount wrote:
Hold off, latest fix broke cascaded syncrepl.
That's a side effect of ITS#5385. I think we should just revert the last RE23 patch; I don't have time to find the correct fix to backport for #5385.
Reverted.
Without this patch, the delta-sync test passed some 10k times, and all the other tests passed multiple times.
Shall I tag RE2.3.42?
--Quanah
-- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
All fine here.
Gavin Henry wrote:
All fine here.
All fine here too - tested on Ubuntu / i386 and Debian / amd64.
Regards,
All tests successful on FreeBSD/amd64 7.0-STABLE.
Cheers,
-- Xin LI delphij@delphij.net http://www.delphij.net/ FreeBSD - The Power to Serve!
Livelocked in test008. I don't think there are any mutexes held anywhere in the process, but I'm about to leave my desk and that's only at first glance. I'll poke at it more later to verify that statement...
https://www.nbcs.rutgers.edu/~richton/openldap-re23_20080515-livelock.txt
Aaron Richton wrote:
Livelocked in test008. I don't think there are any mutexes held anywhere in the process, but I'm about to leave my desk and that's only at first glance. I'll poke at it more later to verify that statement...
https://www.nbcs.rutgers.edu/~richton/openldap-re23_20080515-livelock.txt
You've run into this sort of thing before, IIRC. At the moment no bright ideas come to mind. (That may be a combination of jetlag and beer more than anything else.)
In thread t@4, frame 3, can you print *cx and *cx->ei ? Thanks.
In thread t@4, frame 3, can you print *cx and *cx->ei ? Thanks.
BTW, definitely no locks held at the moment...
*cx = {
    bdb = 0x30d980
    op = 0x7bc6d58
    ei = 0x4978b0
    ids = 0xfd37f9ac
    tmp = 0x10a5018
    buf = 0x1125018
    db = 0x3668d0
    dbc = 0xfd33f7ec
    key = {
        data = (nil)
        size = 4U
        ulen = 4U
        dlen = 0
        doff = 0
        flags = 32U
    }
    data = {
        data = (nil)
        size = 0
        ulen = 0
        dlen = 0
        doff = 0
        flags = 0
    }
    dbuf = 0
    id = 5U
    nid = 5U
    rc = 0
    depth = 1
    need_sort = '\001'
    prefix = '@'
}

*cx->ei = {
    bei_parent = 0x497858
    bei_id = 5U
    bei_lockpad = 0
    bei_state = 64
    bei_nrdn = {
        bv_len = 34U
        bv_val = 0x37e9b0 "ou=information technology division"
    }
    bei_rdn = {
        bv_len = 34U
        bv_val = 0x3964b8 "ou=Information Technology Division"
    }
    bei_modrdns = 0
    bei_ckids = 5
    bei_dkids = 5
    bei_e = 0x46b5f20
    bei_kids = 0xfa1b48
    bei_kids_mutex = {
        __pthread_mutex_flags = {
            __pthread_mutex_flag1 = 4U
            __pthread_mutex_flag2 = '\0'
            __pthread_mutex_ceiling = '\0'
            __pthread_mutex_type = 0
            __pthread_mutex_magic = 19800U
        }
        __pthread_mutex_lock = {
            __pthread_mutex_lock64 = {
                __pthread_mutex_pad = ""
            }
            __pthread_mutex_lock32 = {
                __pthread_ownerpid = 0
                __pthread_lockword = 0
            }
            __pthread_mutex_owner64 = 0
        }
        __pthread_mutex_data = 0
    }
    bei_lrunext = 0x2caeb80
    bei_lruprev = 0x69cf90
}
Aaron Richton wrote:
In thread t@4, frame 3, can you print *cx and *cx->ei ? Thanks.
BTW, definitely no locks held at the moment...
Ok...
Looks like there may be an unsafe access of the bei_state here in dn2id.c. How long does it take to reproduce this situation? Can you try testing with this patch?
diff -u -r1.106.2.18 dn2id.c
--- dn2id.c 11 Feb 2008 23:24:19 -0000 1.106.2.18
+++ dn2id.c 16 May 2008 13:45:39 -0000
@@ -1152,7 +1152,11 @@
             }
             cx->depth--;
             cx->op->o_tmpfree( save, cx->op->o_tmpmemctx );
-            if ( nokids ) ei->bei_state |= CACHE_ENTRY_NO_GRANDKIDS;
+            if ( nokids ) {
+                bdb_cache_entryinfo_lock( ei );
+                ei->bei_state |= CACHE_ENTRY_NO_GRANDKIDS;
+                bdb_cache_entryinfo_unlock( ei );
+            }
         }
         /* Make sure caller knows it had kids! */
         cx->tmp[0]=1;
Looks like there may be an unsafe access of the bei_state here in dn2id.c. How long does it take to reproduce this situation? Can you try testing with this patch?
Took a bit under two days of test008 in an infinite loop. I've got that compiling right now and will start the test back up once that's done...
Ran all weekend OK.