Raphaël Ouazana-Sustowski wrote:
Hi,
On Fri, May 2, 2008 11:01, hyc@symas.com wrote:
luca@OpenLDAP.org wrote:
luca@OpenLDAP.org wrote:
Howard Chu wrote:
Thanks. Please try HEAD again.
No luck. New testrun directory at ftp://ftp.sys-net.it/luca_scamoni_its5470_20080430-new.tgz
Backtrace attached.
Recent commits seem to have fixed it (at least, I can't reproduce it anymore right now...)
Right. Confirmed here too; I (temporarily) added an assert(0) to the offending branch of code to make sure the patch was actually getting hit. Triggering that code path takes very particular timing.
I'm not sure how we can reliably test for this down the road. Perhaps we should add a "disabled" config keyword for backends and syncrepl consumers, so that we can start up the individual servers (which takes an unpredictable amount of time for each) and then enable the various parts in a fixed sequence (e.g. 1-second sleeps between ldapmodify/enable requests). Even that is hit or miss, because our test database is so small that we're unlikely to hit the timing window on demand.
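The proposed startup sequencing can be sketched in Python. Everything here is hypothetical: the "disabled" keyword does not exist, `wait_until_ready` stands in for the "use ldapsearch to check that slapd is running" steps the test scripts already perform, and each step passed to `enable_in_sequence` stands in for an ldapmodify request that would re-enable one backend or syncrepl consumer. It is a minimal sketch under those assumptions, not the test suite's actual mechanism.

```python
import time
from typing import Callable, Sequence


def wait_until_ready(
    check: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it returns True or `timeout` seconds elapse.

    Hypothetical stand-in for the ldapsearch liveness check, since each
    slapd takes an unpredictable amount of time to start.
    """
    deadline = clock() + timeout
    while not check():
        if clock() >= deadline:
            return False  # server never came up within the allowed time
        sleep(interval)
    return True


def enable_in_sequence(
    steps: Sequence[Callable[[], None]],
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Run hypothetical 'enable' steps in a fixed order with a fixed gap.

    Each step stands in for an ldapmodify request that would clear a
    proposed (not existing) "disabled" flag on a backend or consumer.
    """
    for i, step in enumerate(steps):
        if i > 0:
            sleep(delay)  # fixed pause so the ordering is repeatable
        step()
```

Injecting `clock` and `sleep` lets the sequencing logic be exercised without real delays. As the message notes, a fixed order still does not guarantee hitting a narrow race window with such a small database.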
I'm testing the latest RE24 tag. After 201 successful runs of test050, I got a failure :/

Cleaning up test run directory leftover from previous run.
Running ./scripts/test050-syncrepl-multimaster...
running defines.sh
Initializing server configurations...
Starting producer slapd on TCP/IP port 9011...
Using ldapsearch to check that producer slapd is running...
Inserting syncprov overlay on producer...
Starting consumer slapd on TCP/IP port 9012...
Using ldapsearch to check that consumer slapd is running...
Configuring syncrepl on consumer...
Starting consumer2 slapd on TCP/IP port 9013...
Using ldapsearch to check that consumer2 slapd is running...
Configuring syncrepl on consumer2...
Adding schema and databases on producer...
Using ldapadd to populate producer...
Waiting 20 seconds for syncrepl to receive changes...
Using ldapadd to populate consumer...
Waiting 20 seconds for syncrepl to receive changes...
Using ldapsearch to check that syncrepl received database changes...
Waiting 5 seconds for syncrepl to receive changes...
Waiting 5 seconds for syncrepl to receive changes...
Waiting 5 seconds for syncrepl to receive changes...
Waiting 5 seconds for syncrepl to receive changes...
Waiting 5 seconds for syncrepl to receive changes...
Waiting 5 seconds for syncrepl to receive changes...
ldapsearch failed (32)!
Testrun uploaded to ftp://ftp.openldap.org/incoming/raphael-ouazana-testrun-080505.tgz
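The repeated "Waiting 5 seconds..." lines ending in "ldapsearch failed (32)!" look like a bounded polling loop; LDAP result code 32 is noSuchObject, i.e. the expected entry had not (yet) appeared on the replica. A minimal Python sketch of such a loop, assuming the script searches once and then waits and re-searches a fixed number of times; `poll_search` and its `search` callable (standing in for an ldapsearch run that returns its result code) are hypothetical and not taken from the actual test script.

```python
import time
from typing import Callable

LDAP_SUCCESS = 0
LDAP_NO_SUCH_OBJECT = 32  # standard LDAP result code for noSuchObject


def poll_search(
    search: Callable[[], int],
    retries: int = 6,
    wait: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str], None] = print,
) -> int:
    """Search once, then wait and re-search up to `retries` times.

    Returns the last LDAP result code: 0 as soon as a search succeeds,
    otherwise whatever the final search returned (e.g. 32).
    """
    rc = search()
    for _ in range(retries):
        if rc == LDAP_SUCCESS:
            break
        log(f"Waiting {wait:g} seconds for syncrepl to receive changes...")
        sleep(wait)
        rc = search()
    return rc
```

Bounding the number of polls keeps the test's run time finite, but like any fixed sleep it still fails if replication on a loaded machine takes longer than `retries * wait`.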
The logs show that the syncrepl consumers all periodically timed out when trying to bind to a provider. It seems that the 1-second timeout in the syncrepl configs is too short, or your test machine was too slow during that run.
We should probably remove that timeout now, since the cn=config/thread pause issue has already been resolved.