duncan.gibb@siriusit.co.uk wrote:
Duncan.Gibb@siriusit.co.uk wrote:
DG> Can anyone else reproduce this, or do we need to work on more examples?
After more experimentation, I believe this crash only happens if replication between servers 1 and 2 (master of the main tree and subordinate master respectively) has completed before server 3 (consumer of 1) is started.
We've scripted a test which works for me against 2.4.8 and did work against CVS HEAD until Howard committed
What does "work" mean? Please use more precise terminology. Did it reproduce the crash, or did it all replicate correctly? (The latter being the only thing that I would define as "working".)
openldap-src/servers/slapd/syncrepl.c 2008-03-19 23:26:40 +0000.
Use revision numbers, that's what they're for. There is nothing in CVS with the above timestamp.
That changes the syncrepl<-->glue interaction such that the data from server 2 is never replicated to server 1 (a prerequisite for this crash). Maybe our config wrongly depends on the old behaviour. I'll look at that tomorrow.
Test rig (3.7K) is at
http://pastebin.siriusit.co.uk/openldap-its5430-scripted-test-2008-03-19.tar...
  tar xzf openldap-its5430-scripted-test-2008-03-19.tar.gz
  cd openldap-its5430-scripted-test-2008-03-19
  ./build.sh
  ./test.sh
Optionally, copy a CVS checkout of openldap-src from before the above commit into the directory before running build.sh.
The rig probably only works on Linux. The crash is reproducible with the bdb and hdb backends using BDB 4.4 on Debian Lenny/i386 (bare metal). It is not reproducible with the ldif backend because replication from 2 to 1 never happens.