Howard Chu wrote:
Raphaël Ouazana-Sustowski wrote:
Hi,
> Starting test050-syncrepl-multimaster ...
> running defines.sh
> Initializing server configurations...
> Starting producer slapd on TCP/IP port 9011...
> Using ldapsearch to check that producer slapd is running...
> Inserting syncprov overlay on producer...
> Starting consumer slapd on TCP/IP port 9012...
> Using ldapsearch to check that consumer slapd is running...
> Configuring syncrepl on consumer...
> Starting consumer2 slapd on TCP/IP port 9013...
> Using ldapsearch to check that consumer2 slapd is running...
> Configuring syncrepl on consumer2...
> Adding schema and databases on producer...
> Using ldapadd to populate producer...
> Waiting 20 seconds for syncrepl to receive changes...
> Using ldapadd to populate consumer...
> Waiting 20 seconds for syncrepl to receive changes...
> Using ldapsearch to check that syncrepl received database changes...
> Waiting 5 seconds for syncrepl to receive changes...
> Waiting 5 seconds for syncrepl to receive changes...
> Waiting 5 seconds for syncrepl to receive changes...
> Waiting 5 seconds for syncrepl to receive changes...
> Waiting 5 seconds for syncrepl to receive changes...
> Waiting 5 seconds for syncrepl to receive changes...
> ldapsearch failed (32)!
> ./scripts/test050-syncrepl-multimaster failed (exit 32)
> make[2]: *** [hdb-yes] Error 32
> make[2]: Leaving directory '/tmp/openldap/tests'
> make[1]: *** [test] Error 2
> make[1]: Leaving directory '/tmp/openldap/tests'
> make: *** [test] Error 2

OK. Here's the apparent sequence of events:

  server1 starts up
  database is defined, syncrepl consumer starts, fails, retries
  dc=example entries start getting added
  serverX consumer connects to server1 and starts receiving entries
  server1 consumer starts up and connects to serverX
  dc=example entries continue to be added on server1

The problem is that server1's consumer has snapped the ctxcsn while adds are ongoing, and serverX actually has a newer ctxcsn. E.g.:

  entry1 is added on server1
  server1 consumer gets ctxcsn for entry1
  entry2 is added on server1
  serverX consumer connects, gets entry1 and entry2, result ctxcsn2
  server1 consumer connects to serverX, sends ctxcsn1
  entry3-N is added on server1
  serverX sends server1 a refreshResult with ctxcsn2, and a presentlist of just entry1
  server1 performs a delete_nonpresent based on ctxcsn1, ctxcsn2, and the presentlist, even though it has newer data. All the newer entries are deleted...

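
To make the race easier to follow, here is a toy replay of that sequence in Python. It is only a sketch, not slapd code: CSNs are plain integers, entry3..entry5 stand in for entry3-N, and the names run_race and delete_nonpresent (as a Python function) are made up for illustration. It also ignores any entries delivered in the refresh itself and simply drops everything missing from the presentlist.

```python
# Toy model of the race; names are illustrative, not OpenLDAP APIs.

def delete_nonpresent(local_entries, present_list):
    """Drop every local entry that the provider did not list as present."""
    return {name: csn for name, csn in local_entries.items()
            if name in present_list}


def run_race():
    server1 = {}                            # entry name -> CSN (int stands in for a real CSN)
    server1["entry1"] = 1                   # entry1 is added on server1
    cookie_csn = max(server1.values())      # server1 consumer snaps ctxcsn1
    server1["entry2"] = 2                   # entry2 is added on server1
    server_x = dict(server1)                # serverX consumer gets entry1 and entry2
    provider_csn = max(server_x.values())   # serverX now holds ctxcsn2
    # server1 consumer connects to serverX and sends the stale ctxcsn1 here.
    for n in range(3, 6):                   # entry3..entry5 are added on server1
        server1[f"entry{n}"] = n
    present_list = {"entry1"}               # presentlist of just entry1
    after = delete_nonpresent(server1, present_list)
    return cookie_csn, provider_csn, server1, after


cookie_csn, provider_csn, before, after = run_race()
print("sent ctxcsn", cookie_csn, "provider ctxcsn", provider_csn)
print("before:", sorted(before))
print("after: ", sorted(after))
```

In this model server1 holds five entries before the pass and only entry1 afterwards, even though its own data is newer than anything serverX has.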
The fix seems to be to have the consumer re-fetch the current ctxcsn before deciding whether to do a delete_nonpresent pass. The previous patch to syncprov.c was irrelevant and will be reverted.
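
Roughly, the idea looks like the following Python sketch. This is not the actual patch: CSNs are plain integers, the function names are hypothetical, and the comparison rule (skip the pass when the local ctxcsn is newer than the one in the refreshResult) is my assumption about how the check would be expressed.

```python
# Sketch of the proposed guard; names and the comparison rule are assumptions.

def should_delete_nonpresent(current_local_csn, provider_csn):
    """Assumed rule: only prune when the provider is at least as new as we are now."""
    return provider_csn >= current_local_csn


def refresh(local_entries, provider_csn, present_list):
    # Re-fetch the local ctxcsn now, instead of trusting the value
    # that was sent in the cookie when the session started.
    current_local_csn = max(local_entries.values(), default=0)
    if not should_delete_nonpresent(current_local_csn, provider_csn):
        return dict(local_entries)          # local data is newer: skip the pass
    return {name: csn for name, csn in local_entries.items()
            if name in present_list}


# server1 has moved on to CSN 5 while the provider is still at ctxcsn2.
server1 = {f"entry{n}": n for n in range(1, 6)}
kept = refresh(server1, provider_csn=2, present_list={"entry1"})

# A consumer that really is behind still gets pruned as before.
pruned = refresh({"entry1": 1, "stale": 1}, provider_csn=2,
                 present_list={"entry1"})
```

With the guard in place the first call leaves all five entries alone, while the second call still removes the entry the provider no longer has.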