On Wed, Aug 23, 2017 at 02:42:29PM +0100, Ondřej Kuzník wrote:
It is caused by the cookie not containing a CSN and a race between the syncCookie check in do_syncrep2 and syncrepl_message_to_op.

This race is probably fine with plain syncrepl, which is idempotent, but deltasync changes get their own DN in each accesslog instance, and some can be applied twice unless we have a way to tell that we've already seen them: they need to mention the CSN.

The CSN itself gets lost on at least one occasion: when a checkpoint is triggered. I'm not 100% sure why the cookie gets eaten because of it; the op pointer is different between the syncprov_op_response that calls syncprov_checkpoint and the one that decides the CSN hasn't changed.
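
To illustrate the point about idempotency, here is a toy Python model (not OpenLDAP code). The function names, the dict-based "database", and the sample CSN value are all made up for illustration; it only shows why a replayed full-entry update is harmless, while a replayed delta needs a CSN to be recognised as already applied. A real server would likely reject a duplicate value with an error rather than store it twice.

```python
def apply_full_entry(db, dn, attrs):
    """Plain-syncrepl style: the whole entry is sent, so a replay changes nothing."""
    db[dn] = dict(attrs)

def apply_delta(db, seen_csns, dn, attr, value, csn):
    """Delta style: append one value; skip the change if its CSN was already applied."""
    if csn is not None and csn in seen_csns:
        return False  # duplicate detected via CSN
    db.setdefault(dn, {}).setdefault(attr, []).append(value)
    if csn is not None:
        seen_csns.add(csn)
    return True

db, seen = {}, set()

# Replaying a full entry leaves the same state.
apply_full_entry(db, "cn=a", {"mail": ["x@example.com"]})
apply_full_entry(db, "cn=a", {"mail": ["x@example.com"]})

# Hypothetical CSN value, used only as an opaque identifier here.
csn = "20170823134229.000000Z#000000#001#000000"
apply_delta(db, seen, "cn=b", "member", "cn=a", csn)
apply_delta(db, seen, "cn=b", "member", "cn=a", csn)   # replay with CSN: skipped

apply_delta(db, seen, "cn=c", "member", "cn=a", None)
apply_delta(db, seen, "cn=c", "member", "cn=a", None)  # replay without CSN: applied twice
```

The last two calls are the problem case: with no CSN attached, nothing distinguishes the replay from a new change.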
Yes, whenever a checkpoint happens, the syncCookie in cn=accesslog only contains rid=XXX,sid=YYY. I thought that was because the checkpoint results in a new accesslog entry that would be transmitted first, but that's not the case: there is no accesslog entry, nor is anything sent to the client (as observed with ldapsearch -E sync=rp).
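
For anyone wanting to spot these cookies in captured output, a minimal Python sketch follows. It assumes the cookie is a comma-separated list of key=value fields, as in the rid=XXX,sid=YYY form above; the helper names and the sample CSN value are made up for illustration.

```python
def parse_sync_cookie(cookie):
    """Split a cookie string of comma-separated key=value fields into a dict."""
    fields = {}
    for part in cookie.split(","):
        key, sep, value = part.partition("=")
        if sep:  # ignore fragments without '='
            fields[key.strip()] = value.strip()
    return fields

def has_csn(cookie):
    """Return True only if the cookie carries a non-empty csn field."""
    return bool(parse_sync_cookie(cookie).get("csn"))

# Cookie as seen after a checkpoint (no csn field).
after_checkpoint = "rid=XXX,sid=YYY"
# Hypothetical cookie with a CSN, for comparison.
normal = "rid=XXX,sid=YYY,csn=20170823134229.000000Z#000000#001#000000"

missing = not has_csn(after_checkpoint)
present = has_csn(normal)
```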
I think it looks like this: syncprov_checkpoint modifies the suffix entry, which calls slap_graduate_commit_csn, and the CSN is removed from be_pending_csn_list. accesslog_response then can't find the CSN there and has nothing to insert into its own pending CSN list. It is strange that changing the overlay order (accesslog vs. syncprov) doesn't change this behaviour, which is what I'd expect if the above were the reason this happens.
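
The suspected ordering can be modelled in a few lines of Python. This is only a toy model of the hypothesis above, not the actual slapd logic: the list stands in for be_pending_csn_list, the two functions are made-up stand-ins for the graduate step and the later lookup, and the CSN value is invented.

```python
# Stand-in for be_pending_csn_list, holding one hypothetical CSN.
pending = ["20170823134229.000000Z#000000#001#000000"]

def graduate(pending_list, csn):
    """Stand-in for graduating a CSN: remove it from the pending list."""
    if csn in pending_list:
        pending_list.remove(csn)

def lookup_for_log(pending_list, csn):
    """Stand-in for the later lookup: return the CSN only if it is still pending."""
    return csn if csn in pending_list else None

csn = pending[0]
found_before = lookup_for_log(pending, csn)  # lookup before graduation finds it
graduate(pending, csn)                       # checkpoint path graduates the CSN
found_after = lookup_for_log(pending, csn)   # lookup after graduation finds nothing
```

If the lookup runs after the graduate step, it comes back empty, which would match a cookie with no CSN in it.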
-- 
Ondřej Kuzník
Senior Software Engineer
Symas Corporation                       http://www.symas.com
Packaged, certified, and supported LDAP solutions powered by OpenLDAP