Hi Quanah,
Thanks for the suggestions. We have systems in place that monitor replication state 24/7. The consumers are checked every minute to ensure their CSNs stay very close to the provider's CSNs. Manually searching the contextCSN confirmed that the CSNs on the provider were newer than the CSN used to re-establish the persistent connection. It seems that only the syncrepl consumer, when re-establishing a dropped delta-syncrepl refreshAndPersist connection, uses an old CSN, even though the consumer already has an updated CSN.
Another example of this occurred today.
Re-establishing the persistent connection after the provider was restarted:
Jul 6 22:57:37 openldap-hdb-consumer-1 slapd[5749]: do_syncrep1: rid=002 starting refresh (sending cookie=rid=002,csn=20210331192036.214412Z#000000#000#000000;20210119225955.133811Z#000000#001#000000;20210128213906.596429Z#000000#002#000000;20210226190704.219043Z#000000#005#000000;20210412181659.152626Z#000000#065#000000;20210623181421.795352Z#000000#066#000000;20210706153144.905110Z#000000#44d#000000;20210412175600.595586Z#000000#835#000000;20210423182110.684843Z#000000#836#000000;20210331193249.570935Z#000000#ce5#000000)
Note the CSN for SID 066: 20210623181421.795352Z#000000#066#000000
Then, immediately after a "systemctl restart slapd" on the consumer, the first connection established after the restart sent:
Jul 6 22:57:38 openldap-hdb-consumer-1 slapd[23892]: do_syncrep1: rid=002 starting refresh (sending cookie=rid=002,csn=20210331192036.214412Z#000000#000#000000;20210119225955.133811Z#000000#001#000000;20210128213906.596429Z#000000#002#000000;20210226190704.219043Z#000000#005#000000;20210412181659.152626Z#000000#065#000000;20210706225509.538827Z#000000#066#000000;20210706153144.905110Z#000000#44d#000000;20210412175600.595586Z#000000#835#000000;20210423182110.684843Z#000000#836#000000;20210331193249.570935Z#000000#ce5#000000)
This time the CSN for SID 066 is 20210706225509.538827Z#000000#066#000000
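To make the difference between the two cookies easier to see, here is a small Python sketch that splits each cookie into per-SID CSNs and reports which ones changed. The helper names (parse_cookie, changed_sids) are just for illustration, and the cookie layout ("rid=NNN,csn=CSN;CSN;...", with each CSN as timestamp#count#sid#mod) is inferred from the log lines above rather than taken from the OpenLDAP source.

```python
from typing import Dict, Optional, Tuple

# Cookies copied from the two do_syncrep1 log lines above
# (without the "sending cookie=" prefix and the surrounding parentheses).
BEFORE = (
    "rid=002,csn="
    "20210331192036.214412Z#000000#000#000000;"
    "20210119225955.133811Z#000000#001#000000;"
    "20210128213906.596429Z#000000#002#000000;"
    "20210226190704.219043Z#000000#005#000000;"
    "20210412181659.152626Z#000000#065#000000;"
    "20210623181421.795352Z#000000#066#000000;"
    "20210706153144.905110Z#000000#44d#000000;"
    "20210412175600.595586Z#000000#835#000000;"
    "20210423182110.684843Z#000000#836#000000;"
    "20210331193249.570935Z#000000#ce5#000000"
)
AFTER = (
    "rid=002,csn="
    "20210331192036.214412Z#000000#000#000000;"
    "20210119225955.133811Z#000000#001#000000;"
    "20210128213906.596429Z#000000#002#000000;"
    "20210226190704.219043Z#000000#005#000000;"
    "20210412181659.152626Z#000000#065#000000;"
    "20210706225509.538827Z#000000#066#000000;"
    "20210706153144.905110Z#000000#44d#000000;"
    "20210412175600.595586Z#000000#835#000000;"
    "20210423182110.684843Z#000000#836#000000;"
    "20210331193249.570935Z#000000#ce5#000000"
)


def parse_cookie(cookie: str) -> Dict[str, str]:
    """Return {sid: csn} for every CSN listed in a syncrepl cookie (hypothetical helper)."""
    # Assumes the "rid=NNN,csn=CSN;CSN;..." layout seen in the log lines.
    csn_list = cookie.split("csn=", 1)[1]
    by_sid: Dict[str, str] = {}
    for csn in csn_list.split(";"):
        # A CSN is timestamp#count#sid#mod, so the SID is the third field.
        by_sid[csn.split("#")[2]] = csn
    return by_sid


def changed_sids(before: str, after: str) -> Dict[str, Tuple[Optional[str], str]]:
    """Return {sid: (old_csn, new_csn)} for SIDs whose CSN differs (hypothetical helper)."""
    old, new = parse_cookie(before), parse_cookie(after)
    return {sid: (old.get(sid), csn) for sid, csn in new.items() if old.get(sid) != csn}


if __name__ == "__main__":
    for sid, (old_csn, new_csn) in changed_sids(BEFORE, AFTER).items():
        # CSN timestamps are fixed-width, so plain string comparison is chronological.
        if old_csn is None:
            direction = "new SID"
        else:
            direction = "newer" if new_csn > old_csn else "older"
        print(f"SID {sid}: {old_csn} -> {new_csn} ({direction})")
```

Run against the two cookies above, it prints a single line, "SID 066: 20210623181421.795352Z#000000#066#000000 -> 20210706225509.538827Z#000000#066#000000 (newer)", i.e. SID 066 is the only entry that changed, and the cookie sent after the restart carries the newer value.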
So after slapd restarts, syncrepl establishes the connection with a more up-to-date CSN. It's almost as if syncrepl caches the persistent search request (including its cookie) and reuses it later, without checking whether the CSNs have been updated when it re-establishes the connection.
Does that seem possible?