hyc@symas.com wrote:
rein@OpenLDAP.org wrote:
I've had two cases where a delete operation was performed on the master without being replicated to its consumers. So far both appear to be race conditions involving a lost connection (abandon). The log (level: stats) shows the "DEL" message for the entry, immediately followed by a "closed (connection lost)" message on the same connection. Note: no "RESULT" message was logged.
I haven't looked very far into this yet, but my theory so far is that syncprov skipped replicating the delete op after noticing the abandon that resulted from losing the connection, even though the delete had already taken place in the local database. That it happened with a delete op may well have been a coincidence; for all I know this race could exist after any modify op.
Do we need some sort of o_committed flag that can be used to prevent o_abandon from being set or acted upon? Or should o_abandon be handled more like o_cancel, i.e. with multiple values, including "too late"?
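The "too late" idea can be sketched as a small state machine. This is a toy C11 model under stated assumptions, not slapd code: `struct toy_op`, the `TOY_AB_*` values and the helper functions are hypothetical names. A single compare-and-swap decides whether the abandon or the commit wins, so a committed change can never lose its response callbacks.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* HYPOTHETICAL states for a multi-valued abandon flag; these are not
 * slapd's real o_abandon values. */
enum toy_abandon_state {
    TOY_AB_NONE = 0,      /* nothing decided yet; op may still be abandoned */
    TOY_AB_REQUESTED = 1, /* abandon won: the op must not commit */
    TOY_AB_TOO_LATE = 2   /* commit won: the abandon must be ignored */
};

/* Hypothetical stand-in for an operation; only the abandon state is modelled. */
struct toy_op {
    atomic_int abandon; /* holds an enum toy_abandon_state value */
};

static void toy_op_init(struct toy_op *op) {
    atomic_init(&op->abandon, TOY_AB_NONE);
}

/* Connection-teardown path: try to abandon the op.
 * Returns true only if no commit decision had been made yet. */
static bool toy_request_abandon(struct toy_op *op) {
    int expected = TOY_AB_NONE;
    return atomic_compare_exchange_strong(&op->abandon, &expected,
                                          TOY_AB_REQUESTED);
}

/* Backend path, called just before the change is written: claim the op.
 * Returns false if an abandon already won, in which case nothing is written. */
static bool toy_mark_committed(struct toy_op *op) {
    int expected = TOY_AB_NONE;
    return atomic_compare_exchange_strong(&op->abandon, &expected,
                                          TOY_AB_TOO_LATE);
}

/* Response path: replication and audit callbacks must run for every
 * committed change, so they are skipped only when the abandon won. */
static bool toy_should_run_callbacks(struct toy_op *op) {
    return atomic_load(&op->abandon) != TOY_AB_REQUESTED;
}
```

With this scheme a client that drops its connection after the DEL has been committed would find its abandon ignored (`toy_request_abandon` returns false), and the syncprov-style callbacks would still run.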
No. What good can that do, since the connection has already been lost?
It doesn't matter if syncprov fails to send an update to a consumer - the consumer's cookie state will let it pick up where it left off when it reconnects.
It isn't the connection to the syncprov consumer that was lost; it is the connection to the client that made the change. The abandon may cause syncprov to abandon the modify op (in syncprov_op_abandon), and it will definitely cause the entire response callback to be skipped, which is where syncprov sends updates to its consumers. The change takes place in the local database, but it is neither replicated nor logged by auditlog. Accesslog does log it, probably because the cleanup callback it registers in accesslog_op_mod explicitly calls its response callback.
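The behaviour described above can be modelled as a callback chain in which abandonment skips every response handler while cleanup handlers still run. This is a toy C sketch: `struct toy_callback`, `toy_finish_op` and the counters are hypothetical stand-ins (not slapd's `slap_callback` API), and the accesslog-style cleanup that calls its own response handler only if it has not already run is an assumption based on the description above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback entry: a response hook and a cleanup hook. */
typedef void (*toy_cb_fn)(void *ctx);

struct toy_callback {
    toy_cb_fn response; /* runs when the result is sent to the client */
    toy_cb_fn cleanup;  /* runs when the operation is torn down */
    void *ctx;
};

/* If the op was abandoned, every response hook is skipped;
 * cleanup hooks run regardless. */
static void toy_finish_op(const struct toy_callback *cbs, size_t n,
                          bool abandoned) {
    if (!abandoned) {
        for (size_t i = 0; i < n; i++)
            if (cbs[i].response)
                cbs[i].response(cbs[i].ctx);
    }
    for (size_t i = 0; i < n; i++)
        if (cbs[i].cleanup)
            cbs[i].cleanup(cbs[i].ctx);
}

/* Counters standing in for "replicated", "audit-logged", "access-logged". */
struct toy_counts {
    int syncprov, auditlog, accesslog;
    bool accesslog_done;
};

static void toy_syncprov_resp(void *c) { ((struct toy_counts *)c)->syncprov++; }
static void toy_auditlog_resp(void *c) { ((struct toy_counts *)c)->auditlog++; }

static void toy_accesslog_resp(void *c) {
    struct toy_counts *k = c;
    k->accesslog++;
    k->accesslog_done = true;
}

/* Accesslog-style cleanup: invoke the response hook if it was skipped. */
static void toy_accesslog_cleanup(void *c) {
    struct toy_counts *k = c;
    if (!k->accesslog_done)
        toy_accesslog_resp(c);
}
```

Finishing an abandoned op on such a chain leaves the syncprov and auditlog counters at zero while the accesslog counter reaches one, which matches the symptom in the report.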
The syncprov consumers will still receive CSN updates as soon as new modifications take place, which moves their cookies past the lost change; i.e. the consumers must be restarted with the "-c" option to resync their databases.
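Why the consumer's cookie does not recover the lost change can be shown with a toy model. Assumptions: CSNs are plain increasing integers (real CSNs are timestamp strings), on reconnect the provider resends only changes whose CSN is newer than the consumer's cookie, and syncrepl's refresh/present phases and session log are ignored. `struct toy_consumer`, `toy_push`, `toy_resync` and `toy_has_applied` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_APPLIED 16

/* Hypothetical consumer: the cookie is the highest CSN it has applied. */
struct toy_consumer {
    long cookie;
    long applied[TOY_MAX_APPLIED]; /* CSNs applied, in arrival order */
    size_t napplied;
};

/* Provider pushes one change; the consumer applies it and advances its cookie. */
static void toy_push(struct toy_consumer *c, long csn) {
    if (c->napplied < TOY_MAX_APPLIED)
        c->applied[c->napplied++] = csn;
    if (csn > c->cookie)
        c->cookie = csn;
}

/* Reconnect: the provider replays only changes newer than the cookie. */
static void toy_resync(struct toy_consumer *c, const long *changes, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (changes[i] > c->cookie)
            toy_push(c, changes[i]);
}

static bool toy_has_applied(const struct toy_consumer *c, long csn) {
    for (size_t i = 0; i < c->napplied; i++)
        if (c->applied[i] == csn)
            return true;
    return false;
}
```

Pushing CSNs 1 and 3 while the push for 2 is skipped leaves the cookie at 3, so `toy_resync` replays nothing and change 2 is never applied; only lowering the cookie below 2 brings it back, which is the kind of forced resync the restart with "-c" is meant to achieve.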
Rein