Full_Name: Rein Tollevik
Version: 2.4.16
OS: linux
URL:
Submission from: (NULL) (81.93.160.250)
Submitted by: rein
I've had two cases where a delete operation was performed on the master without being replicated to its consumers. So far both look like a race condition triggered by a lost connection (abandon). The log (level: stats) shows the "DEL" message for the entry, immediately followed by a "closed (connection lost)" message on the connection. Note: No "RESULT" message was logged.
I haven't looked into this in much depth, but my theory so far is that syncprov skipped replicating the delete op after noticing the abandon that resulted from losing the connection, even though the delete had already taken place in the local database. That it happened after a delete op may well have been a coincidence; for all I know this race could exist after any modify op.
Do we need some sort of o_committed flag that can be used to prevent o_abandon from being set or acted upon once the change has been written? Or should o_abandon be handled more like o_cancel, i.e. with multiple values, including "too late"?
Rein Tollevik
Basefarm AS