Full_Name: Rein Tollevik
Version: 2.4.16
OS: linux
URL:
Submission from: (NULL) (81.93.160.250)
Submitted by: rein
I've had two cases where a delete operation was performed on the master without
being replicated to its consumers. So far both look like a race condition
triggered by a lost connection (abandon). The log (level: stats) shows the
"DEL" message for the entry, immediately followed by a "closed (connection lost)"
message on the connection. Note: no "RESULT" message was logged.
I haven't looked very deeply into this, but my theory so far is that syncprov
skipped replicating the delete op after noticing the abandon that resulted from
losing the connection, even though the delete had already taken place in the
local database. That it happened after a delete op may well have been a
coincidence; for all I know, this race could exist after any modify op.
Do we need some sort of o_committed flag that can be used to prevent o_abandon
from being set or acted upon? Or handle o_abandon more like o_cancel, i.e. with
multiple values, including "too late"?
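
To make the multi-valued idea concrete, here is a minimal standalone sketch. It
is not slapd code: struct toy_op, the ABANDON_* values and the toy_* functions
are all hypothetical names made up for illustration, and it assumes the backend
can mark the operation as committed at its point of no return. Whichever of
"abandon" and "commit" gets there first wins, and the loser is told so.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical states for a multi-valued abandon flag (illustration only). */
enum abandon_state {
    ABANDON_NONE = 0,   /* operation is running normally */
    ABANDON_REQUESTED,  /* abandon or connection loss seen before the commit */
    ABANDON_COMMITTED   /* local write committed; any abandon is "too late" */
};

/* Toy stand-in for an operation; the real Operation struct is much larger. */
struct toy_op {
    _Atomic int abandon;
};

static void toy_op_init(struct toy_op *op)
{
    atomic_init(&op->abandon, ABANDON_NONE);
}

/* Connection side: returns true only if the abandon took effect. */
static bool toy_request_abandon(struct toy_op *op)
{
    int expected = ABANDON_NONE;
    return atomic_compare_exchange_strong(&op->abandon, &expected,
                                          ABANDON_REQUESTED);
}

/* Backend side, at the point of no return: returns false if the operation
 * was abandoned first, in which case the write should be rolled back. */
static bool toy_mark_committed(struct toy_op *op)
{
    int expected = ABANDON_NONE;
    return atomic_compare_exchange_strong(&op->abandon, &expected,
                                          ABANDON_COMMITTED);
}

/* Replication side: a committed write is replicated even if the
 * connection went away afterwards. */
static bool toy_should_replicate(struct toy_op *op)
{
    return atomic_load(&op->abandon) == ABANDON_COMMITTED;
}
```

With this shape, a late abandon cannot change an op that is already committed,
so the replication check sees "committed" and still sends the change to the
consumers.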
Rein Tollevik
Basefarm AS