Re: (ITS#6059) Abandon syncprov race condition? - openldap-bugs

11 May 2009


hyc@symas.com wrote:
...
rein@OpenLDAP.org wrote:
...
...
I've had two cases where a delete operation was performed on the master without
being replicated to its consumers, which so far appear to be cases of a possible
connection-lost (abandon) race condition.  The log (level: stats) shows the
"DEL" message for the entry, immediately followed by a "closed (connection lost)"
message on the connection.  Note: No "RESULT" message was logged.
I haven't looked very much into this, but my theory so far is that syncprov
skipped replicating the delete op after noticing the abandon resulting from
losing the connection, even though the delete had already taken place in the
local database.  That it happened after a delete op might very well have been a
coincidence; this possible race could exist after any modify op for all I know.
...
Do we need some sort of o_committed flag that can be used to prevent o_abandon
from being set or acted upon? Or handle o_abandon more like o_cancel, i.e. with
multiple values, including "too late"?
No. What good can that do, since the connection has already been lost?
It doesn't matter if syncprov fails to send an update to a consumer - the 
consumer's cookie state will let it pick up where it left off when it reconnects.
It isn't the connection to the syncprov consumer that was lost, it is 
the connection to the client that made the change.  The abandon may 
cause syncprov to abandon the modify op (in syncprov_op_abandon), and it 
will definitely cause the entire response callback to be skipped, which
is where syncprov sends updates to its consumers.  The change will take
place in the local database, but it will not be replicated, nor will auditlog
log it.  Accesslog does log it though, probably since the cleanup callback it
enables in accesslog_op_mod explicitly calls its response callback.
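
The behaviour described above can be illustrated with a small toy model. This is not slapd code: the Op and Callback classes, send_result() and run_delete() are hypothetical names invented for the sketch, and the rule "response callbacks are skipped once the op is abandoned, cleanup callbacks still run" is taken from the description in this message, not verified against the source.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Callback:
    """Hypothetical stand-in for an overlay's callback pair."""
    name: str
    response: Optional[Callable[["Op"], None]] = None
    cleanup: Optional[Callable[["Op"], None]] = None


@dataclass
class Op:
    kind: str
    abandoned: bool = False
    callbacks: List[Callback] = field(default_factory=list)
    log: List[str] = field(default_factory=list)


def send_result(op: Op) -> None:
    # Assumption from the message: an abandoned op skips every
    # response callback, but cleanup callbacks are still invoked.
    if not op.abandoned:
        for cb in op.callbacks:
            if cb.response:
                cb.response(op)
    for cb in op.callbacks:
        if cb.cleanup:
            cb.cleanup(op)


def run_delete(abandon_after_commit: bool) -> List[str]:
    op = Op(kind="DEL")

    def syncprov_response(o: Op) -> None:
        o.log.append("syncprov: replicated to consumers")

    def auditlog_response(o: Op) -> None:
        o.log.append("auditlog: logged")

    def accesslog_response(o: Op) -> None:
        o.log.append("accesslog: logged")

    def accesslog_cleanup(o: Op) -> None:
        # Models the guess that accesslog's cleanup calls its own
        # response handler; the duplicate check is a toy detail.
        if "accesslog: logged" not in o.log:
            accesslog_response(o)

    op.callbacks = [
        Callback("syncprov", response=syncprov_response),
        Callback("auditlog", response=auditlog_response),
        Callback("accesslog", response=accesslog_response,
                 cleanup=accesslog_cleanup),
    ]

    # The backend commits the delete before the result is sent.
    op.log.append("backend: entry deleted")
    if abandon_after_commit:
        # The client connection is lost in the window after the commit.
        op.abandoned = True
    send_result(op)
    return op.log


if __name__ == "__main__":
    print(run_delete(abandon_after_commit=False))
    print(run_delete(abandon_after_commit=True))
```

In the abandoned run the log contains only the backend and accesslog lines, which matches the symptom reported here: the entry is gone locally, but neither syncprov nor auditlog acted on it.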
The syncprov clients will receive updates to the csn when new
modifications take place, i.e. the clients must be restarted with the
"-c" option to resync their databases.
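
Why the consumer's cookie does not rescue it here can be sketched with another toy model. Again this is not the syncrepl protocol: Provider, Consumer and scenario() are hypothetical names, CSNs are reduced to plain integers, and the catch-up step is simplified to "send every change newer than the cookie".

```python
class Provider:
    """Toy provider: a set of DNs plus an ordered change list."""

    def __init__(self):
        self.entries = set()
        self.changes = []  # list of (csn, kind, dn)
        self.csn = 0

    def apply(self, kind, dn):
        # Every committed change gets the next (integer) CSN.
        self.csn += 1
        if kind == "add":
            self.entries.add(dn)
        elif kind == "del":
            self.entries.discard(dn)
        change = (self.csn, kind, dn)
        self.changes.append(change)
        return change

    def changes_after(self, csn):
        return [c for c in self.changes if c[0] > csn]


class Consumer:
    """Toy consumer: remembers the newest CSN it has seen (its cookie)."""

    def __init__(self):
        self.entries = set()
        self.cookie = 0

    def receive(self, change):
        csn, kind, dn = change
        if kind == "add":
            self.entries.add(dn)
        elif kind == "del":
            self.entries.discard(dn)
        self.cookie = max(self.cookie, csn)

    def resync(self, provider):
        # Catch-up only asks for changes newer than the cookie.
        for change in provider.changes_after(self.cookie):
            self.receive(change)


def scenario(push_delete):
    provider, consumer = Provider(), Consumer()
    consumer.receive(provider.apply("add", "cn=a"))
    consumer.receive(provider.apply("add", "cn=b"))
    delete = provider.apply("del", "cn=a")
    if push_delete:
        consumer.receive(delete)
    # A later change is pushed normally and advances the cookie
    # past the CSN of the delete that was never sent.
    consumer.receive(provider.apply("add", "cn=c"))
    consumer.resync(provider)
    return sorted(consumer.entries), consumer.cookie


if __name__ == "__main__":
    print(scenario(push_delete=True))
    print(scenario(push_delete=False))
```

When the delete is never pushed, the consumer still ends up with a cookie as new as the provider's latest change, so a cookie-based catch-up returns nothing and cn=a stays on the consumer. In this model only a resync that ignores the stored cookie would repair it.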
Rein