rein@OpenLDAP.org writes:
I've had two cases where a delete operation was performed on the master without being replicated to its consumers. So far these appear to be possible race conditions involving a lost connection (abandon).
Not sure if this is the problem, but it is ugly: slapd/cancel.c sets o_abandon with op->o_conn->c_mutex locked, but does not set o_cancel until after the mutex is unlocked. It looks like that can give slapd a chance to react to o_abandon before it "knows" that the abandon is actually a cancel.
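To make the window concrete, here is a minimal sketch of the two orderings. It is not the code from slapd/cancel.c: race_op, race_view and the race_* functions are hypothetical stand-ins, and the "worker" is simulated by taking a snapshot of the flags at the point where a real thread could run.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the operation; not slapd's Operation struct. */
typedef struct race_op {
    pthread_mutex_t c_mutex;  /* models op->o_conn->c_mutex */
    bool o_abandon;
    bool o_cancel;            /* models "this abandon is really a cancel" */
} race_op;

/* Snapshot of what another thread would see under the lock. */
typedef struct race_view { bool abandon; bool cancel; } race_view;

static void race_op_init(race_op *op) {
    assert(op != NULL);
    pthread_mutex_init(&op->c_mutex, NULL);
    op->o_abandon = false;
    op->o_cancel = false;
}

static race_view race_observe(race_op *op) {
    race_view v;
    pthread_mutex_lock(&op->c_mutex);
    v.abandon = op->o_abandon;
    v.cancel = op->o_cancel;
    pthread_mutex_unlock(&op->c_mutex);
    return v;
}

/* Ordering as described above: o_abandon under the lock, o_cancel later.
 * *mid records what a worker could observe inside the window. */
static void race_cancel_split(race_op *op, race_view *mid) {
    pthread_mutex_lock(&op->c_mutex);
    op->o_abandon = true;
    pthread_mutex_unlock(&op->c_mutex);

    *mid = race_observe(op);  /* window: abandon visible, cancel not yet */

    op->o_cancel = true;
}

/* Assumed alternative: publish both flags in one critical section. */
static void race_cancel_atomic(race_op *op, race_view *mid) {
    pthread_mutex_lock(&op->c_mutex);
    op->o_abandon = true;
    op->o_cancel = true;
    pthread_mutex_unlock(&op->c_mutex);

    *mid = race_observe(op);  /* no state with abandon but without cancel */
}
```

With the split ordering the snapshot shows abandon set and cancel clear, which is exactly the state a worker could act on. Whether the real code can set both under c_mutex depends on lock ordering I have not checked.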
Do we need some sort of o_committed flag that can be used to prevent o_abandon from being set or acted upon? Or handle o_abandon more like o_cancel, i.e. with multiple values, including "too late"?
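A rough sketch of what such a multi-valued state could look like, assuming C11 atomics are acceptable. Nothing here exists in slapd: commit_op, the COMMIT_* values and both functions are made up for illustration. The idea is that "commit" and "abandon" compete for a single transition out of the running state, so exactly one of them wins.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operation states; names are illustrative only. */
enum commit_state {
    COMMIT_RUNNING = 0,  /* nothing durable has happened yet */
    COMMIT_ABANDONED,    /* abandon/cancel won the race */
    COMMIT_COMMITTED     /* the change is durable; abandon is "too late" */
};

typedef struct commit_op {
    _Atomic int state;
} commit_op;

static void commit_op_init(commit_op *op) {
    assert(op != NULL);
    atomic_init(&op->state, COMMIT_RUNNING);
}

/* Worker side: call just before making the change durable.
 * Returns true if the worker may proceed, false if it was abandoned. */
static bool commit_try_commit(commit_op *op) {
    int expected = COMMIT_RUNNING;
    return atomic_compare_exchange_strong(&op->state, &expected,
                                          COMMIT_COMMITTED);
}

/* Abandon/cancel side: returns true if the abandon took effect,
 * false if the operation had already committed ("too late"). */
static bool commit_try_abandon(commit_op *op) {
    int expected = COMMIT_RUNNING;
    if (atomic_compare_exchange_strong(&op->state, &expected,
                                       COMMIT_ABANDONED)) {
        return true;
    }
    /* On failure, expected holds the state that was actually found. */
    return expected == COMMIT_ABANDONED;
}
```

A caller handling a cancel could map a false return from commit_try_abandon() to the "too late" result, while a plain abandon would just ignore it.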
o_cancel is a wrapper around o_abandon, turning result code SLAPD_ABANDON into LDAP_TOO_LATE etc. However, slap_send_ldap_result() and send_ldap_response() skip "if (op->o_callback) slap_response_play()" if o_abandon is set, and "send" SLAPD_ABANDON instead of the result code. Can that work right? The code looks like SLAPD_ABANDON ought to mean "nothing was done" right up until everything has had a chance to react to the operation in the same way.
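Here is a toy model of that concern, under the assumption that some callback on the operation is what records the change for consumers. It does not reproduce slap_send_ldap_result() or send_ldap_response(): resp_op, the RESP_* codes (arbitrary values, not the real SLAPD_ABANDON or LDAP codes) and the committed field are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RESP_SUCCESS 0
#define RESP_ABANDON (-1)  /* stands in for SLAPD_ABANDON; value is arbitrary */

typedef struct resp_op resp_op;
typedef void (*resp_callback)(resp_op *op, int rc);

struct resp_op {
    bool o_abandon;
    bool committed;            /* hypothetical: change is already durable */
    resp_callback o_callback;
    int callbacks_played;
};

/* Example callback: just counts how often it was invoked. */
static void resp_count_cb(resp_op *op, int rc) {
    (void)rc;
    op->callbacks_played++;
}

/* Behaviour as described above: abandon short-circuits the callbacks. */
static int resp_send_as_described(resp_op *op, int rc) {
    assert(op != NULL);
    if (op->o_abandon) {
        return RESP_ABANDON;   /* callbacks never see the real result */
    }
    if (op->o_callback) {
        op->o_callback(op, rc);
    }
    return rc;
}

/* Hypothetical alternative: abandon only means "nothing was done"
 * while the operation has not committed. */
static int resp_send_commit_aware(resp_op *op, int rc) {
    assert(op != NULL);
    if (op->o_abandon && !op->committed) {
        return RESP_ABANDON;
    }
    if (op->o_callback) {
        op->o_callback(op, rc);
    }
    return rc;
}
```

In the first variant an operation that has already committed but then gets abandoned never plays its callbacks, which would match a delete that is applied on the master but never reaches the consumers. That is only a guess at the mechanism, not a confirmed diagnosis.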