Howard Chu writes:
I'd prefer to condense it all into a bitfield.
Right... keep both o_abandon and o_cancel (maybe renamed to o_dummy) for RE24 though, otherwise we have a silent binary incompatible change with already-compiled slapd modules. The change will in any case break modules that play with o_cancel though.
We already ensure that the c_mutex is held when setting these flags.
Not yet, but c_mutex it is:-) Must protect the following code. Or rather, bitflag code which will replace it:
- o_cancel access in cancel.c, connection.c:connection_operation(), back-relay/op.c.
- o_abandon setting in overlays/syncprov.c:syncprov_op_abandon(), and in overlays/retcode.c where hopefully the first case should have been if ( op2.o_abandon ) op->o_abandon = 1; so that we can never reset o_abandon once it is set.
It would be a lot of overhead to grab the mutex just to read the flag, and would require additional analysis to make sure the new locking behavior doesn't introduce new deadlocks.
Checking o_abandon!=0 works unmutexed. (Well, technically any such unmutexed access is wrong, but in practice we should be OK.) As long as we: - never reset o_abandon like retcode above can do, and - never unmutexed assume consistency betweeen abandon-related variables set by different threads.
So the equivalent of this should be OK: if ( op->o_abandon ) rs->sr_err = SLAPD_ABANDON; ... if ( rs->sr_err == SLAPD_ABANDON ) { lock c_mutex; more careful o_abandon checks; unlock c_mutex; }
While I'm looking at this:
The ldap_pvt_thread_yield() in cancel.c is really ugly. There is no cond to attach to it to without breaking binary compatibility, but even a even a global mutex+cond might be better, with threads broadcasting on it when it might be relevant.
It's still ugly though. Cancelling an expensive operation ought to be a friendly operation, but instead it is unfriendly in that yet another thread is occupied, solely to wait for that operation. What would make sense (for RE25) would be to redo Cancel so that the cancelled thread is responsible for sending the Cancel response or resubmitting an operation which will.
Also, the be_cancel implementation in at least back-relay looks wrong. Cancel isn't dispatched on DN, and back-relay with no 'relay' directive configured cannot know where to dispatch. I haven't checked how it works, but it seems a call to be_cancel means the Cancel operation _might_ belong to this backend. But it might not, which means back-relay shouldn't set op->o_cancel in this case. What it should do I don't know though. And I don't know if syncprov is doing any better.