Rein Tollevik wrote:
hyc@symas.com wrote:
rein@OpenLDAP.org wrote:
I had a couple of segfaults when resync'ing my servers after upgrading to the upcoming 2.4.16 release. It looks as if a copy of the backend must be used when testing the filter in syncprov_matchops. See the gdb output at the end. Note that some function names are incorrect due to optimization. A fix is coming.
The fix makes no sense, or the problem has not yet been analyzed sufficiently. Nobody in that call chain should be zeroing out bd_info. And if someone *is*, then it will happen in *whatever* BackendDB structure is currently being used.
Explain the real cause of the problem, and why the fix is correct.
The problem is not zeroing of bd_info; it is that the entire op2.o_bd points to garbage, as the gdb output shows. I did forget to print ss->s_op->o_bd though. op2 is a copy of *ss->s_op, but op2.o_bd and ss->s_op->o_bd differ. The content of *ss->s_op->o_bd looks reasonable.
The copying of *ss->s_op into op2 was introduced in rev 1.233 as a fix to ITS#5486. It doesn't say why this was the correct fix, but I assume it was done because something could modify *ss->s_op while the filter was being tested.
Yes of course, particularly op->o_callback.
Btw, the gdb output from ITS#5486 shows a db with similar garbage, so I suspect that these ITSes are related.
If something can mess with ss->s_op, it might just as well mess with ss->s_op->o_bd. The copying of op->o_bd that takes place all around is a nightmare! I have no clue as to who modified *ss->s_op and/or *ss->s_op->o_bd, and I'm not very satisfied with the fact that something did. Finding out why this happened may be the correct fix.
Yes, that's my point.
Down the road we need to fix things so that all this copying is unnecessary; it just involves adding an op->o_bd_info field so that we no longer need to change anything in the op->o_bd itself. But in the meantime, we need to find out why an invalid op->o_bd is there. Most likely some lower function temporarily put a pointer to a copy on its own stack in there and didn't restore the original value before returning. And again, if that's the case, then it doesn't matter what value you set higher up: copy or not, it will still point to garbage.
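
To make the suspected failure mode concrete, here is a minimal sketch using simplified stand-in structs. The BackendInfo, BackendDB and Operation types below are hypothetical and far smaller than the real slapd ones, and with_temp_backend is an invented helper, not an existing function. It shows two things: a struct copy such as op2 = *ss->s_op is shallow, so the copy shares the same o_bd pointer, and a function that temporarily points op->o_bd at a copy on its own stack has to put the original pointer back before it returns.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real slapd structures. */
typedef struct BackendInfo { const char *bi_type; } BackendInfo;
typedef struct BackendDB   { BackendInfo *bd_info; } BackendDB;
typedef struct Operation   { BackendDB *o_bd; } Operation;

/* Run some work with a different bd_info by pointing op->o_bd at a
 * stack copy, then put the caller's pointer back before returning. */
static int with_temp_backend(Operation *op, BackendInfo *bi)
{
    BackendDB *saved = op->o_bd;  /* remember the caller's pointer */
    BackendDB db;                 /* lives only in this stack frame */
    int rc;

    assert(saved != NULL);
    db = *saved;
    db.bd_info = bi;
    op->o_bd = &db;

    rc = (op->o_bd->bd_info == bi) ? 0 : -1;  /* stand-in for real work */

    op->o_bd = saved;  /* without this, op->o_bd dangles after return */
    return rc;
}

/* A struct copy is shallow: the copy shares o_bd with the original. */
int copy_shares_backend(void)
{
    BackendInfo bi = { "example" };
    BackendDB db = { &bi };
    Operation op = { &db };
    Operation op2 = op;

    return op2.o_bd == op.o_bd;
}

/* The backend pointer is unchanged after a call that restores it. */
int backend_restored(void)
{
    BackendInfo bi = { "example" }, overlay = { "overlay" };
    BackendDB db = { &bi };
    Operation op = { &db };

    if (with_temp_backend(&op, &overlay) != 0)
        return 0;
    return op.o_bd == &db && op.o_bd->bd_info == &bi;
}
```

If the restore line is dropped, every copy of the op made afterwards inherits the dangling pointer, which is why copying higher up cannot help on its own.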