Re: (ITS#6276) paused pool can deadlock if writers are waiting - openldap-bugs

26 Aug 2009


      Howard Chu writes:
...
Hallvard B Furuseth wrote:
...
Beyond that, my immediate reaction was that pauses are implemented at
the wrong level and/or need to be split up in different types of pauses.
There is no good reason cn=config's need to have slapd config variables
for itself to affect network operations - nor, I hope, affect pool-level
operations like pool_purgekey().
I suppose implementing things this way can appear to be too blunt. But the 
amount of locks required to do it at a finer level is, IMO, unmanageable.
I know.  That's why I'm not suggesting to do away with pauses, only to
make them a little less crude than "1 type of pause for everything".
See below.
...
configurations are changed from under them.)
...
E.g. cn=config could use a slapd_config_pause() call to ask for lone
access to the config variables instead of thread_pool_pause() call.
Slapd operations that must respect such pauses should thus call a
slapd_config_pausecheck() macro/function, not thread_pool_pausecheck()
or depend on the pool to respect slapd-level pauses.
connection_read_thread() can then go ahead and use the connection
independently of those pauses.  If it reaches connection_operation(),
it'll need to check for slapd pause.
Yes, I've thought about this approach too. That makes the pause mechanisms 
even messier.
It does?  I was hoping that would be the cleaner solution.
Lifting them from pool level to slapd level would mean that every task
now submitted to the pool will be responsible for (a) calling pausecheck
itself or (b) avoiding all variables/features that today need pauses.
Then we could add the option (c) a *small* set of (groups of?) config
variables which each will need their own mutex or pause or something.
Such as loglevel, since the code is doing Debug() all over the place.
And presumably some network variables, for this ITS.  But I quite agree
that this is hopeless if that "small" set cannot be kept small.  For one
thing it might introduce yet another "lock order", threatening to be
inconsistent with the lock order of other mutexes.
...
Another possibility is to just try to read 1 byte in the listener
thread, to detect the hangup there when we have no other means to
discover it. We would have to make sure to be able to unget this byte
back to the bottom of the sockbuf stack if there's valid data. This
will affect the throughput of the listener thread, but it may not be
too terrible a hit.
Does that help for a socket which has gotten blocked on select() due to
full socket buffers?
Also if doing - a read() OS call anyway, why not read() a large enough
chunk that it would commonly be a full PDU?  As long as it's read to a
sockbuf buffer rather than into slapd's data.
-- 
Hallvard