h.b.furuseth@usit.uio.no wrote:
> OK, I've fixed some of it in HEAD. See the comments in the commits for tpool.c 1.63 onwards.

Quite a lot of changes; it will take some time to evaluate the overall impact.
> RE23 has the same problems. The code is nearly identical, sans the new semaphore stuff and a few variable names, so the fixes should be easy to import once they've been tested a bit.

The semaphore stuff probably needs to be axed/unifdef'd. It is unusable as is; it would only result in deadlocks, and I think I already removed the slapd code that would have invoked it.
> Remaining problems:
>
> tpool.c breaks if there are multiple pools. The simplest solution seems to be to remove support for multiple pools. The problem is thread_keys[]: it is shared between pools, but:

slapd only uses a single pool, and that's really the only use case we care about. In fact it makes no sense to support multiple pools, because we have no mechanism for asserting the scheduling priority of one pool over another.
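For illustration, here is a minimal structural sketch of the clash (a sketch only, not the actual tpool.c code; the names thread_keys, ltk_*, and ltp_pause mirror tpool.c, everything else is simplified):

    #include <pthread.h>

    #define MAXTHREADS 1024
    #define MAXKEYS    32

    /* One slot per thread, holding that thread's key/data pairs. */
    typedef struct {
        pthread_t id;
        struct {
            void *ltk_key;
            void *ltk_data;
            void (*ltk_free)(void *key, void *data);
        } ltc_keys[MAXKEYS];
    } thread_ctx;

    /* A single process-wide table, shared by EVERY pool... */
    static thread_ctx thread_keys[MAXTHREADS];

    typedef struct {
        pthread_mutex_t ltp_mutex;
        int ltp_pause;    /* ...while the pause flag is per pool */
    } pool_t;

Pausing one pool only stops that pool's workers; workers of any other pool keep reading and writing the shared thread_keys[] table that a pause is supposed to make quiescent. That is why removing multi-pool support is the simplest fix.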
> Other thread_keys[] strangeness:
>
> - setkey does not free the old data with ltk_free. Possibly that's intentional, but if so, that requires weird code elsewhere.

Yes, that's intentional. ltk_free is only intended for cleanup if a key is still set by the time a thread exits. All of the keys we set are live for the entire life of a thread; they don't get reassigned.
> - Maybe related: getkey can return the ltk_free function, which seems crazy, since a caller who uses it must then explicitly reset the key afterwards (or better, before) to be sure purgekey doesn't see the data and free it again.

Nothing actually depends on returning the ltk_free function; it was just included for completeness' sake.
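To make the double-free hazard concrete, here is a small self-contained model (illustrative stand-ins only; the real calls are ldap_pvt_thread_pool_setkey/getkey/purgekey, whose signatures differ from this sketch):

    #include <stdlib.h>

    /* One key slot, standing in for a thread_keys[] entry. */
    static struct { void *data; void (*kfree)(void *); } key_slot;

    static void getkey(void **data, void (**kfree)(void *)) {
        *data = key_slot.data;
        if (kfree) *kfree = key_slot.kfree;
    }
    static void setkey(void *data, void (*kfree)(void *)) {
        key_slot.data = data;
        key_slot.kfree = kfree;
    }
    /* Frees whatever is still registered, as purgekey() or
     * thread-exit cleanup would. */
    static void purgekey(void) {
        if (key_slot.data && key_slot.kfree)
            key_slot.kfree(key_slot.data);
        key_slot.data = NULL;
    }

    int main(void) {
        void *d;
        void (*f)(void *);

        setkey(malloc(16), free);
        getkey(&d, &f);

        setkey(NULL, NULL);  /* reset FIRST, so cleanup cannot see
                              * the data anymore... */
        f(d);                /* ...then freeing it ourselves is safe */

        purgekey();          /* finds nothing: no double free */
        return 0;
    }

Skipping the setkey(NULL, NULL) step would leave the stale pointer registered, and purgekey() would free it a second time, which is exactly the "free it again" case described above.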
> Other notes:
>
> - I've inserted new, possibly too aggressive, waits for ltp_pause in pool_wrapper. See the tpool.c 1.65 comment. Yet another alternative to splitting the ltp_pause state might be to remove the requirement that purgekey() can only be used by a paused thread. (Could loop until the key is _really_ gone instead, or something.)

Will look at this later.
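For readers without the 1.65 diff at hand, the waits in question have the shape of a standard condition-variable pause gate; the following is only a sketch of that general pattern, not the actual pool_wrapper code:

    #include <pthread.h>

    struct pool {
        pthread_mutex_t ltp_mutex;
        pthread_cond_t  ltp_cond;
        int             ltp_pause;
    };

    /* Called at one or more checkpoints in the worker loop; the more
     * checkpoints, the more "aggressive" the waiting. */
    static void check_pause(struct pool *p) {
        pthread_mutex_lock(&p->ltp_mutex);
        while (p->ltp_pause)        /* loop: wakeups can be spurious */
            pthread_cond_wait(&p->ltp_cond, &p->ltp_mutex);
        pthread_mutex_unlock(&p->ltp_mutex);
    }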
> - tpool.c uses LDAP_FREE(), which can call ldap_pvt_thread_pool_context(), which in turn uses thread_keys[]. If one thread does that while another thread has paused the pool, things can break, since thread_keys[] is reserved for ldap_pvt_thread_pool_purgekey() during pauses.

Since there is only one thread pool in use, it is impossible for another thread to be active.
> - ltp_pending_count includes threads sleeping in pool_pause(). Should it? In effect, pool_pause() temporarily reduces ltp_max_pending.

I don't see why any special consideration is needed here. Nothing else in the thread pool is going to move while a pause is in effect.
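In code terms the observation is just this (a sketch of the admission check, not the real submit logic):

    /* Sketch: the pool refuses new work once the pending count hits
     * the cap. If ltp_pending_count also counts threads sleeping in
     * pool_pause(), then while N threads are parked there the pool
     * effectively accepts N fewer pending requests. */
    static int would_reject(int ltp_pending_count, int ltp_max_pending)
    {
        return ltp_pending_count >= ltp_max_pending;
    }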
> In ldap_pvt_thread_pool_submit():
>
> This comment:
>
>     /* there is another open thread, so this
>      * context will be handled eventually.
>      * continue on and signal that the context
>      * is waiting.
>      */
>
> is wrong if the preceding if (pool->ltp_open_count == 0) branch was taken but no thread was found inside it.

The comment may be incomplete...
> Also, if ctx is invalid (another thread handled it) and a new pending request has gotten the now-unused ctx, then the 'if (pool->ltp_open_count == 0)' code will free that new pending request instead of the old one.

Most likely this is due to unlocking and relocking the mutex around the semaphore code. Axe that, remove the gratuitous unlock/relock, and the problem goes away.
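The window is easiest to see in a sketch (illustrative names only, not the real ldap_pvt_thread_pool_submit code):

    #include <pthread.h>

    struct req { struct req *next; };
    struct tpool {
        pthread_mutex_t ltp_mutex;
        int             ltp_open_count;
        struct req     *ltp_pending;
    };

    static void submit_sketch(struct tpool *p, struct req *ctx)
    {
        pthread_mutex_lock(&p->ltp_mutex);
        ctx->next = p->ltp_pending;          /* queue our request */
        p->ltp_pending = ctx;

        pthread_mutex_unlock(&p->ltp_mutex); /* the gratuitous unlock */
        /* window: a worker can dequeue and finish ctx here, and a new
         * pending request can be allocated at the very same address */
        pthread_mutex_lock(&p->ltp_mutex);

        if (p->ltp_open_count == 0) {
            /* still assumes ctx is the request queued above; if the
             * address was reused, this would free the NEW request */
        }
        pthread_mutex_unlock(&p->ltp_mutex);
    }

Holding the mutex across the whole function removes the window, which is the fix suggested above.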
> - Note, I haven't looked at the new semaphore stuff yet.
>
> Will get back to this later, but it could use review so far. In particular I'd like to know whether we'll keep or kill multi-pool support.

Thus far we've only had to worry about a single pool. If anyone can come up with a reason to need multiple pools, I guess we need to hear it. Given the lack of scheduling control, I don't see how it would be of any benefit. At the same time, I don't see a compelling reason to perturb the code at the moment.