Howard Chu writes:
Quite a lot of changes, will take some time to evaluate the overall impact.
I hope it helped to make one patch per issue, and that the comments were clear about what each patch was fixing.
The semaphore stuff probably needs to be axed/unifdef'd. It is unusable, it would only result in deadlocks, and I think I already removed the slapd code that would have invoked it.
If it is unusable, do you mean "axed/#if 0'ed"?
slapd/daemon.c does ldap_lazy_sem_init(). No other calls outside tpool.c.
tpool.c breaks if there are multiple pools. The simplest solution seems to be to remove support for multiple pools. The problem is thread_keys[]. It is shared between pools, but:
slapd only uses a single pool, and that's really the only use case we care about. In fact it makes no sense to support multiple pools, because we have no mechanism to assert scheduling priorities of one pool over another.
I'm afraid I did not think of scheduling priorities at all, only bugs. But it certainly makes things simpler if we remove multiple pools.
Simple documentation of how tpool should be used may remove some of my concerns - or of how it _is_ used, since it is only used in slapd. (Several of these are probably only problems if tpool is used as a library feature; tpool depends on being used the way slapd uses it. I've wondered what it's doing in libldap_r/ instead of in slapd/.)
I wrote:
- ltp_max_count (max #threads) is pool-specific, but is used to
protect from array bounds violation on thread_keys[LDAP_MAXTHR]. (New simple code from me, previously there was no limit.)
sorry, s/array bounds violation/eternal loop looking for free slot/.
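To illustrate, here is a stand-alone sketch of the kind of loop I mean - simplified, with made-up names (MAXTHR, slot[], claim_slot()), not the actual tpool.c code:

#define MAXTHR 32   /* stands in for LDAP_MAXTHR */

/* One slot table shared by every pool, like thread_keys[]. */
static struct { int used; unsigned long id; } slot[MAXTHR];

/* A starting thread probes for a free entry.  The loop terminates only
 * if a free slot is guaranteed to exist, i.e. only if the total number
 * of threads is capped below MAXTHR.  A per-pool ltp_max_count gives
 * that guarantee only while there is a single pool; two pools each
 * honouring their own cap could still fill the shared table.
 */
static int claim_slot( unsigned long tid )
{
    unsigned i = (unsigned) ( tid % MAXTHR );   /* crude hash of the id */

    while ( slot[i].used )      /* spins forever if the table is full */
        i = ( i + 1 ) % MAXTHR;

    slot[i].used = 1;
    slot[i].id = tid;
    return (int) i;
}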
Other thread_keys[] strangeness:
- setkey does not free the old data with ltk_free. Possibly that's intentional, but if so that requires weird code elsewhere.
Yes, that's intentional. ltk_free is only intended for cleanup if a key is still set by the time a thread exits. All of the keys we set are live for the entire life of a thread, they don't get re-assigned.
So if I call setkey and the key already exists and has an ltk_free function, it's my responsibility to know the old data was there, and free it or take back ownership? (A clarifying comment would be useful.)
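So the caller pattern would be something like the sketch below? my_key/my_free/replace_my_data are made up, and I'm going from memory of the prototypes in ldap_pvt_thread.h, so treat the signatures as assumptions:

#include <stdlib.h>
#include "ldap_pvt_thread.h"    /* ldap_pvt_thread_pool_*key() */

static int my_key;              /* made-up key address */

static void my_free( void *key, void *data )    /* made-up ltk_free function */
{
    (void) key;
    free( data );
}

/* Replace the data stored under my_key for this thread context (ctx as
 * obtained from ldap_pvt_thread_pool_context()).  Since setkey() does
 * not free old data, the caller has to fetch and dispose of it first.
 */
static int replace_my_data( void *ctx, void *newdata )
{
    void *olddata = NULL;
    ldap_pvt_thread_pool_keyfree_t *oldkfree = NULL;

    if ( ldap_pvt_thread_pool_getkey( ctx, &my_key, &olddata, &oldkfree ) == 0
        && olddata != NULL )
    {
        my_free( &my_key, olddata );    /* or take back ownership of it */
    }
    (void) oldkfree;    /* deliberately not relied on; see below */
    return ldap_pvt_thread_pool_setkey( ctx, &my_key, newdata, my_free );
}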
- maybe related to how getkey can return the ltk_free function - which seems crazy since a caller who uses it must then explicitly reset the key afterwards (or better, before), to be sure purgekey doesn't see the data and free it again.
Nothing actually depends on returning the ltk_free function, that was just included for completeness' sake.
Then I suggest the getkey(,,,kfree) output parameter is removed, to avoid inviting future duplicate frees. If anyone later needs it, a cleaner mechanism can be added - e.g. a setkey(data=kfree=0) variant which does call ltk_free before removing the key, or maybe a setkey which returns any old data and free function.
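Continuing the sketch above (same made-up names and assumed prototypes), this is the call sequence the kfree output parameter invites:

static void risky_cleanup( void *ctx )
{
    void *data = NULL;
    ldap_pvt_thread_pool_keyfree_t *kfree = NULL;

    if ( ldap_pvt_thread_pool_getkey( ctx, &my_key, &data, &kfree ) == 0
        && kfree != NULL )
    {
        kfree( &my_key, data );
        /* Missing: resetting the key here.  The stale pointer is
         * still in ltu_key[], so purgekey() or thread-exit cleanup
         * will hand the same data to my_free() a second time.
         */
    }
}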
- tpool.c uses LDAP_FREE() and can call ldap_pvt_thread_pool_context(), which uses thread_keys[]. If one thread does that when another thread has paused the pool, that can break since thread_keys[] is reserved for ldap_pvt_thread_pool_purgekey() during pauses.
I'm wrong, thread_keys[] is _read-only_ during pauses. pool_context() is safe since it does not write thread_keys, just like pool_purgekey().
Since there is only one thread pool in use, it is impossible for another thread to be active.
No. Though it's true that my concern above was wrong.
It is "ltp_active_count--" which triggers the start of a pause. So "active" here means "included in ltp_active_count". A pool_wrapper() thread is only "active" when handling a context, not when adding itself to or removing itself from thread_keys[]. That's what my new pause waits were for.
Also I don't think the main thread is included in the "active" threads - so it's only impossible if the main thread doesn't interfere. (Again, I suppose a comment may be the "fix".)
But this reminds me: how much should an ltk_free() function be allowed to do?
- pool_context_reset() calls ltk_free(). Should pool_wrapper() go active (wait for !pause, increment ltp_active_count, unlock ltp_mutex) before calling pool_context_reset()? (A sketch of that ordering follows below.)
- Should ltk_free() be allowed to change the contents of ltu_key[], so that pool_purgekey() and pool_context_reset() must be prepared for that?
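To make the first question concrete, a toy sketch of the ordering I mean - made-up names (toy_pool etc.), not the actual pool_wrapper() code:

#include <pthread.h>

struct toy_pool {
    pthread_mutex_t mutex;          /* stands in for ltp_mutex        */
    pthread_cond_t  cond;
    int             pause;          /* nonzero while a pause is on    */
    int             active_count;   /* stands in for ltp_active_count */
};

static void toy_context_reset( void ) { /* the ltk_free() calls would run here */ }

static void toy_thread_exit_cleanup( struct toy_pool *pool )
{
    pthread_mutex_lock( &pool->mutex );
    while ( pool->pause )           /* wait out any pause first */
        pthread_cond_wait( &pool->cond, &pool->mutex );
    pool->active_count++;           /* count as "active" while ltk_free runs */
    pthread_mutex_unlock( &pool->mutex );

    toy_context_reset();

    pthread_mutex_lock( &pool->mutex );
    if ( --pool->active_count == 0 )    /* may let a waiting pause proceed */
        pthread_cond_broadcast( &pool->cond );
    pthread_mutex_unlock( &pool->mutex );
}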
- ltp_pending_count includes threads sleeping in pool_pause() - should it? In effect, pool_pause() temporarily reduces ltp_max_pending.
I don't see why any special consideration is needed here. Nothing else in the thread pool is going to move while a pause is in effect.
That single thread could do ldap_pvt_thread_pool_submit() - which could fail due to the reduced effective max_pending. Again, if slapd won't do that, even with weird overlays and whatnot in effect, a comment will fix that.
In ldap_pvt_thread_pool_submit():
This comment:
    /* there is another open thread, so this
     * context will be handled eventually.
     * continue on and signal that the context
     * is waiting. */
is wrong if the previous if (pool->ltp_open_count == 0) branch was taken but no thread was found inside it.
The comment may be incomplete...
"and signal that..." should be "we have signalled that...". Otherwise it's correct if we remove the remove the semaphore and unlock like you say below. With the current code, I think the comment should be /* another open thread took or eventually will * take care of this context. continue on, we * have signalled that the context is waiting. */
Also if ctx is invalid (another thread handled it) and a new pending request has gotten the now-unused ctx, then the 'if (pool->ltp_open_count == 0)' code will free that new pending request instead of the old one.
Most likely this is due to unlocking and relocking the mutex around the semaphore code. Axe that, remove the gratuitous unlock/relock, and then that problem goes away.
Yes, that should fix it.
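For the archives, a toy sketch of the shape of that fix - made-up names, not the actual submit() code, and simplified (the real branch first tries to start a new thread):

#include <pthread.h>

struct toy_ctx  { struct toy_ctx *next; };

struct toy_pool {
    pthread_mutex_t mutex;      /* stands in for ltp_mutex        */
    struct toy_ctx  *pending;   /* stands in for the pending list */
    int             open_count; /* stands in for ltp_open_count   */
};

static int toy_submit( struct toy_pool *pool, struct toy_ctx *ctx )
{
    pthread_mutex_lock( &pool->mutex );

    ctx->next = pool->pending;  /* queue the new context */
    pool->pending = ctx;

    /* The semaphore code used to unlock and relock here.  In that
     * window another thread could take ctx and a new request could
     * reuse it, so the check below would back out the wrong one.
     */

    if ( pool->open_count == 0 ) {
        /* Still safe: ctx is still ours, nobody could have dequeued
         * it while we held the mutex.  (The real code would first try
         * to start a new thread and only back out if that fails.) */
        pool->pending = ctx->next;
        pthread_mutex_unlock( &pool->mutex );
        return -1;              /* no thread available to run it */
    }

    pthread_mutex_unlock( &pool->mutex );
    return 0;
}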
- Note, I haven't looked at the new semaphore stuff yet.
Will get back to this later, but the changes so far could use review. In particular I'd like to know if we'll keep or kill multi-pool support.
Thus far we've only had to worry about a single pool. If anyone can come up with a reason to need multiple pools, I guess we need to hear it. Given the lack of scheduling control, I don't see how it would be of any benefit. At the same time, I don't see a compelling reason to perturb the code at the moment.
Well, there are still bugs left. And I can't say I like having a feature (multiple pools) which is only not a bug because it isn't used.