OK, I've fixed some of it in HEAD. See comments in the commits for tpool.c 1.63 and outwards.
RE23 has the same problems. Nearly identical code sans the new semaphore stuff and a few variable names, so fixes should be easy to import when they've been tested a bit.
Remaining problems:
tpool.c breaks if there are multiple pools. The simplest solution seems to be to remove support for multiple pools. The problem is thread_keys[]. It is shared between pools, but:
- ldap_pvt_thread_pool_purgekey() needs _all_ pools to be paused, but ldap_pvt_thread_pool_pause() only pauses one pool. If we keep multi-pool support, maybe it should only destroy keys from the current pool?
- ltp_max_count (max #threads) is pool-specific, but is used to protect from array bounds violation on thread_keys[LDAP_MAXTHR]. (New simple code from me, previously there was no limit.)
- This '#if 0'ed comment is also wrong with multiple pools:
	/* start up one thread, just so there is one. no need to
	 * lock the mutex right now, since no threads are running. */
  Time to remove the whole #if 0?
Other thread_keys[] strangeness:
- setkey does not free the old data with ltk_free. Possibly that's intentional, but if so that requires weird code elsewhere.
- maybe related to how getkey can return the ltk_free function - which seems crazy since a caller who uses it must then explicitly reset the key afterwards (or better, before), to be sure purgekey doesn't see the data and free it again.
Other notes:
- I've inserted new, possibly too aggressive, waits for ltp_pause in pool_wrapper. See the tpool.c 1.65 comment. Yet another alternative than splitting the ltp_pause state might be to remove the requirement that purgekey() can only be used by a paused thread. (Could loop until the key is _really_ gone instead, or something.)
- tpool.c uses LDAP_FREE(), which can call ldap_pvt_thread_pool_context(), which uses thread_keys[]. If one thread does that while another thread has paused the pool, that can break, since thread_keys[] is reserved for ldap_pvt_thread_pool_purgekey() during pauses.
Not sure if that can happen, haven't checked. If LDAP_MALLOC/LDAP_CALLOC can also call pool_context(), we do have a problem in pool_submit(). One fix is to use malloc/calloc/free instead. Another, for the LDAP_FREE(ctx) in pool_submit, is to put the ctx on the free list instead.
- ltp_pending_count includes threads sleeping in pool_pause() - should it? In effect, pool_pause() temporarily reduces ltp_max_pending.
- In ldap_pvt_thread_pool_submit():
This comment:
	/* there is another open thread, so this
	 * context will be handled eventually.
	 * continue on and signal that the context
	 * is waiting. */
is wrong if the preceding 'if (pool->ltp_open_count == 0)' branch was taken but no thread was found inside it.
Also if ctx is invalid (another thread handled it) and a new pending request has gotten the now-unused ctx, then the 'if (pool->ltp_open_count == 0)' code will free that new pending request instead of the old one.
One fix is to add a local 'int is_pending = 1;' in submit() and an 'int *ltc_pendingp;' member in ldap_int_thread_ctx_t. In submit(), if need_thread, set ctx->ltc_pendingp = &is_pending. In the 'if (pool->ltp_open_count == 0)' cleanup, check is_pending instead of searching for ctx. Elsewhere, set *ltc_pendingp = 0 and/or ltc_pendingp = NULL when appropriate.
- Note, I haven't looked at the new semaphore stuff yet.
Will get back to this later, but I could use a review of what's here so far. In particular I'd like to know whether we'll keep or kill multi-pool support.