Some more comments, review, planned changes, and questions.
I guess I should summarize the bugs I've fixed in HEAD first, for
completeness. (I was a bit too busy at the time.) Restating the CVS
log comments, nothing new but maybe a bit clearer. Possibly some of the
issues are tpool-only issues and can't happen the way slapd uses tpool.
----------------------------------------
rev 1.63:
ldap_pvt_thread_pool_submit(), after failed thread creation:
- ltp_pending_count++ should be --.
- should signal if there are no more threads, in case pool_destroy()
is waiting for that. (Not sure if that can happen.)
ldap_int_thread_pool_wrapper():
- The 'if(ltp_state...)' was spurious: It would always be true, and
if it were not we'd get an eternal loop.
rev 1.64:
pool_pause():
- would cancel states FINISHING and STOPPING, pool_resume() would set
it back to RUNNING.
- could hang when waiting for another pool_pause(), since it did not
signal ltp_pcond if it decremented ltp_active_count to 1.
pool_wrapper():
- could update thread_keys[] during a pause while initializing and
possibly while finishing,
- just after initializing, could handle a new ctx during a pause.
pool_destroy():
- would go on with destruction if pool_resume() was called, since it
used an if(){wait} instead of while() and they use the same cond.
rev 1.65:
- pool_context() read thread_keys[] without mutex protection.
pool_purgekey() too, but it expected a pause to protect the keys.
pool_wrapper() protected with ltp_mutex while writing, and did not
quite ensure a pause.
rev 1.66:
- thread_keys[] is a (poor) open-addessed hash table, but it lacked
a "deleted item" mark. So keys (contexts) could become hidden and
re-emerge later if threads exited and then new ones were started.
- TID_HASH() did hash calculation in signed arithmetic, thus depending
on friendly integer overflow.
rev 1.67:
- setkey/getkey expected ldap_int_thread_userctx_t.ltu_key[] to be
NULL-terminated when not full. purgekey() could set a NULL in
the middle, hiding later keys until a new key overwrote the NULL.
API changes:
- setkey(data=NULL, kfree!=NULL) searched as if intended to reset
the key, but updated by setting the key. Now always updates.
- setkey(key=<not found>, data=NULL) could return either success or
failure. Now succeeds iff (data == NULL && kfree == NULL).
rev 1.68:
- ltp_max_count <= 0 meant allow an unlimited number of threads, but
thread #LDAP_MAXTHR+1 and later would loop forever looking for a
free thread_keys[] slot. (Not fixed for multi-pool applications.)
----------------------------------------
I wrote in the tpool.c 1.66 -> 1.67 patch:
> API changes, need review - I'm not sure what's indented here:
> - setkey(data=NULL, kfree!=NULL) searched as if intended to reset
> the key, but updated by setting the key. Now always updates.
Right. That matches the one place it can happen: In bdb_locker_id()
with setkey(,,data=(void *)(long)lockid,). I don't know if lockid
can be 0, but it doesn't need to for data to be NULL.
But it would cleaner to ch_malloc an integer instead of the (void*)
cast. Then it won't depend on integer->pointer->integer conversions
preserving the value either.
BTW, is there a reason for the cast to long? lockid is u_int32_t,
and the getkey 'long' result is restored to a u_int32_t.
Howard Chu wrote:
>> Also if ctx is invalid (another thread handled it) and a new
>> pending request has gotten the now-unused ctx, then the
>> 'if (pool->ltp_open_count == 0)' code will free that new
>> pending request instead of the old one.
>
> Most likely this is due to unlocking and relocking the mutex around
> the semaphore code. Axe that, remove the gratuitous unlock/relock,
> and then that problem goes away.
Was that an instruction/permission to axe it or not?
If not I'll just move the unlock/lock into the #ifdef for now.
> Thus far we've only had to worry about a single pool. If anyone can
> come up with a reason to need multiple pools, I guess we need to
> hear it.
I'll insert "static int initialized; assert(!initialized++);"
in pool_init() and see if anyone complains.
>> - setkey does not free the old data with ltk_free. Possibly that's
>> intentional, but if so that requires weird code elsewhere.
>
> Yes, that's intentional. ltk_free is only intended for cleanup if a key
> is still set by the time a thread exits. All of the keys we set are live
> for the entire life of a thread, they don't get re-assigned.
Even the (void*)slap_sasl_bind key?
lutil_passwd() looks like it can be recursive: Register a password
scheme which finds an entry in LDAP and authenticates via that
entry's userPassword - which again could call lutil_passwd().
If that's legal, this code in be_isroot_pw()/slap_passwd_check()
can break:
pool_setkey(, slap_sasl_bind, <ctx>, <freefunc> );
... /* slap_passwd_check() does access_allowed() here */
lutil_passwd();
...
pool_setkey(, slap_sasl_bind, NULL, NULL);
The final setkey would need to restore the old key.
I'm not sure of contrib/slapd-modules/smbk5pwd/smbk5pwd.c + some
overlay either. rwm + back-relay which uses another smbk5pwd or
something. (And I have no idea of the BDB cache stuff, but since
you wrote it I assume you know:-)
If it's a problem, we can replace code like that with
void *data = <ctx>;
ldap_pvt_thread_pool_keyfree_t *kfree = <freefunc>;
pool_swapkey(, slap_sasl_bind, &data, &kfree ); /* new function */
lutil_passwd();
pool_setkey(, slap_sasl_bind, data, kfree );
or if setkey reverts to ignoring kfree when it checks whether we are
setting or deleting a key, the kfree function can be passed directly.
I wrote:
> But this reminds me: How much should an ltk_free() function be allowed
> to do?
> - pool_context_reset() calls ltk_free(). Should pool_wrapper() go
> active (wait for !pause, increment ltp_active_count, unlock ltp_mutex)
> before calling pool_context_reset()?
I think so. It makes slapi safer, and I don't see that it can hurt:
context_reset() can call connection_fake_destroy(), which via
slapi_int_free_object_extensions() calls slapi module destructors,
which can do who-knows-what.
Another stray thought: It would similarly be safer for context_reset()
and purgekey() to delete the key before calling ltk_free, so that
function can't access the key's data after freeing it. OTOH that would
break any slapi module whose ltk_free calls a function which uses the
key before freeing it. I'll leave it alone unless someone says otherwise.
Other matters:
pool->ltp_active_list is maintained but not used for anything.
Do I delete it? Or does tpool maintain the list to ease debugging?
Use of tpool:
Another bug:
Slapd does not always check if pool_submit() and pool_setkey() succeed,
even though failure is not a bug. (They can run into some limits.)
I do not understand the calls to ldap_pvt_thread_pool_context_reset()
in slapd (init.c and bconfig.c). That frees the thread's sl_malloc
memory with slap_sl_mem_destroy(), which seems rather hazardous until
the thread is really, really done.
For that matter, I guess that means context_reset() had better release
keys in the reverse order of how they were set. That usually happened,
but not always, and my clear_key_idx() made the "not always" more
common. So to close the gap from a deleted key I'll make it move all
later keys forward, instead of just moving the latest key to the gap.
Bdb uses the database environment address as a key to pool_setkey().
So there can be at most ~30 BDB databases with an active key at the
same time in the same thread. What does that mean - 30 overlays
all using bdb? Access control groups spanning 30 databases?
Nitpick: Slapd uses both function pointers and data pointers as keys to
pool_[sg]etkey(). That makes it possible for two "different" keys to
compare equal, on a host where functions and data live in different
segments that otherwise share the same address space. So I guess I
should clean up and replace (void*)<function> keys with the address of
static char <function>_key;
--
Regards,
Hallvard