hyc@symas.com writes:
h.b.furuseth@usit.uio.no wrote:
- When a thread is finishing, make it go active for context_reset(). Should make slapi module destructors safer, and I don't see that it can hurt.
I don't understand this.
Let me dig a bit back in the ITS... "context_reset() can call connection_fake_destroy(), which via slapi_int_free_object_extensions() calls slapi module destructors, which can do who-knows-what."
- I'd like to rename thread contexts to tasks(?) and user contexts back to contexts (as in RE23). The current terminology is confusing.
I don't care about this either way.
Then I will. I still confuse myself sometimes, and I certainly would if I came back to this code next year.
(...)
Never use malloc unless absolutely necessary. We have enough problems with heap fragmentation already.
Aha. OK.
- Scheduling: If several threads call pool_pause(), then once a pause happens tpool does not wake them all. They could get handled then, or another thread could undo the pause, so tpool would have to wait to pause again. Is that deliberate?
I don't understand the question. The point of the pause is to prevent any other thread (in the pool) from running. Why should tpool schedule any other threads at this point?
Two threads call pool_pause(). Eventually the pool gets paused, and one pool_pause() call returns. When that thread calls pool_resume(), the other thread waiting in pool_pause() may or may not get scheduled.
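The scheduling point above can be modeled as a small state machine (purely hypothetical sketch, not the actual tpool code): two pause requests are pending, the pause is granted to exactly one of them, and a resume does not automatically hand the pause to the other waiter.

```c
#include <assert.h>

/* Hypothetical model of the pause logic under discussion, not the real
 * tpool implementation.  Several threads may request a pause; when the
 * pool finally pauses, only one pool_pause() call returns. */
typedef struct {
    int pending_pauses;  /* threads blocked in pool_pause() */
    int paused;          /* 1 while the pool is paused */
} pool_t;

/* A thread asks for a pause; it stays pending until granted. */
static void request_pause(pool_t *p) { p->pending_pauses++; }

/* The pool pauses and hands the pause to exactly one waiter.
 * Returns 1 if a waiter was granted the pause, 0 otherwise. */
static int grant_pause(pool_t *p) {
    if (p->pending_pauses == 0 || p->paused)
        return 0;
    p->pending_pauses--;
    p->paused = 1;
    return 1;
}

/* The pausing thread resumes the pool; any other waiter in
 * pool_pause() is NOT automatically granted the next pause. */
static void do_resume(pool_t *p) { p->paused = 0; }
```

After one request/grant/resume cycle, the second requester is still sitting in pending_pauses, which is the "may or may not get scheduled" situation described above.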
- pool_context() breaks if several ldap_pvt_thread_t bit patterns can represent the same thread ID: TID_HASH() would give different hashes for the same thread, and pool_context() stops searching when it hits a slot with ctx == NULL. (My previous bug report was a bit wrong.)
How can a single thread ID be represented by multiple bit patterns?
A struct/union with padding bytes seems the least unlikely possibility.
A pointer in hardware where several address representations map to the same physical address, and the compiler handles that by normalizing pointers when they are compared/subtracted. Like DOS "huge" pointers would have been if the compiler normalized them when comparing them instead of when incrementing/decrementing them. 20-bit address bus, 32-bit pointers, physical address = 16 * <segment half of pointer> + <offset half of pointer>.
Or, not that I take this too seriously: A pthread implementation which uses e.g. a 32-bit integer type for thread IDs but just ignores the top 22 bits. The Posix spec seems to allow that, and thread IDs are to be compared with pthread_equal().
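To make that last possibility concrete, here is a sketch of a hypothetical pthread-like implementation (the 10-bit mask and the hash are invented for illustration): two IDs compare equal under the implementation's equality test, yet a TID_HASH-style hash over the raw bit pattern puts them in different slots.

```c
#include <assert.h>

/* Hypothetical: thread IDs are 32-bit, but the implementation only
 * uses the low 10 bits; the top 22 bits are garbage that its
 * pthread_equal() would ignore. */
static int tid_equal(unsigned a, unsigned b) {
    return (a & 0x3FF) == (b & 0x3FF);
}

/* A TID_HASH-style hash over the raw bit pattern mixes in the
 * garbage bits, so "equal" IDs can land in different slots. */
static unsigned tid_hash(unsigned t) {
    return (t ^ (t >> 16)) % 32;   /* 32-slot table, like thread_keys */
}
```

Since pool_context() stops searching at the first ctx == NULL slot, a thread hashed into the "wrong" slot would simply not be found.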
The best fix would be to use real thread-specific data instead. Just one key with the ctx for now, that minimizes the changes. OTOH it also means we'll do thread-specific key lookup twice - first in pthread to get the ctx, and then in ltu_key[] to find our data. Anyway, I said I'd do that later but might as well get on with it, at least for pthreads. Except I don't know if that's OS-dependent enough that it should wait for RE24?
The best fix may be to just use the address of the thread's stack as the thread ID. That's actually perfect for our purpose.
How does a function called from somewhere inside the thread know that address? That's almost what the user context is, and thus what thread_keys[] maps the thread ID to.
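For reference, the thread-specific-data route might look roughly like this under pthreads - one key holding the ctx, so a function called anywhere inside the thread can recover it without any thread-ID hashing. The function names here are made up for the sketch, not the actual tpool API:

```c
#include <pthread.h>
#include <stddef.h>

/* One pthread key holding the per-thread ctx pointer.  This replaces
 * the thread_keys[] lookup by thread ID: the thread itself carries
 * the pointer, so no ID hashing or bit-pattern comparison is needed. */
static pthread_key_t ctx_key;
static pthread_once_t ctx_once = PTHREAD_ONCE_INIT;

static void ctx_key_init(void) {
    pthread_key_create(&ctx_key, NULL);  /* no destructor in this sketch */
}

/* Called when a pool thread starts working on behalf of a ctx. */
static void set_ctx(void *ctx) {
    pthread_once(&ctx_once, ctx_key_init);
    pthread_setspecific(ctx_key, ctx);
}

/* What pool_context() would reduce to. */
static void *get_ctx(void) {
    pthread_once(&ctx_once, ctx_key_init);
    return pthread_getspecific(ctx_key);
}
```

The cost is the double lookup mentioned above: pthread_getspecific() to reach the ctx, then ltu_key[] inside it.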
- thread_keys is a poor hash table implementation with a poor hash function.
Ideally it would not be a hash table at all, but a direct lookup based on the thread ID.
If we can. See above. Anyway, I don't know why I keep kvetching about the hash table implementation. I'll just make it chained. As I finally figured out, that can be done without mallocs. Solves the multi-pool problem too.
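A chained thread_keys replacement with no mallocs could embed the chain link in the per-thread entry itself, so insertion just links a structure that already exists. This is a sketch under that assumption (names and the 32-slot size are illustrative; in tpool the entries could live in the thread's user context):

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 32   /* hash buckets, like the thread_keys[] size */

/* The chain pointer lives inside the per-thread entry, so inserting
 * into the table never allocates. */
typedef struct tentry {
    unsigned long tid;      /* stand-in for ldap_pvt_thread_t */
    void *ctx;
    struct tentry *next;    /* intrusive chain link, no malloc */
} tentry;

static tentry *buckets[NSLOTS];

static unsigned slot(unsigned long tid) { return tid % NSLOTS; }

static void tkey_insert(tentry *e) {
    unsigned s = slot(e->tid);
    e->next = buckets[s];
    buckets[s] = e;
}

static void *tkey_lookup(unsigned long tid) {
    for (tentry *e = buckets[slot(tid)]; e != NULL; e = e->next)
        if (e->tid == tid)
            return e->ctx;
    return NULL;
}
```

Since a chain holds every colliding entry rather than spilling into neighboring slots, a lookup either finds its thread or proves it absent - and separate pools could keep separate bucket arrays, which is the multi-pool point above.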
- I wonder if there are long-lived ldap_pvt_thread_cond_wait(), [r]mutex_lock() or rdwr_[rw]lock() calls which should make the thread go inactive first (after waiting for any pause to finish), so that (a) a pool_pause() will not block waiting for them, and (b) all threads in the pool can't be blocked in long waits without tpool knowing about it.
Long-lived lock calls would be a quite noticeable bug. This seems to be a non-issue.
Well, it'd have to be with code which is very rarely executed, or unusual configurations. But yes, I feel relaxed enough about it.
- back-bdb/tools.c uses pool_submit() to start long-lived threads: (...)
Tool threads are completely separate from slapd threads. This is not an issue.
Fine. I don't understand that in this context, but I'll take your word for it:-)
slapd/bconfig.c maintains its own idea of the max number of threads in connection_pool_max and slap_tool_thread_max, which does not match tpool's idea if the limit is <=0 or larger than LDAP_MAXTHR.
Since slapd acts on its own count, I imagine that can cause breakage. Simplest fix might be to set max #threads and then read it back, I haven't checked. Would need to return LDAP_MAXTHR instead of 0 for "as many as allowed". Note however also the issue above: tpool's idea of actually available threads can be too high too.
The practical limits are actually pretty low. Even if you're on a 1024 processor machine with many terabytes of RAM, it's not a good idea to have many thousands of threads... Scheduler overhead would overshadow any useful work.
OK. I forgot how much RAM a slapd thread requires. But then it won't hurt to give slapd a max limit lower than LDAP_MAXTHR either.
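The set-then-read-back fix suggested above could look like this sketch (the function names are invented; LDAP_MAXTHR exists in tpool.c, though its value here is an assumption):

```c
#include <assert.h>

#define LDAP_MAXTHR 1024   /* assumed value, as in tpool.c */

static int pool_max;       /* tpool's internal idea of the limit */

/* Clamp and store the limit; <= 0 means "as many as allowed".
 * Returning the clamped value instead of echoing the argument is
 * what lets bconfig keep a count that matches tpool's. */
static int pool_set_max(int n) {
    if (n <= 0 || n > LDAP_MAXTHR)
        n = LDAP_MAXTHR;
    pool_max = n;
    return n;
}

static int pool_get_max(void) { return pool_max; }
```

With this, connection_pool_max / slap_tool_thread_max would be set from the return value rather than from the raw config value, so the two counts can't drift apart.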
- I said my new ltp_pause waits might be too aggressive. Maybe I should clarify: I don't see any instances where they can be dropped, but it's possible that slapd expects pauses to be less aggressive somewhere.
I don't understand this comment.
Clarifying the clarification: I don't know slapd well enough to know if the extra pauses do something bad to it.
Oh, the comment /* no open threads at all?!? */: That's not strange. It happens in the first pool_submit() call(s) if thread_create() fails. Though in that case there was no danger of removing the wrong ctx from ltp_pending_list after all (fixed in rev 1.63).
(Sure, but thread_create() should never fail if you have a sane configuration.)
And if the rest of the machine doesn't run wild. I imagine thread_create() can fail in the same situations where fork() can fail on some OSes.