hyc@symas.com writes:
h.b.furuseth@usit.uio.no wrote:
- When a thread is finishing, make it go active for context_reset(). Should make slapi module destructors safer, and I don't see that it can hurt.
I don't understand this.
Let me dig a bit back in the ITS... "context_reset() can call connection_fake_destroy(), which via slapi_int_free_object_extensions() calls slapi module destructors, which can do who-knows-what."
- I'd like to rename thread contexts to tasks(?) and user contexts back to contexts (as in RE23). The current terminology is confusing.
I don't care about this either way.
Then I will. I still confuse myself sometimes, and I certainly would if I came back to this code next year.
(...)
Never use malloc unless absolutely necessary. We have enough problems with heap fragmentation already.
Aha. OK.
- Scheduling: If several threads call pool_pause(), then once a pause happens tpool does not wake them all. They could get handled then, or another thread could undo the pause, so tpool would have to wait to pause again. Is that deliberate?
I don't understand the question. The point of the pause is to prevent any other thread (in the pool) from running. Why should tpool schedule any other threads at this point?
Two threads call pool_pause(). Eventually the pool gets paused, and one pool_pause() call returns. When that thread calls pool_resume(), the other thread waiting in pool_pause() may or may not get scheduled.
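The scheduling point above can be modeled as a small state machine (purely hypothetical sketch, not the actual tpool code): two pause requests are pending, the pause is granted to exactly one of them, and a resume does not automatically hand the pause to the other waiter.

```c
#include <assert.h>

/* Hypothetical model of the pause logic under discussion, not the real
 * tpool implementation.  Several threads may request a pause; when the
 * pool finally pauses, only one pool_pause() call returns. */
typedef struct {
    int pending_pauses;  /* threads blocked in pool_pause() */
    int paused;          /* 1 while the pool is paused */
} pool_t;

/* A thread asks for a pause; it stays pending until granted. */
static void request_pause(pool_t *p) { p->pending_pauses++; }

/* The pool pauses and hands the pause to exactly one waiter.
 * Returns 1 if a waiter was granted the pause, 0 otherwise. */
static int grant_pause(pool_t *p) {
    if (p->pending_pauses == 0 || p->paused)
        return 0;
    p->pending_pauses--;
    p->paused = 1;
    return 1;
}

/* The pausing thread resumes the pool; any other waiter in
 * pool_pause() is NOT automatically granted the next pause. */
static void do_resume(pool_t *p) { p->paused = 0; }
```

After one request/grant/resume cycle, the second requester is still sitting in pending_pauses, which is the "may or may not get scheduled" situation described above.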
- pool_context() breaks if several ldap_pvt_thread_t bit patterns can represent the same thread ID: TID_HASH() would give different hashes for the same thread, and pool_context() stops searching when it hits a slot with ctx == NULL. (My previous bug report was a bit wrong.)
How can a single thread ID be represented by multiple bit patterns?
A struct/union with padding bytes seems the least unlikely possibility.
A pointer in hardware where several address representations map to the same physical address, and the compiler handles that by normalizing pointers when they are compared/subtracted. Like DOS "huge" pointers would have been if the compiler normalized them when comparing them instead of when incrementing/decrementing them. 20-bit address bus, 32-bit pointers, physical address = 16 * <segment half of pointer> + <offset half of pointer>.
Or, not that I take this too seriously: A pthread implementation which uses e.g. a 32-bit integer type for thread IDs but just ignores the top 22 bits. The Posix spec seems to allow that, and thread IDs are to be compared with pthread_equal().
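To make that last possibility concrete, here is a sketch of a hypothetical pthread-like implementation (the 10-bit mask and the hash are invented for illustration): two IDs compare equal under the implementation's equality test, yet a TID_HASH-style hash over the raw bit pattern puts them in different slots.

```c
#include <assert.h>

/* Hypothetical: thread IDs are 32-bit, but the implementation only
 * uses the low 10 bits; the top 22 bits are garbage that its
 * pthread_equal() would ignore. */
static int tid_equal(unsigned a, unsigned b) {
    return (a & 0x3FF) == (b & 0x3FF);
}

/* A TID_HASH-style hash over the raw bit pattern mixes in the
 * garbage bits, so "equal" IDs can land in different slots. */
static unsigned tid_hash(unsigned t) {
    return (t ^ (t >> 16)) % 32;   /* 32-slot table, like thread_keys */
}
```

Since pool_context() stops searching at the first ctx == NULL slot, a thread hashed into the "wrong" slot would simply not be found.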
The best fix would be to use real thread-specific data instead. Just one key with the ctx for now, that minimizes the changes. OTOH it also means we'll do thread-specific key lookup twice - first in pthread to get the ctx, and then in ltu_key[] to find our data. Anyway, I said I'd do that later but might as well get on with it, at least for pthreads. Except I don't know if that's OS-dependent enough that it should wait for RE24?
The best fix may be to just use the address of the thread's stack as the thread ID. That's actually perfect for our purpose.
How does a function called from somewhere inside the thread know that address? That's almost what the user context is, and thus what thread_keys[] maps the thread ID to.
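For reference, the thread-specific-data route might look roughly like this under pthreads - one key holding the ctx, so a function called anywhere inside the thread can recover it without any thread-ID hashing. The function names here are made up for the sketch, not the actual tpool API:

```c
#include <pthread.h>
#include <stddef.h>

/* One pthread key holding the per-thread ctx pointer.  This replaces
 * the thread_keys[] lookup by thread ID: the thread itself carries
 * the pointer, so no ID hashing or bit-pattern comparison is needed. */
static pthread_key_t ctx_key;
static pthread_once_t ctx_once = PTHREAD_ONCE_INIT;

static void ctx_key_init(void) {
    pthread_key_create(&ctx_key, NULL);  /* no destructor in this sketch */
}

/* Called when a pool thread starts working on behalf of a ctx. */
static void set_ctx(void *ctx) {
    pthread_once(&ctx_once, ctx_key_init);
    pthread_setspecific(ctx_key, ctx);
}

/* What pool_context() would reduce to. */
static void *get_ctx(void) {
    pthread_once(&ctx_once, ctx_key_init);
    return pthread_getspecific(ctx_key);
}
```

The cost is the double lookup mentioned above: pthread_getspecific() to reach the ctx, then ltu_key[] inside it.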
- thread_keys is a poor hash table implementation with a poor hash function.
Ideally it would not be a hash table at all, but a direct lookup based on the thread ID.
If we can. See above. Anyway, I don't know why I keep kvetching about the hash table implementation. I'll just make it chained. As I finally figured out, that can be done without mallocs. Solves the multi-pool problem too.
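A chained thread_keys replacement with no mallocs could embed the chain link in the per-thread entry itself, so insertion just links a structure that already exists. This is a sketch under that assumption (names and the 32-slot size are illustrative; in tpool the entries could live in the thread's user context):

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 32   /* hash buckets, like the thread_keys[] size */

/* The chain pointer lives inside the per-thread entry, so inserting
 * into the table never allocates. */
typedef struct tentry {
    unsigned long tid;      /* stand-in for ldap_pvt_thread_t */
    void *ctx;
    struct tentry *next;    /* intrusive chain link, no malloc */
} tentry;

static tentry *buckets[NSLOTS];

static unsigned slot(unsigned long tid) { return tid % NSLOTS; }

static void tkey_insert(tentry *e) {
    unsigned s = slot(e->tid);
    e->next = buckets[s];
    buckets[s] = e;
}

static void *tkey_lookup(unsigned long tid) {
    for (tentry *e = buckets[slot(tid)]; e != NULL; e = e->next)
        if (e->tid == tid)
            return e->ctx;
    return NULL;
}
```

Since a chain holds every colliding entry rather than spilling into neighboring slots, a lookup either finds its thread or proves it absent - and separate pools could keep separate bucket arrays, which is the multi-pool point above.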
- I wonder if there are long-lived ldap_pvt_thread_cond_wait(), [r]mutex_lock() or rdwr_[rw]lock() calls which should make the thread go inactive first (after waiting for any pause to finish), so that (a) a pool_pause() will not block waiting for them, and (b) all threads in the pool can't be blocked in long waits without tpool knowing about it.
Long-lived lock calls would be a quite noticeable bug. This seems to be a non-issue.
Well, it'd have to be with code which is very rarely executed, or unusual configurations. But yes, I feel relaxed enough about it.
- back-bdb/tools.c uses pool_submit() to start long-lived threads: (...)
Tool threads are completely separate from slapd threads. This is not an issue.
Fine. I don't understand that in this context, but I'll take your word for it:-)
slapd/bconfig.c maintains its own idea of the max number of threads in connection_pool_max and slap_tool_thread_max, which does not match tpool's idea if the limit is <=0 or larger than LDAP_MAXTHR.
Since slapd acts on its own count, I imagine that can cause breakage. Simplest fix might be to set max #threads and then read it back, I haven't checked. Would need to return LDAP_MAXTHR instead of 0 for "as many as allowed". Note however also the issue above: tpool's idea of actually available threads can be too high too.
The practical limits are actually pretty low. Even if you're on a 1024 processor machine with many terabytes of RAM, it's not a good idea to have many thousands of threads... Scheduler overhead would overshadow any useful work.
OK. I forgot how much RAM a slapd thread requires. But then it won't hurt to give slapd a max limit lower than LDAP_MAXTHR either.
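The set-then-read-back fix suggested above could look like this sketch (the function names are invented; LDAP_MAXTHR exists in tpool.c, though its value here is an assumption):

```c
#include <assert.h>

#define LDAP_MAXTHR 1024   /* assumed value, as in tpool.c */

static int pool_max;       /* tpool's internal idea of the limit */

/* Clamp and store the limit; <= 0 means "as many as allowed".
 * Returning the clamped value instead of echoing the argument is
 * what lets bconfig keep a count that matches tpool's. */
static int pool_set_max(int n) {
    if (n <= 0 || n > LDAP_MAXTHR)
        n = LDAP_MAXTHR;
    pool_max = n;
    return n;
}

static int pool_get_max(void) { return pool_max; }
```

With this, connection_pool_max / slap_tool_thread_max would be set from the return value rather than from the raw config value, so the two counts can't drift apart.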
- I said my new ltp_pause waits might be too aggressive. Maybe I should clarify: I don't see any instances where they can be dropped, but it's possible that slapd expects pauses to be less aggressive somewhere.
I don't understand this comment.
Clarifying the clarification: I don't know slapd well enough to know if the extra pauses do something bad to it.
Oh, the comment /* no open threads at all?!? */: That's not strange. It happens in the first pool_submit() call(s) if thread_create() fails. Though in that case there was no danger of removing the wrong ctx from ltp_pending_list after all (fixed in rev 1.63).
(Sure, but thread_create() should never fail if you have a sane configuration.)
And if the rest of the machine doesn't run wild. I imagine thread_create() can fail in the same situations where fork() can fail on some OSes.