Hallvard B Furuseth wrote:
Howard Chu writes:
(...)
BTW, is the problem that each operation locks pool->ltp_mutex 2-3 times, or the amount of time it is kept locked? Most of the tests that pool_submit()/pool_wrapper() do with ltp_mutex locked could be precomputed.
Both I think.
Then I'll at least reduce pool_wrapper a bit. The for(;;) can become:

    for (;;) {
        task = LDAP_STAILQ_FIRST(pool->ltp_pending_listptr);
        if (task == NULL) {
            if (pool->ltp_close_thread)
                break;  /* !ltp_pause && (FINISHING or too many threads) */
            ldap_pvt_thread_cond_wait(&pool->ltp_cond, &pool->ltp_mutex);
            continue;
        }
        <rest of loop untouched>;
    }

where ltp_pending_listptr == (ltp_pause ? &(empty list) : &ltp_pending_list). This removes the STOPPING state: we can use FINISHING and flush the pending_list instead.
Reducing _submit() gets a bit uglier. The if (...RUNNING etc ...) test and the "create new thread?" test can both be reduced to simple compares, and the ltp_pause test can move into the branch for the latter. I think that'll make the file harder to rearrange later though, so maybe it should wait.
Reducing the size of the critical section is always a good idea. But right, if it's going to just make things more complicated, we can hold off for now.
I haven't set up Dtrace on the T5120 yet to get better info, but oprofile on Linux shows that pthread_mutex_lock is taking too much CPU time (vying with the ethernet driver for #1 consumer). And realistically, a single shared resource like this is just a really bad idea.
True enough. Still, slapd has a lot of mutexes. Perhaps we should check whether this one stands out before rearranging scheduling around it.
Using back-null makes this fairly clear. There are no globally shared mutexes in the connection manager at all, so this is the only remaining culprit. Everything else in the processing chain is per-connection, which should mean zero contention. Granted, making back-null run fast may do little for back-bdb or other backends, but at the moment it's clear that the frontend is a problem.