I presume we can close this ITS now.
I've been running some tests on a quad-processor AMD system and seeing a lot of mutex contention in the frontend. It looks like the current thread pool and connection manager architecture is a bad fit for a NUMA system like this. I'm planning to add support for multiple thread pools (the idea being one per CPU) and multiple listener threads to slapd.
As a first step, after 2.4.6 is released, I'm going to unifdef the SLAPD_LIGHTWEIGHT_DISPATCHER symbol and delete the old dispatcher code.
With some experimental changes I've already made, I'm seeing 25K auths/sec with the current code vs. 39K auths/sec using separate thread pools, roughly a 56% improvement.