Revisiting an old thread...
http://www.openldap.org/lists/openldap-devel/200504/msg00066.html
I'm definitely seeing our current listener running out of steam on servers with more than 12 or so cores, and some work in this direction will definitely help. First it would be a good idea to classify all of the tasks that the current listener manages, before deciding how to divide them among multiple threads.
The listener is responsible for many events right now:
  signal/shutdown processing
  idle timeout checks
  write timeout checks
  runqueue scheduling
  listener socket events
  threadpool pauses
  client socket read events
  client socket write events
Splitting the client socket handling across multiple threads will bring the greatest improvement in scalability. Just need to check our thinking and make sure the remaining division of labor still makes sense.
There are two cases that annoy me in our current design. First, why don't we just dedicate a thread to each listener socket, and let it block on accept()? That would eliminate a bit of churn in the current select() workload.
Likewise, why don't we just let writer threads block in write(), instead of having them ask the listener to listen for writability on their socket? Or, if we're using non-blocking sockets, why don't we let the writer threads block in their own select call, instead of relying on the central thread to do the select and re-dispatch?
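To make the second option concrete, here is a rough sketch of a writer that waits for writability in its own select() call instead of asking the central thread. The name write_fully is made up for illustration and is not an existing slapd function; the sketch assumes a plain POSIX socket or pipe fd (possibly non-blocking) and ignores the shutdown wakeup question entirely.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes to fd from the calling thread.
 * If fd is non-blocking and the kernel buffer is full, wait for
 * writability in this thread's own select() and then retry.
 * Returns len on success, -1 on error. */
ssize_t write_fully(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;

    /* select() can only handle descriptors below FD_SETSIZE */
    assert(fd >= 0 && fd < FD_SETSIZE);

    while (left > 0) {
        ssize_t n = write(fd, p, left);

        if (n > 0) {
            p += n;
            left -= (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;               /* interrupted, just retry */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            fd_set wfds;

            FD_ZERO(&wfds);
            FD_SET(fd, &wfds);
            /* block here, in the writer, until fd is writable */
            if (select(fd + 1, NULL, &wfds, NULL, NULL) < 0 &&
                errno != EINTR)
                return -1;
            continue;
        }
        return -1;                  /* hard error, or an unexpected 0 */
    }
    return (ssize_t)len;
}
```

On a blocking socket the select() branch is never taken and this reduces to a simple write loop, which is the first option above.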
The first, obvious answer is this: when threads are blocked in system calls like accept(), we can't simply wake them up again for shutdown events or other situations. I believe the obvious fix here is to use select() in each thread, waiting for both the target fd and the wake_sds fd, which is written to whenever a signal is caught. Off the top of my head I'm not sure whether, when several threads are selecting on the same fd, they all receive a readable event or only one of them does. Anyone know?
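A minimal sketch of that per-thread wait, assuming POSIX select(). The function and enum names are hypothetical, and wake_fd stands in for the read side of wake_sds. The sketch deliberately does not read the wakeup byte: select() is level-triggered, so as long as nobody drains the fd it stays readable for any other thread that selects on it.

```c
#include <assert.h>
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical result codes for the sketch below */
enum wait_result { WAIT_ERROR = -1, WAIT_TARGET = 0, WAIT_WAKE = 1 };

/* Block until target_fd is readable or wake_fd has been written to.
 * The wakeup takes priority when both are ready, so a shutdown is
 * noticed before any new work is picked up. */
enum wait_result wait_target_or_wake(int target_fd, int wake_fd)
{
    fd_set readfds;
    int maxfd = target_fd > wake_fd ? target_fd : wake_fd;

    /* select() can only handle descriptors below FD_SETSIZE */
    assert(target_fd >= 0 && target_fd < FD_SETSIZE);
    assert(wake_fd >= 0 && wake_fd < FD_SETSIZE);

    for (;;) {
        /* the set is modified by select(), so rebuild it every pass */
        FD_ZERO(&readfds);
        FD_SET(target_fd, &readfds);
        FD_SET(wake_fd, &readfds);

        if (select(maxfd + 1, &readfds, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal, retry */
            return WAIT_ERROR;
        }
        if (FD_ISSET(wake_fd, &readfds))
            return WAIT_WAKE;       /* shutdown or other wakeup */
        if (FD_ISSET(target_fd, &readfds))
            return WAIT_TARGET;     /* safe to accept() or read() now */
    }
}
```

A listener thread would call this in a loop, doing accept() on WAIT_TARGET and exiting on WAIT_WAKE.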
I don't think the remaining tasks really involve much overhead. So ideally, we can handle the idle/write timeout and runqueue scheduling in a single thread, which can also be responsible for the main signal/shutdown processing.
The listener sockets can each have their own dedicated thread, selecting on their listener socket and the signal fd.
A small number of threads can handle the bulk of the client socket events. We would continue to use a power-of-two descriptor table, and a power-of-two set of threads here. The division of labor would simply be thread# = FD % number of threads. (There's no need to split the current connection table into multiple arrays; we just divvy things up so that a given thread only accesses its own slots in the array.)
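For illustration, the mapping could look like the sketch below. The helper names are invented for this example and are not existing slapd code. With a power-of-two thread count, the modulo reduces to a bit mask.

```c
#include <assert.h>

/* Hypothetical helper: which worker thread owns descriptor fd.
 * nthreads must be a power of two, so fd % nthreads is the same
 * as masking with (nthreads - 1). */
unsigned fd_to_thread(int fd, unsigned nthreads)
{
    assert(fd >= 0);
    assert(nthreads != 0 && (nthreads & (nthreads - 1)) == 0);
    return (unsigned)fd & (nthreads - 1);
}

/* Hypothetical helper: nonzero if worker thread_no may touch the
 * shared connection table slot for fd. The table stays one array;
 * each thread simply skips slots it does not own. */
int thread_owns_slot(unsigned thread_no, int fd, unsigned nthreads)
{
    return fd_to_thread(fd, nthreads) == thread_no;
}
```

With 4 threads, for example, thread 1 would own slots 1, 5, 9, 13 and so on, and no two threads ever touch the same slot.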
Comments?