Howard Chu wrote:
Well, it doesn't look like this patch caused any harm for the default case. I'm only seeing about a 10% gain in throughput using two listener threads on a 16-core machine. Not earth-shattering, not bad.
There is a slight drop in throughput for a single listener thread compared to the unpatched code. It's around 1%, consistent enough not to be a measurement error, but not really significant.
Eh. The 10% was on a pretty lightly loaded test. Under heavy load the advantage is only 1.2%. Hardly seems worth the trouble.
At least the advantage always outweighs the above-mentioned 1% loss. I.e., cancelling both effects out, we're still ahead overall.
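To make that arithmetic explicit, here's a rough sketch (assuming the 1.2%/10% advantage is measured against the patched single-listener build, and the ~1% drop is patched vs. unpatched for a single listener; the inputs are just the round figures quoted above):

  # Normalized throughput, using the rough percentages quoted above.
  unpatched = 1.0                        # unpatched single-listener baseline
  patched_single = unpatched * 0.99      # ~1% drop with the patch, one listener
  patched_double_heavy = patched_single * 1.012  # +1.2% with two listeners, heavy load
  patched_double_light = patched_single * 1.10   # +10% with two listeners, light load

  print("heavy load vs. unpatched: %+.1f%%" % ((patched_double_heavy - unpatched) * 100))
  print("light load vs. unpatched: %+.1f%%" % ((patched_double_light - unpatched) * 100))
  # prints roughly +0.2% and +8.9%, i.e. a net win either way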
For anyone curious, the slamd reports from these test runs are available on http://highlandsun.com/hyc/slamd/
Comparing the results, with a single listener thread there are several points where it is obviously scaling poorly. With two listener threads, those weak spots in the single listener graphs are gone and everything runs smoothly up to the peak load.
E.g. comparing single listener
http://highlandsun.com/hyc/slamd/squeeze/singlenew/jobs/optimizing_job_20100...
vs double listener
http://highlandsun.com/hyc/slamd/squeeze/double/jobs/optimizing_job_20100808...
at 56 client threads, the double-listener slapd is 37.6% faster. Dunno why 56 clients is a magic number for the single listener; it jumps up to a more reasonable throughput at 64 client threads, where the double is only 11.7% faster.
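For the record, the speedup figures are just the throughput ratio between the two job reports; something like this (the argument names are placeholders, the actual ops/second values are in the linked pages):

  def speedup(double_ops, single_ops):
      # Percent gain of the double-listener run over the single-listener run,
      # computed from the throughput slamd reports for each job.
      return (double_ops / single_ops - 1.0) * 100

  # e.g. speedup(<double at 56 clients>, <single at 56 clients>) -> ~37.6
  #      speedup(<double at 64 clients>, <single at 64 clients>) -> ~11.7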
When looking for a performance bottleneck in a system, it always helps to search in the right component...
Tossing out the 4 old load generator machines and replacing them with two 8-core servers (and using slamd 2.0.1 instead of 2.0.0) paints quite a different picture.
http://highlandsun.com/hyc/slamd/squeeze/doublenew/jobs/optimizing_job_20100...
With the old client machines the latency went up to the 2-3 msec range at peak load; with the new machines it stays under 0.9 msec. So basically the slowdowns were due to the load generators getting overloaded, not any part of slapd.
The shape of the graph still looks odd with this kernel. (The column for 3 threads per client is out of whack.) But the results are so consistent I don't think there's any measurement error to blame.