Howard Chu wrote:
When looking for a performance bottleneck in a system, it always helps to search in the right component.......
Tossing out the 4 old load generator machines and replacing them with two 8-core servers (and using slamd 2.0.1 instead of 2.0.0) paints quite a different picture.
http://highlandsun.com/hyc/slamd/squeeze/doublenew/jobs/optimizing_job_20100...
With the old client machines the latency went up to the 2-3msec range at peak load; with the new machines it stays under 0.9msec. So basically the slowdowns were due to the load generators getting overloaded, not any part of slapd getting overloaded.
The shape of the graph still looks odd with this kernel. (The column for 3 threads per client is out of whack.) But the results are so consistent I don't think there's any measurement error to blame.
Also added results using BDB 4.8.30 (previously used 4.7.25) and using a 2.6.35 kernel.
BDB 4.8 vs 4.7 seems to be worth about a 5% gain on its own. The 2.6.35 kernel gives a slight boost as well, with search rates spiking over 67600/second, and idle CPU down to 6%. At that point, slapd is consuming around 90% of the CPU. Network interrupts also consume about 6% total, or ~75% of one core.
Hm, I may have forgotten to mention these tests are being run on an HP ProLiant DL585 G5 with 4 Opteron 8354 (2.2GHz quad core) processors. The box has 64GB of RAM; the DB has 5 million entries and slapd is using around 25-26GB of memory. (Around 4KB per entry, plus ~4GB of BDB shared memory cache.)
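
As a rough sanity check on those memory figures, here is a quick back-of-the-envelope sketch. It assumes binary units (1KB = 1024 bytes, 1GB = 2^30 bytes), which the numbers above don't specify, and the function names are made up purely for illustration.

```python
# Back-of-the-envelope check of the memory figures quoted in the post.
# Assumption: binary units (1 KB = 1024 bytes, 1 GB = 2**30 bytes).
KIB = 1024
GIB = 1024 ** 3

ENTRIES = 5_000_000      # "the DB has 5 million entries"
PER_ENTRY_KIB = 4        # "around 4KB per entry"
BDB_CACHE_GIB = 4        # "~4GB of BDB shared memory cache"


def estimated_total_gib(entries, per_entry_kib, cache_gib):
    """Total footprint in GiB: per-entry memory plus the BDB cache."""
    return entries * per_entry_kib * KIB / GIB + cache_gib


def implied_per_entry_kib(total_gib, entries, cache_gib):
    """Per-entry size in KiB implied by an observed total footprint."""
    return (total_gib - cache_gib) * GIB / entries / KIB


estimate = estimated_total_gib(ENTRIES, PER_ENTRY_KIB, BDB_CACHE_GIB)
low = implied_per_entry_kib(25, ENTRIES, BDB_CACHE_GIB)
high = implied_per_entry_kib(26, ENTRIES, BDB_CACHE_GIB)

print(f"estimated total at 4KB/entry: {estimate:.1f} GB")
print(f"per-entry size implied by 25-26GB: {low:.1f}-{high:.1f} KB")
```

With exactly 4KB per entry the estimate comes out around 23GB, a bit under the observed 25-26GB. Working backwards from the observed size gives roughly 4.4-4.6KB per entry, which still fits "around 4KB".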