Rick Jones wrote:
There are definitely interrupt coalescing settings available with tg3-driven cards, as well as bnx2-driven ones:
ftp://ftp.cup.hp.com/dist/networking/briefs/nic_latency_vs_tput.txt
Yep, that helped. After raising rx-usecs from the default of 20 to 1000 and rx-frames from the default of 5 to 100, I'm getting 43k auths/sec with back-null (in 4 separate thread pools), and the core fielding the interrupts is only about 80% busy now instead of 100%. I'm afraid my load generators may be maxed out, though, because I can't drive the load on the server any higher even though it has more idle CPU.
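For anyone who wants to try the same tuning: rx-usecs and rx-frames are the names ethtool uses for RX interrupt coalescing, and they are normally changed with "ethtool -C". The sketch below only builds and prints that command rather than running it. The interface name eth0 and the helper name coalesce_cmd are assumptions for illustration, and whether a given NIC accepts these values depends on the driver.

```python
def coalesce_cmd(iface, rx_usecs, rx_frames):
    """Build the ethtool argument list that sets RX interrupt coalescing.

    iface is a hypothetical interface name; the values are whatever the
    caller wants to try. Nothing is executed here.
    """
    return [
        "ethtool", "-C", iface,
        "rx-usecs", str(rx_usecs),
        "rx-frames", str(rx_frames),
    ]


if __name__ == "__main__":
    # Values taken from the test described above; eth0 is an assumption.
    print(" ".join(coalesce_cmd("eth0", 1000, 100)))
```

Running the printed command needs root, and "ethtool -c eth0" (lowercase c) shows the current settings so they can be restored afterwards.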
The current code in HEAD (with only 1 thread pool) is reaching 36k auths/sec with back-null, so it's actually not far off from my experimental peak rate. Considering that HEAD was at 25k/sec last week (and now in 2.4.6) that's pretty decent.
With back-bdb and 1 million users I'm getting 26.1k/sec with plaintext passwords (up from 19.3k/sec last week). With {SSHA} passwords that drops to 25.7k/sec (~1.5% difference).
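To put the week-over-week numbers side by side, here is a small sketch that computes the relative change for each pair of rates quoted above. The function name pct_change and the labels are only illustrative; the rates themselves are the ones reported in this message, in thousands of auths/sec.

```python
def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0


# (label, old rate, new rate), rates in thousands of auths/sec
comparisons = [
    ("back-null, HEAD last week vs. now", 25.0, 36.0),
    ("back-bdb plaintext, last week vs. now", 19.3, 26.1),
    ("back-bdb, plaintext vs. {SSHA}", 26.1, 25.7),
]

if __name__ == "__main__":
    for label, old, new in comparisons:
        print(f"{label}: {pct_change(old, new):+.1f}%")
```

The last entry is the roughly 1.5% cost of {SSHA} hashing mentioned above; the first two show the gains in HEAD over the past week.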
I have to put this tinkering on hold for a bit to run some authrate tests against Active Directory on this machine (using W2K3sp2 X64). Later on we'll do a W2K3 OpenLDAP build for comparison as well. Should be entertaining...