Rick Jones wrote:
There are definitely interrupt coalescing settings available with tg3-driven cards, as well as bnx2-driven ones:
ftp://ftp.cup.hp.com/dist/networking/briefs/nic_latency_vs_tput.txt
Really nice work there, thanks.
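For the archives, here's a rough sketch of how those coalescing parameters can be read from a program rather than the ethtool binary; it's the same ETHTOOL_GCOALESCE data that "ethtool -c ethX" prints. Nothing in it is tg3- or bnx2-specific, and the interface name is just whatever gets passed on the command line:

/* Minimal sketch: query a NIC's interrupt coalescing parameters via the
 * ETHTOOL_GCOALESCE ioctl -- the same values "ethtool -c ethX" reports.
 * Linux-only; pass the interface name as the sole argument. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/types.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int main(int argc, char **argv)
{
	struct ethtool_coalesce ec;
	struct ifreq ifr;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <interface>\n", argv[0]);
		return 1;
	}

	/* Any socket will do as a handle for the SIOCETHTOOL ioctl. */
	fd = socket(AF_INET, SOCK_DGRAM, 0);
	if (fd < 0) {
		perror("socket");
		return 1;
	}

	memset(&ifr, 0, sizeof(ifr));
	strncpy(ifr.ifr_name, argv[1], IFNAMSIZ - 1);

	memset(&ec, 0, sizeof(ec));
	ec.cmd = ETHTOOL_GCOALESCE;
	ifr.ifr_data = (char *)&ec;

	if (ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
		perror("ETHTOOL_GCOALESCE");
		close(fd);
		return 1;
	}

	printf("rx-usecs: %u  rx-frames: %u\n",
	       ec.rx_coalesce_usecs, ec.rx_max_coalesced_frames);
	printf("tx-usecs: %u  tx-frames: %u\n",
	       ec.tx_coalesce_usecs, ec.tx_max_coalesced_frames);

	close(fd);
	return 0;
}

Setting the values goes through the analogous ETHTOOL_SCOALESCE command, which is what the brief above is effectively tuning via "ethtool -C".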
Also, if the platform and the I/O card support it, and it isn't the default, MSI or MSI-X interrupts are often lower overhead than legacy INTA IRQs. They can also allow - on NICs which have the support - the interrupts to be spread intelligently (well, semi-intelligently at least :) across multiple cores.
For reference, the machine is a Celestica A8440. http://www.amd.com.cn/CHCN/assets/content_type/DownloadableAssets/A8440_Data...
The ethernet controllers are on a hub attached by HyperTransport to a single processor, so I don't think you can usefully distribute the interrupts to anything beyond that socket.
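In case it's useful to anyone else reading: "spreading the interrupts" mostly boils down to writing a CPU mask to /proc/irq/<N>/smp_affinity for each of the NIC's (MSI-X) vectors. The IRQ number and mask in the sketch below are made-up placeholders; the real vectors show up in /proc/interrupts, it needs root, and irqbalance will happily rewrite whatever you set unless it's told not to:

/* Minimal sketch: pin one interrupt vector to a chosen CPU by writing a
 * hex CPU mask to /proc/irq/<irq>/smp_affinity.  The IRQ number and mask
 * used in main() are placeholders, not taken from the machine discussed
 * in this thread. */
#include <stdio.h>

static int set_irq_affinity(int irq, unsigned int cpu_mask)
{
	char path[64];
	FILE *f;

	snprintf(path, sizeof(path), "/proc/irq/%d/smp_affinity", irq);
	f = fopen(path, "w");
	if (!f) {
		perror(path);
		return -1;
	}
	/* The kernel expects a hexadecimal bitmask of allowed CPUs. */
	fprintf(f, "%x\n", cpu_mask);
	if (fclose(f) != 0) {
		perror(path);
		return -1;
	}
	return 0;
}

int main(void)
{
	/* Hypothetical example: steer IRQ 90 to CPU 2 (mask 0x4). */
	return set_irq_affinity(90, 0x4) ? 1 : 0;
}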
Although, if there is still 10% idle, that probably needs to go next :)
Heh heh. The 80/20 rule hits this with a vengeance. That's "10%" out of "800%" total, i.e. 10 percentage points of the aggregate across eight cores, which works out to only about 1.2% of the machine's total CPU capacity, and that's almost totally indistinguishable from measurement error in the oprofile results. This is all intuition (guesswork) now, with no more obvious hot spots left to attack. Maybe if I'm really bored over the holidays I'll spend some time on it. (Not likely.)
Reminds me of the old leapfrogging games with Excelan ethernet cards and their onboard TCP engines (15+ years ago), which let machines of that era hit a whopping 250KB/sec on 10Mbit ethernet. A couple of years later the main CPUs got fast enough to do 500KB/sec without using the cards' "accelerators." It's been many years since I saw another NIC with an onboard TCP engine after that, but they're on the market now...
Don't forget the "NFS accelerators" from ca. 1990 and where they are today :)
I think I have a few in the parts bin...