Hi Howard,
I should be able to get hold of an 8-way Xeon system sometime in January; I'll be able to place the order for it on the 2nd.
Cheers, Alex
On Dec 23, 2007 8:40 PM, Howard Chu hyc@symas.com wrote:
Has anyone got a dual- or quad-socket Intel Xeon based server for testing? I've been testing on two AMD systems, one quad-socket dual-core and one dual-socket quad-core. There are a lot of different ways to tune these systems...
slapd currently uses a single listener thread and a pool of worker threads. I've found that performance improves significantly when the listener thread is pinned to a single core and no other threads are allowed to run there. I've also found that performance improves somewhat when all the worker threads are pinned to specific cores, instead of being free to run on any of the remaining cores. This has made testing a bit more complicated than I expected.
Originally I was just pinning the entire process to a set number of cores (first 1, then 2, incrementing up to 8) to see how performance changed with additional cores. But because of the motherboard layout, and the fact that the I/O bridges are attached directly to particular sockets, exactly which cores you use makes a big difference.
Another thing I noticed: while we scale perfectly linearly from 1 core to 2 cores in a socket (with a dual-core processor), the scaling tapers off drastically as we start spreading across multiple sockets. That makes sense given the constraints of the HyperTransport links between the sockets.
On the quad-core system we scale pretty linearly from 1 to 4 cores (in one socket), but again the improvement tapers off drastically when the 2nd socket is added.
I don't have any Xeon systems to test on at the moment, but I'm curious to see how they do given that all CPUs should have equal access to the northbridge. (Of course, given that both memory and I/O traffic go over the bus, I'm not expecting any miracles...)
The quad-core system I'm using is a Supermicro AS-2021M-UR+B; it's based on an Nvidia MCP55 chipset, with the gigabit ethernet integrated into the chipset. Using back-null we can drive this machine to over 54,000 authentications/second, at which point 100% of a core is consumed by interrupt processing in the ethernet driver. Unfortunately the driver doesn't support interrupt coalescing. (By the way, that represents somewhere between 324,000pps and 432,000pps. While there are only 5 LDAP packets per transaction, some of the client machines choose to send separate TCP ACKs while others don't, which makes the packet count somewhere between 5-8 packets per transaction. I hadn't taken those ACKs into account when I discussed these figures before. At these packet sizes (80-140 bytes), I think the network would be 100% saturated at around 900,000pps.)
Interestingly, while 2 cores can get over 13,000 auths/second, and 4 cores can get around 25,000 auths/second (using back-hdb), with all 8 cores it only peaks at 29,000 auths/second. This tells me it's better to run two separate slapds in a mirrormode configuration on this box (4 cores per process) than to run a single process across all of the cores. Then I'd expect to hit 50,000 auths/second total, pretty close to the limits of the ethernet device/driver.
--
Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/