Hallvard B Furuseth wrote:
> Howard Chu writes:
>> The -O2 build is faster from about 4 to 24 client threads. From 28 on
>> up, the nonoptimized code is faster at every load level. I was
>> originally using gcc 4.1.2 but I'm seeing the same result now using
>> gcc 4.2.2. Also, slapd is only configured with 8 worker threads in all
>> of these tests. Strange that whatever optimizations the compiler has
>> generated speeds things up for lighter load, but works against it
>> under heavier load.
>
> Not really. Lots of possible optimizations are trade-offs between
> unguessable guesstimates - cache usage, branch prediction, whatever.
> Maybe some small piece of code got unluckily optimized and dominates
> the rest under heavy load. With a bit of luck, the difference between
> light and heavy runs will stand out with some sort of profiling (gprof,
> cachegrind, helgrind, whatever).

The difference is small enough that I'm not really concerned, just curious.

Compiling with -Os (optimize for size) yielded about the same result as -O2.

Interestingly, compiling with -O3 got a peak rate of around 39K/sec, but
throughput climbed to that peak much more gradually: it took until 276 client
connections before it finally stopped increasing. That's good news for servers
that regularly have large numbers of active clients.
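
For anyone who wants to repeat this kind of comparison on their own numbers,
here is a minimal sketch (not the script used for the tests in this thread) of
how the two observations could be pulled out of benchmark results: the load
level from which one build stays ahead of another, and the load level at which
throughput stops increasing. The function names, the (clients, ops/sec) sample
layout, and the 1% tolerance are all assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

# One measurement: (number of client threads, operations per second).
# This layout is an assumption for the sketch, not a slapd output format.
Sample = Tuple[int, float]


def crossover(build_a: Sequence[Sample], build_b: Sequence[Sample]) -> Optional[int]:
    """Smallest measured client count from which build_b is faster than
    build_a at that level and at every higher measured level.
    Returns None if build_b is not faster at the highest common level."""
    rates_a = dict(build_a)
    rates_b = dict(build_b)
    common = sorted(set(rates_a) & set(rates_b))
    result = None
    # Walk down from the heaviest load until build_b stops winning.
    for clients in reversed(common):
        if rates_b[clients] > rates_a[clients]:
            result = clients
        else:
            break
    return result


def saturation_point(samples: Sequence[Sample], tolerance: float = 0.01) -> Optional[int]:
    """Smallest client count after which no later measurement exceeds
    that level's rate by more than `tolerance` (a fraction, 0.01 = 1%).
    Returns None only when there are no samples."""
    ordered = sorted(samples)
    for i, (clients, rate) in enumerate(ordered):
        later_rates = [r for _, r in ordered[i + 1:]]
        if all(r <= rate * (1 + tolerance) for r in later_rates):
            return clients
    return None
```

Feeding crossover() the -O2 results as build_a and the nonoptimized results as
build_b would report where the nonoptimized build takes over; saturation_point()
on the -O3 results would report where its throughput levels off. The tolerance
is there because run-to-run noise otherwise makes the curve look like it never
quite flattens.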
-- Howard Chu
Chief Architect, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/