I've been spending some time perf testing OL 2.4 in relation to OL 2.3. Unfortunately, RE24 is noticeably slower than 2.3 was. Results of simple auth testing with slamd show:
OL 2.3: 21,745 auths/second
OL 2.4: 15,733 auths/second
So OL 2.4 is 6,000 auths/second (aka 12,000 searches/second) slower than 2.3. I.e., 27% slower.
Howard committed a patch that slightly helps some situations, and Hallvard has a rewrite of part of the lber library that I've been testing that he'll commit soon. That helps somewhat:
OL 2.4 with Howard's and Hallvard's patches: 17,086 auths/second.
That still leaves us over 4,500 auths/second (or 9000 searches/second) slower than RE2.3. I.e., 21.5% slower. Which is quite a substantial gap.
--Quanah
--
Quanah Gibson-Mount Principal Software Engineer Zimbra, Inc -------------------- Zimbra :: the leader in open source messaging and collaboration
Quanah Gibson-Mount wrote:
That still leaves us over 4,500 auths/second (or 9000 searches/second) slower than RE2.3. I.e., 21.5% slower. Which is quite a substantial gap.
Any info about the infrastructure you are running this test on?
Thanks !
--On Tuesday, July 28, 2009 7:44 PM +0200 Emmanuel Lecharny elecharny@apache.org wrote:
That still leaves us over 4,500 auths/second (or 9000 searches/second) slower than RE2.3. I.e., 21.5% slower. Which is quite a substantial gap.
Any info about the infrastructure you are running this test on?
Thanks !
8 core system.
model name : Intel(R) Xeon(R) CPU L5335 @ 2.00GHz
with 32 GB of RAM
--Quanah
Quanah Gibson-Mount writes:
OL 2.4 with howard and hallvard's patches: 17,086 auths/second.
I'm tempted to backport the patches to 2.3 so we can compare with what 2.3 could have been:-)
That still leaves us over 4,500 auths/second (or 9000 searches/second) slower than RE2.3. I.e., 21.5% slower. Which is quite a substantial gap.
Maybe --disable-debug is a place to fetch some speed. If it makes enough of a difference compared with loglevel 0, it might be an idea to disable _some_ loglevels and some asserts in the default build. After we've done something about that broken log system of course - IIRC today one needs to turn on massive logging to get certain error messages.
--On Tuesday, July 28, 2009 11:23 PM +0200 Hallvard B Furuseth h.b.furuseth@usit.uio.no wrote:
Maybe --disable-debug is a place to fetch some speed. If it makes enough of a difference compared with loglevel 0, it might be an idea to disable _some_ loglevels and some asserts in the default build. After we've done something about that broken log system of course - IIRC today one needs to turn on massive logging to get certain error messages.
I'm running with loglevel none, which logs almost nothing. I do that so syslog doesn't cause a perf hit. I don't think it specifically will make much difference in the auth rate I'm getting.
--Quanah
Quanah Gibson-Mount writes:
h.b.furuseth@usit.uio.no wrote:
Maybe --disable-debug is a place to fetch some speed. If it makes enough of a difference compared with loglevel 0, it might be an idea to disable _some_ loglevels and some asserts in the default build. (...)
I'm running with loglevel none, which logs almost nothing. I do that so syslog doesn't cause a perf hit.
Right, but without --disable-debug that still executes the Debug() and assert() code which decides not to log anything and not to abort.
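To make that point concrete, here is a minimal C sketch of why a disabled log level still costs something at run time. DEMO_DEBUG, DEMO_NO_DEBUG, demo_debug_level and demo_checks are hypothetical names made up for illustration; this is not OpenLDAP's actual Debug() implementation, and the counter exists only to make the per-call test visible.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins, not OpenLDAP's real definitions. */
static int demo_debug_level = 0;       /* 0 plays the role of "loglevel none" */
static unsigned long demo_checks = 0;  /* how often the level test actually ran */

#ifdef DEMO_NO_DEBUG
/* Compiled-out variant: the call site generates no code at all. */
#define DEMO_DEBUG(level, msg) ((void)0)
#else
/* Default variant: the mask is tested on every call, even when the
 * answer is always "do not log". */
#define DEMO_DEBUG(level, msg) \
    do { \
        demo_checks++; \
        if (demo_debug_level & (level)) \
            fprintf(stderr, "%s\n", (msg)); \
    } while (0)
#endif

/* A stand-in for a hot path that logs and asserts once per operation. */
static unsigned long demo_handle_requests(unsigned long n)
{
    unsigned long handled = 0;
    for (unsigned long i = 0; i < n; i++) {
        DEMO_DEBUG(0x1, "handling request"); /* tested, never printed */
        assert(i < n);                       /* evaluated unless NDEBUG is set */
        handled++;
    }
    return handled;
}
```

In the default build nothing is ever printed with the level at 0, yet the test runs once per request; building with -DDEMO_NO_DEBUG (and -DNDEBUG for the assert) removes both checks from the loop.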
Hallvard B Furuseth wrote:
Quanah Gibson-Mount writes:
OL 2.4 with howard and hallvard's patches: 17,086 auths/second.
I'm tempted to backport the patches to 2.3 so we can compare with what 2.3 could have been:-)
Yeah, that thought crossed my mind too. These are both limited-scope patches so the backport should be pretty easy.
But right now I'd like to focus on the differences in the 2.3 and 2.4 profile results, so we can see where we lost the performance...
That still leaves us over 4,500 auths/second (or 9000 searches/second) slower than RE2.3. I.e., 21.5% slower. Which is quite a substantial gap.
Maybe --disable-debug is a place to fetch some speed. If it makes enough of a difference compared with loglevel 0, it might be an idea to disable _some_ loglevels and some asserts in the default build. After we've done something about that broken log system of course - IIRC today one needs to turn on massive logging to get certain error messages.
Most error messages of any importance are logged at level -1, so they're always displayed if any level of logging was enabled. I don't think this particular area is really broken. Certainly disabling debug support will give a sizeable performance boost, but it will also make diagnostics impossible when anything goes wrong.
We can always tell performance-sensitive users to build and install the whole thing twice, once without debug, and run that unless they need to reproduce a problem... :P
Other things that might gain some speed:
Unwrap the pthread wrapper to shave some time spent in critical sections: http://folk.uio.no/hbf/OpenLDAP/unwrap-pthreads.txt
For that matter, unwrap frequently used small wrappers like ber_memalloc.
A Makefile target which builds with gcc -fprofile-generate, runs some tests, then does make clean and rebuilds with -fprofile-use. Except it didn't work for me, ld complained about __gcov_merge_add. Oh well.
Use the C99 'restrict' keyword when available, e.g. on struct berval* parameters. A function that receives a struct berval*, uses it frequently, and modifies bv->bv_val[...] cannot be optimized well: sometimes the members _could_ have been kept in registers, but the compiler cannot know that, because as far as it knows bv_val may be pointing back into bv, so it must re-read bv after changing bv_val[...]. (Of course another way would be to rewrite such code to copy the bv members into local variables, but that'd be more work per function.)
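The last item can be sketched as follows. This is only an illustration under stated assumptions: struct demo_berval is a local stand-in with the same two members as struct berval (the real type comes from lber.h), and the demo_upcase* functions are hypothetical, not code from the OpenLDAP tree. Whether a given compiler actually produces better code for the restrict version is something to check in the generated assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in with the same member names as struct berval. */
struct demo_berval {
    size_t bv_len;
    char  *bv_val;
};

/* Plain version: a store through bv->bv_val[i] (a char lvalue) may alias
 * *bv itself, so bv_len and bv_val may have to be reloaded each iteration. */
void demo_upcase(struct demo_berval *bv)
{
    assert(bv != NULL);
    for (size_t i = 0; i < bv->bv_len; i++)
        if (bv->bv_val[i] >= 'a' && bv->bv_val[i] <= 'z')
            bv->bv_val[i] -= 'a' - 'A';
}

/* restrict version: the caller promises that *bv is only modified through
 * bv during the call, so stores into the buffer cannot change bv's members. */
void demo_upcase_restrict(struct demo_berval *restrict bv)
{
    assert(bv != NULL);
    for (size_t i = 0; i < bv->bv_len; i++)
        if (bv->bv_val[i] >= 'a' && bv->bv_val[i] <= 'z')
            bv->bv_val[i] -= 'a' - 'A';
}

/* Manual alternative: copy the members into locals once, up front. */
void demo_upcase_locals(struct demo_berval *bv)
{
    assert(bv != NULL);
    char  *val = bv->bv_val;
    size_t len = bv->bv_len;
    for (size_t i = 0; i < len; i++)
        if (val[i] >= 'a' && val[i] <= 'z')
            val[i] -= 'a' - 'A';
}
```

All three give the same result for a buffer that does not overlap the struct; the restrict version makes it undefined behavior to pass a berval whose bv_val points back into the struct itself.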
Howard Chu writes:
Hallvard B Furuseth wrote:
(...) it might be an idea to disable _some_ loglevels and some asserts in the default build. After we've done something about that broken log system of course - IIRC today one needs to turn on massive logging to get certain error messages.
Most error messages of any importance are logged at level -1, so they're always displayed if any level of logging was enabled. I don't think this particular area is really broken.
I'm fairly sure I've needed, once in a while, to turn on debug output to see why some slap tool failed, at least. I haven't paid attention to when it happens; I'll try to remember to report it next time it does.
Certainly disabling debug support will give a sizeable performance boost, but it will also make diagnostics impossible when anything goes wrong.
I dunno. Personally I'd be happy to lose TRACE and maybe ARGS from the default, but then I haven't done anything like your amount of debugging. I find myself asking users for loglevel STATS output more often than TRACE - since people think it's a good idea to turn on full logging and then only show the last lines of the log, cut off after the relevant STATS log which might have been all that was needed.
--On July 28, 2009 10:36:08 AM -0700 Quanah Gibson-Mount quanah@zimbra.com wrote:
I've been spending some time perf testing OL 2.4 in relation to OL 2.3. Unfortunately, RE24 is noticeably slower than 2.3 was. Results of simple auth testing with slamd show:
OL 2.3: 21,745 auths/second
OL 2.4: 15,733 auths/second
So OL 2.4 is 6,000 auths/second (aka 12,000 searches/second) slower than 2.3. I.e., 27% slower.
Howard committed a patch that slightly helps some situations, and Hallvard has a rewrite of part of the lber library that I've been testing that he'll commit soon. That helps somewhat:
OL 2.4 with howard and hallvard's patches: 17,086 auths/second.
That still leaves us over 4,500 auths/second (or 9000 searches/second) slower than RE2.3. I.e., 21.5% slower. Which is quite a substantial gap.
Here are the numbers with --enable-debug=no.
OL 2.3: 22,356 auths/second
OL 2.4: 17,396 auths/second
So for 2.3, this is an improvement of 611 auths/second. For 2.4, this is an improvement of 1,663 auths/second. Which I find rather significant. ;)
--Quanah
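For readers who want the relative figures behind these numbers, here is a small C sketch of the arithmetic. The helper names demo_delta and demo_percent are made up for illustration; the input rates are the ones reported in this thread.

```c
#include <assert.h>

/* Absolute change in auths/second between two runs. */
static long demo_delta(long before, long after)
{
    return after - before;
}

/* The same change expressed as a percentage of the "before" rate. */
static double demo_percent(long before, long after)
{
    assert(before > 0);
    return 100.0 * (double)(after - before) / (double)before;
}

/* Rates from the thread (auths/second):
 *   OL 2.3: 21,745 default build, 22,356 with --enable-debug=no
 *   OL 2.4: 15,733 default build, 17,396 with --enable-debug=no
 *
 * demo_percent(21745, 22356) is the relative gain for 2.3,
 * demo_percent(15733, 17396) is the relative gain for 2.4,
 * demo_percent(22356, 17396) is the remaining 2.3-to-2.4 gap
 * when both are built without debug support. */
```

Worked through, the gains are roughly 2.8% for 2.3 and 10.6% for 2.4, and with debug disabled on both sides 2.4 is still about 22% slower than 2.3.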
Quanah Gibson-Mount wrote:
Here are the numbers with --enable-debug=no.
OL 2.3: 22,356 auths/second
OL 2.4: 17,396 auths/second
So for 2.3, this is an improvement of 611 auths/second. For 2.4, this is an improvement of 1,663 auths/second. Which I find rather significant. ;)
OK, that lends some weight to the idea that we have too many assert()s in the 2.4 code...
On Sat, 1 Aug 2009, Howard Chu wrote:
OK, that lends some weight to the idea that we have too many assert()s in the 2.4 code...
Ehhhhh. I understand this from the performance standpoint, but from the usability standpoint, I really like to die fast when something's wrong. Even in full production and under load.
Configurable for "I want to benchmark above all else," perhaps, but I would be wary of the removal route.
Aaron Richton writes:
On Sat, 1 Aug 2009, Howard Chu wrote:
OK, that lends some weight to the idea that we have too many assert()s in the 2.4 code...
Or too many Debug()s. To test which, you could make include/ac/assert.h contain just the line "#include <assert.h>", then configure with one but not both of --disable-debug and CPPFLAGS=-DNDEBUG.
Ehhhhh. I understand this from the performance standpoint, but from the usability standpoint, I really like to die fast when something's wrong. Even in full production and under load.
Well, there's assert(we kept track of what's going on) and then there's assert(we obeyed our own calling conventions to this static function). And assert(the pointer we are about to follow is not NULL), which helps explain why a crash happens but isn't usually needed to cause the crash.
Configurable for "I want to benchmark above all else," perhaps, but I would be wary of the removal route.
I'd like something like an assume() macro and an assert() macro, where assume() could be #defined as assert or noop or, if the compiler supports it, a compiler hint:-)
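One way such a split could look, as a rough sketch only: demo_assume and the DEMO_ASSUME_MODE knob are hypothetical names, not an existing OpenLDAP macro, and the compiler-hint branch assumes a GCC or Clang new enough to provide __builtin_unreachable().

```c
#include <assert.h>
#include <stddef.h>

/* DEMO_ASSUME_MODE is a hypothetical build-time knob:
 *   0 = no-op, 1 = same as assert() (default here), 2 = compiler hint. */
#ifndef DEMO_ASSUME_MODE
#define DEMO_ASSUME_MODE 1
#endif

#if DEMO_ASSUME_MODE == 1
#define demo_assume(cond) assert(cond)
#elif DEMO_ASSUME_MODE == 2 && defined(__GNUC__)
/* Tells the optimizer that cond always holds; undefined behavior if not. */
#define demo_assume(cond) \
    do { if (!(cond)) __builtin_unreachable(); } while (0)
#else
#define demo_assume(cond) ((void)0)
#endif

/* "We obeyed our own calling conventions to this static function":
 * a cheap internal precondition, so demo_assume() rather than assert(). */
static size_t demo_strlen(const char *s)
{
    demo_assume(s != NULL);
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}

/* "We kept track of what's going on": a state invariant that should
 * still fail fast in production, so it stays a real assert(). */
static size_t demo_remaining(size_t capacity, size_t used)
{
    assert(used <= capacity);
    return capacity - used;
}
```

The design choice is that only the calling-convention and NULL-pointer style checks move to demo_assume(), so a benchmark build can drop them while the state-tracking asserts keep their die-fast behavior.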
--On August 1, 2009 2:08:18 PM -0700 Quanah Gibson-Mount quanah@zimbra.com wrote:
--On July 28, 2009 10:36:08 AM -0700 Quanah Gibson-Mount quanah@zimbra.com wrote:
I've been spending some time perf testing OL 2.4 in relation to OL 2.3. Unfortunately, RE24 is noticeably slower than 2.3 was. Results of simple auth testing with slamd show:
OL 2.3: 21,745 auths/second
OL 2.4: 15,733 auths/second
Using BDB 4.8 instead of BDB 4.5 with OL2.4 (which 2.3 does not support), the base rate increases to 16,300 auths/second. I'll see what happens with the two perf patches added into the mix with 4.8. Perhaps we can get fairly close to 2.3 that way. :P
--Quanah