After extensive benchmarking by both Howard and me, it seems the default value for the threads setting in OpenLDAP (16) is excessive for most hardware. Unless someone has plenty of CPUs (at least 4, not counting HyperThreading's logical CPUs), slapd performance is significantly improved by running at 8 threads. Recent tests by Howard indicated that dropping the number of threads to 4 on my 2-CPU boxes further improved read performance, but I haven't had time to do the corresponding write tests to see how writes are impacted. In any case, the "8" value is definitely better for both read and write performance on 1- and 2-CPU servers (and possibly 4-CPU ones; I just haven't had one to experiment on). The only time I've found a need to increase the number of threads was when benchmarking the Sun T2000, which presents 32 hardware threads. Given all of this, I'd like to propose that we change
SLAP_MAX_WORKER_THREADS
in slap.h from 16 to 8 for OpenLDAP 2.4.
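For concreteness, the change itself would be a one-liner in servers/slapd/slap.h, along these lines (a sketch from memory, not a verbatim copy of the header):

    /* default size of slapd's worker thread pool */
    #define SLAP_MAX_WORKER_THREADS    8    /* was 16 */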
Thoughts?
--Quanah
--
Quanah Gibson-Mount
Senior Systems Software Developer
ITS/Shared Application Services
Stanford University
GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html
Quanah Gibson-Mount wrote:
After extensive benchmarking by both Howard and me, it seems the default value for the threads setting in OpenLDAP (16) is excessive for most hardware. [...] I'd like to propose that we change SLAP_MAX_WORKER_THREADS in slap.h from 16 to 8 for OpenLDAP 2.4.
I think we should leave this alone until we have read/write test results to confirm things. Also, our testing was primarily on Linux 2.6; other platforms may have more efficient thread scheduling.
--On Wednesday, April 18, 2007 1:43 PM -0700 Howard Chu hyc@symas.com wrote:
Quanah Gibson-Mount wrote:
[...] I'd like to propose that we change SLAP_MAX_WORKER_THREADS in slap.h from 16 to 8 for OpenLDAP 2.4.
I think we should leave this alone until we have read/write test results to confirm things. Also, our testing was primarily on Linux 2.6; other platforms may have more efficient thread scheduling.
I reached the same conclusions on my old Solaris V120s, with RW tests too. ;) If we get that build farm proposal, that might be a good opportunity to do some testing of small DBs on a variety of platforms, I suppose. However, I'm guessing the majority of OpenLDAP users fall into the Linux and Solaris categories.
--Quanah
On Wed, 18 April 2007 23:17, Quanah Gibson-Mount wrote:
I reached the same conclusions on my old Solaris V120s, with RW tests too. ;) [...]
What sort of tests are you doing?
Raphaël Ouazana.
--On Wednesday, April 18, 2007 11:49 PM +0200 Raphaël Ouazana-Sustowski raphael.ouazana@linagora.com wrote:
What sort of tests are you doing?
A series of tests where I have mixed read/write ratios. In particular:
30% read, 70% write
50% read, 50% write
70% read, 30% write
each run repeated with an increasing number of threads.
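Roughly, each load-generating client behaves like the libldap sketch below; the URI, credentials, DNs, and iteration count are placeholders for illustration, not the actual harness:

    /* mixbench.c -- one load-generator client with a fixed write
     * percentage (argv[1], e.g. 30 for a 70% read / 30% write mix).
     * Build: cc mixbench.c -o mixbench -lldap -llber
     */
    #define LDAP_DEPRECATED 1
    #include <ldap.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int i, rc, write_pct = argc > 1 ? atoi(argv[1]) : 30;
        char *attrs[] = { "cn", NULL };
        char *vals[]  = { "load test value", NULL };
        LDAPMod mod, *mods[2];
        LDAP *ld;

        if (ldap_initialize(&ld, "ldap://localhost:389") != LDAP_SUCCESS)
            return 1;
        if (ldap_simple_bind_s(ld, "cn=bench,dc=example,dc=com",
                "secret") != LDAP_SUCCESS)
            return 1;

        mod.mod_op     = LDAP_MOD_REPLACE;
        mod.mod_type   = "description";
        mod.mod_values = vals;
        mods[0] = &mod; mods[1] = NULL;

        for (i = 0; i < 10000; i++) {
            if (rand() % 100 < write_pct) {         /* write: a modify */
                rc = ldap_modify_ext_s(ld,
                    "uid=u1,ou=people,dc=example,dc=com",
                    mods, NULL, NULL);
            } else {                                /* read: a search  */
                LDAPMessage *res = NULL;
                rc = ldap_search_ext_s(ld, "ou=people,dc=example,dc=com",
                    LDAP_SCOPE_SUBTREE, "(uid=u1)", attrs, 0,
                    NULL, NULL, NULL, 0, &res);
                ldap_msgfree(res);
            }
            if (rc != LDAP_SUCCESS)                 /* count failures  */
                fprintf(stderr, "op %d: %s\n", i, ldap_err2string(rc));
        }
        ldap_unbind_ext_s(ld, NULL, NULL);
        return 0;
    }

Running several of these in parallel while varying slapd's threads setting between runs gives the comparisons discussed above.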
--Quanah
On Wed, 18 April 2007 23:50, Quanah Gibson-Mount wrote:
A series of tests where I have mixed read/write ratios. In particular:
30% read, 70% write
50% read, 50% write
70% read, 30% write
each run repeated with an increasing number of threads.
Thank you. Could you detail which sorts of operations are reads and writes (modify, modrdn, add, delete, search, bind, etc.)? Furthermore, when you say you get the best results with fewer threads, what do you mean? The best response times *and* always LDAP_SUCCESS, or just one of the two?
Raphaël Ouazana.
--On Wednesday, April 18, 2007 11:56 PM +0200 Raphaël Ouazana-Sustowski raphael.ouazana@linagora.com wrote:
Thank you. Could you detail which sorts of operations are reads and writes (modify, modrdn, add, delete, search, bind, etc.)?
Modifies.
Furthermore, when you say you get the best results with fewer threads, what do you mean? The best response times *and* always LDAP_SUCCESS, or just one of the two?
Both.
--Quanah
On Wednesday 18 April 2007, Quanah Gibson-Mount wrote:
A series of tests where I have mixed read/write ratios. In particular:
30% read, 70% write
50% read, 50% write
70% read, 30% write
each run repeated with an increasing number of threads.
And, how many databases?
I assume the recommendation is based on one database? In my deployment, I have 3 relatively large databases (~400,000, ~500,000, and ~800,000 entries). Since the better performance at a reduced thread count is probably due to reduced database contention, might 24 threads be more appropriate for me?
Or, should the number of threads be configurable at the database level?
Regards, Buchan
Buchan Milne wrote:
And, how many databases?
I assume the recommendation is based on one database? In my deployment, I have 3 relatively large databases (~400,000, ~500,000, and ~800,000 entries). Since the better performance at a reduced thread count is probably due to reduced database contention, might 24 threads be more appropriate for me?
Or, should the number of threads be configurable at the database level?
That wouldn't make any sense, since threads are a global resource for the process. Also, a single operation can span multiple databases (e.g. using glue/subordinate) so there's really no point to that.
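In slapd.conf terms that looks like the sketch below (placeholder suffixes and directories): the threads directive lives in the global section, before any database definition, and there is no per-database equivalent:

    # global section -- one worker pool for the whole slapd process
    threads 8

    database  bdb
    suffix    "dc=db1,dc=example,dc=com"
    directory /var/lib/ldap/db1

    database  bdb
    suffix    "dc=db2,dc=example,dc=com"
    directory /var/lib/ldap/db2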
It really comes down to read vs write contention. With the current entry cache in OpenLDAP 2.4, I can get a single client to read an entire 380,000 entry database (contained completely in the back-bdb entry cache) in only 0.70 seconds. On the same system (which has a dual-core processor) two clients can perform the same search in 0.71 seconds. I.e., database contention is almost nonexistent. The real problem seems to be that thread scheduling overhead is too high relative to the CPU cost of a single LDAP read operation. When you have a large number of slower operations (e.g. writes with a lot of index updates) then the thread overhead becomes proportionately smaller.
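For reference, the kind of single-client sweep described takes only a few lines of libldap. A rough sketch (placeholder URI and suffix, anonymous bind; it buffers the whole result in memory, which is fine for a cache-resident test):

    /* readsweep.c -- time one full read of a database.
     * Build: cc readsweep.c -o readsweep -lldap -llber
     */
    #include <ldap.h>
    #include <stdio.h>
    #include <sys/time.h>

    int main(void)
    {
        LDAP *ld;
        LDAPMessage *res = NULL;
        struct timeval t0, t1;

        if (ldap_initialize(&ld, "ldap://localhost:389") != LDAP_SUCCESS)
            return 1;
        gettimeofday(&t0, NULL);
        if (ldap_search_ext_s(ld, "dc=example,dc=com", LDAP_SCOPE_SUBTREE,
                "(objectClass=*)", NULL, 0, NULL, NULL, NULL,
                LDAP_NO_LIMIT, &res) != LDAP_SUCCESS)
            return 1;
        gettimeofday(&t1, NULL);
        printf("%d entries in %.2f s\n", ldap_count_entries(ld, res),
            (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6);
        ldap_msgfree(res);
        ldap_unbind_ext_s(ld, NULL, NULL);
        return 0;
    }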
I'd like to mention a different scenario, where proxy databases need to deal with a mix of slow and fast targets. What we experienced is that concurrency can be heavily penalized by this sort of mix when few threads are available: operations against slow targets inevitably eat up threads that sit idle in ldap_result(), threads that could otherwise be servicing operations against the fast targets, which instead have to remain pending. In some cases we had to use up to 128 threads (we even experimented with 512), with a big waste of system resources.
A solution could be to redesign the proxies so that request and response are handled independently by different threads, using "client" connections that detect activity on persistent connection handlers towards the targets. Together with a customer, we quickly prototyped something like this (back-aldap, standing for "asynchronous LDAP"); it is just a toy right now, but it showed some potential.
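To make the pattern concrete: requests are issued with ldap_search_ext(), which returns immediately, and one thread then reaps whichever responses arrive first, instead of parking a thread per operation inside ldap_result(). A minimal sketch against a single target (placeholder URI and suffix; back-aldap itself is of course far more involved):

    /* issue several searches without blocking, then reap responses
     * in arrival order, so a slow answer can't hold a thread hostage */
    #include <ldap.h>
    #include <stdio.h>

    int main(void)
    {
        LDAP *ld;
        int i, msgid, pending = 0;
        struct timeval tv = { 0, 100000 };  /* 100 ms poll interval */

        if (ldap_initialize(&ld, "ldap://target.example.com")
                != LDAP_SUCCESS)
            return 1;

        for (i = 0; i < 8; i++) {           /* fire off the requests */
            if (ldap_search_ext(ld, "dc=example,dc=com",
                    LDAP_SCOPE_SUBTREE, "(objectClass=*)", NULL, 0,
                    NULL, NULL, NULL, LDAP_NO_LIMIT, &msgid)
                    == LDAP_SUCCESS)
                pending++;
        }

        while (pending > 0) {
            LDAPMessage *res = NULL;
            /* a real proxy would service other clients between polls */
            int rc = ldap_result(ld, LDAP_RES_ANY, LDAP_MSG_ALL, &tv, &res);
            if (rc == -1) break;            /* connection error      */
            if (rc == 0) continue;          /* timeout: nothing yet  */
            printf("msgid %d answered\n", ldap_msgid(res));
            ldap_msgfree(res);
            pending--;
        }
        ldap_unbind_ext_s(ld, NULL, NULL);
        return 0;
    }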
In the meantime, I'd like slapd to maintain as much efficiency as possible when running with lots of threads.
p.
Ing. Pierangelo Masarati
OpenLDAP Core Team

SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
Office: +39 02 23998309
Mobile: +39 333 4963172
Email: pierangelo.masarati@sys-net.it
--On Monday, April 23, 2007 7:31 PM +0200 Pierangelo Masarati ando@sys-net.it wrote:
I'd like to mention a different scenario, where proxy databases need to deal with a mix of slow and fast targets. [...] In some cases we had to use up to 128 threads (we even experimented with 512), with a big waste of system resources.
Right, there are cases where more threads are necessary. But I think the default value of 16 is too high for the majority of cases. This is exactly the sort of case that should be well documented; I've been putting together in my head an FAQ entry on OpenLDAP tuning, collecting the various bits I've gleaned over the years.
--Quanah
Pierangelo Masarati wrote:
I'd like to mention a different scenario, where proxy databases need to deal with a mix of slow and fast targets. [...] In the meantime, I'd like slapd to maintain as much efficiency as possible when running with lots of threads.
One of the things I've been considering is extending the thread pool manager to keep track of how often a particular thread cycles through the pool. That would let us distinguish threads stuck on long-running operations from those handling cheaper/faster ones. We could then automatically spawn a new thread to compensate for a long-running operation monopolizing an existing one. (The long-running thread can just exit when it finally completes.) The net effect would be a target number of configured threads X, rising to at most 2X when there are X slow operations running at once.
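A toy model of that bookkeeping, just to make the idea concrete (this is not slapd's thread pool code; TARGET, SLOW_SEC, and the simulated operation costs are all invented for illustration):

    /* poolspawn.c -- toy model of "spawn on slow op".
     * The pool targets TARGET threads; when the monitor sees one
     * thread stuck on a single "operation" for SLOW_SEC or more,
     * it spawns an extra worker, up to 2*TARGET in total.
     * Build: cc poolspawn.c -o poolspawn -lpthread
     */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>
    #include <unistd.h>

    #define TARGET   4
    #define SLOW_SEC 2
    #define MAXTHR   (2*TARGET)

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static int nthreads;                /* live workers                 */
    static int used[MAXTHR];            /* slot occupied by a worker    */
    static int compensated[MAXTHR];     /* extra already spawned for it */
    static time_t busy_since[MAXTHR];   /* 0 = idle, else op start time */

    static void *worker(void *arg)
    {
        int slot = (int)(long)arg;
        for (;;) {
            int cost = (rand() % 8 == 0) ? 5 : 0;   /* a few slow ops */

            pthread_mutex_lock(&lock);
            busy_since[slot] = time(NULL);
            pthread_mutex_unlock(&lock);

            sleep(cost);                            /* "run" the op   */

            pthread_mutex_lock(&lock);
            busy_since[slot] = 0;
            compensated[slot] = 0;
            if (nthreads > TARGET) {    /* pool oversized: retire      */
                nthreads--;
                used[slot] = 0;
                pthread_mutex_unlock(&lock);
                return NULL;
            }
            pthread_mutex_unlock(&lock);
        }
    }

    static void spawn(int slot)         /* caller holds the lock       */
    {
        pthread_t tid;
        used[slot] = 1;
        nthreads++;
        pthread_create(&tid, NULL, worker, (void *)(long)slot);
        pthread_detach(tid);
    }

    int main(void)
    {
        int i, t;
        pthread_mutex_lock(&lock);
        for (i = 0; i < TARGET; i++)
            spawn(i);
        pthread_mutex_unlock(&lock);
        for (t = 0; t < 30; t++) {      /* monitor loop                */
            sleep(1);
            pthread_mutex_lock(&lock);
            for (i = 0; i < MAXTHR; i++) {
                if (used[i] && busy_since[i] && !compensated[i] &&
                    time(NULL) - busy_since[i] >= SLOW_SEC &&
                    nthreads < MAXTHR) {
                    int j;              /* find a free worker slot     */
                    for (j = 0; j < MAXTHR && used[j]; j++)
                        ;
                    if (j < MAXTHR) {
                        compensated[i] = 1;
                        spawn(j);
                        printf("slow op in slot %d: now %d threads\n",
                            i, nthreads);
                    }
                }
            }
            pthread_mutex_unlock(&lock);
        }
        return 0;
    }

In this toy, whichever worker finishes while the pool is oversized retires, which has the same net effect as the long-running thread itself exiting on completion.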
Pierangelo Masarati wrote:
[...] A solution could be to redesign the proxies so that request and response are handled independently by different threads, using "client" connections that detect activity on persistent connection handlers towards the targets. [...]
I remember thinking that we need to do this; the syncrepl consumer certainly works better with this approach.