Raphaël Ouazana-Sustowski wrote:
Hi,
On Thu, 11 October 2007 12:12, Howard Chu wrote:
But the other nice point about this result is that while the number of CPU cores was increased by a factor of 2, the overall auth rate increased by a factor of 2.7. So we can see that OpenLDAP 2.4 is at least 35% faster than OpenLDAP 2.3 (2.7 / 2 = 1.35). It also shows that the lightweight dispatcher is still scaling well with available CPU power.
Which version of BerkeleyDB do you use: 4.2.52 with patches, or one of the first 4.6 releases? Do you use glibc malloc?
OpenLDAP 2.4 supports all BDB 4.6 releases, and we used 4.6.19 in this test. Yes, glibc malloc was used here. In the back-null case I don't think it would make any difference, but it may be interesting to repeat this with tcmalloc anyway.
Another point is the same thing I've harped on before - SLAMD's clients are too inefficient. We measured the 38K/sec rate using the slapd-auth C client that I just added to CVS HEAD. Using the SLAMD Java client we could only reach 29K/sec, even with 12 client machines and 10 threads per client. Someone's going to have to sit down and write a C client that speaks the SLAMD server protocol and just fires off compiled C load generators...
Do you know where the SLAMD server protocol is described? I (or someone else; I don't have much time at the moment) could implement it in Charge ( http://loadtesting.sourceforge.net/index.php?lang=en ).
I have not seen a document describing it, but I've looked at parts of it. Since SLAMD is open source, you can just download it and read the code. Like LDAP, SLAMD's protocol uses ASN.1.
I don't want to knock it too much - the SLAMD framework is a great piece of work. Neil Wilson did a terrific job with it: the code is easy to read, and the job management and reporting features are well thought out and very useful. From my perspective, Neil has done the really hard parts of setting up a good test framework admirably. I think the load generators are bad, but they're just a small fraction of the overall SLAMD package.
Anyone can write load generators; we have half a dozen in our source tree that are good starting points already. The trick is gathering the results and presenting them in a meaningful fashion, which is SLAMD's strength. I think it makes more sense to enhance SLAMD here than to write Yet Another Complete testing suite.
The SLAMD code also clearly illustrates the trade-off you make by coding in Java. You can install a SLAMD load generator client anywhere that has a Java runtime environment, and so you can easily write a new Job Class, install it on the SLAMD server, and have your clients pick it up from the server and immediately start executing it. Very convenient, no up-front time investment needed to set up compilers and such on all of the client machines. It doesn't matter what kind of client machines you use, they'll all run the same bytecode.
But you pay for the convenience, because your code is effectively being compiled and recompiled every second that it runs. So, for the sake of easily added jobs, instead of setting up a compiler once on a small number of machines, you have to install the clients on a large number of machines. And instead of your runtime consisting purely of your code's execution time, you mix the compile time in over and over again. That's an OK tradeoff for user-interface code, but not when measuring timing is the sole raison d'être of the code.
A C-based client would be pretty much as portable - probably any system that has a JVM can also run gcc. The difference is, once you've compiled the code, you stop paying for that time, and when the code runs, the things you're measuring are exactly what you wanted to measure, with no other overhead. We could easily adapt SLAMD to transfer C source files to the clients and have the clients compile them in advance when distributing newly written Jobs.
Case in point - a single thread of the C slapd-auth client I wrote can generate 10 times as many transactions per second as a single thread of the original Java client. The load generated by 12 Java clients with 10 threads each still couldn't match the load generated by 4 instances of the C client.