Brad Knowles wrote:
> Howard Chu wrote:
>> Testing on an 8-socket AMD server with Opteron 885 dual-core processors (16 cores total) and a Sun T5120 (T2 Niagara, 8 cores, 64 hardware threads) has shown that our current frontend code is performing very poorly with more than 16 server threads.
> You mention one particular hardware architecture here. Looking through the rest of your performance tuning slides that I know of, I'm not seeing a lot of this kind of work done on other architectures. Is this possibly an AMD or Intel limitation? Or maybe there are OS-specific issues?
These factors certainly come into play, but their influence tends to be small, on the order of 10%. For example, OpenLDAP on SPARC runs faster on Linux than on Solaris, but it's not a huge difference. The behaviors I'm worrying about here are much worse, e.g. throughput with 24 threads is half of what it is with 16 threads.
> I only ask because right now our big OpenLDAP servers are on Sun SPARC processors (UltraSPARC-IIIi?) running Solaris 9, and I'm wondering if the problems we've had in the past might be related to our particular hardware/OS choices, as compared to the ones you've been testing with.
In general, source-level optimizations - tuning algorithms, etc. - benefit all platforms. Some more than others, sure, but problems that show up on one platform are likely to be problems on all platforms. Likewise, a well-tuned installation should perform decently on any platform.
That aside, some platforms are definitely better than others. The SPARC architecture has really been lagging in instructions-per-cycle. I don't really believe that the Niagara design is going to get anywhere either. Aside from embarrassingly parallel workloads (array/vector processing, image processing, etc.) it's pretty hard to write good parallel code that will scale across hundreds of threads, and you're always up against Amdahl's Law. Still, until we've investigated every possible avenue for getting decent performance out of this machine, it's too early to just write it off.
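To put a rough number on the Amdahl's Law point, here is a minimal sketch in Python. The function names and the 95% parallel fraction are assumptions chosen purely for illustration; nothing here was measured on slapd or on either test machine.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Ideal speedup over one thread when only `parallel_fraction`
    of the work can be spread across `threads` (Amdahl's Law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


def speedup_ceiling(parallel_fraction: float) -> float:
    """Upper bound on speedup as the thread count grows without limit."""
    if parallel_fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - parallel_fraction)


if __name__ == "__main__":
    # Hypothetical workload: 95% parallelizable. This is an assumed
    # figure for illustration, not a measurement.
    p = 0.95
    for n in (1, 8, 16, 24, 64):
        print(f"{n:3d} threads: {amdahl_speedup(p, n):.2f}x")
    print(f"ceiling: {speedup_ceiling(p):.2f}x")
```

With that assumed fraction, the ideal speedup can never exceed 20x no matter how many hardware threads the machine offers, which is why a 64-thread part is hard to keep busy with anything but embarrassingly parallel work.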