Howard Chu wrote:
Winsock's select() implementation is pretty suboptimal. Unfortunately, using the Microsoft-recommended asynchronous functions would implicitly switch all the sockets to non-blocking mode, and things like OpenSSL don't behave well with non-blocking sockets.
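A minimal sketch of the select() approach follows. It assumes POSIX sockets so it builds on Linux, it is not slapd's listener code, and wait_readable() and is_blocking() are hypothetical helper names. The point it illustrates is that select() only reports readiness and never changes a socket's blocking mode. By contrast, WSAEventSelect, for example, is documented to put its socket into non-blocking mode as a side effect. On Winsock the select() call has the same shape, but fd_set holds an array of SOCKET handles (FD_SETSIZE defaults to 64) and the first argument is ignored.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait up to timeout_ms for fd to become readable.
 * Returns 1 if readable, 0 on timeout, -1 on error. The socket's
 * blocking mode is left untouched, so later reads can still block. */
static int wait_readable(int fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int n;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* POSIX wants highest fd + 1 here; Winsock ignores this argument. */
    n = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (n < 0)
        return -1;
    return (n > 0 && FD_ISSET(fd, &rfds)) ? 1 : 0;
}

/* Hypothetical helper: 1 if fd is still in blocking mode (O_NONBLOCK clear),
 * 0 otherwise (including when fcntl() fails). */
static int is_blocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return flags >= 0 && !(flags & O_NONBLOCK);
}
```

Once wait_readable() reports 1, an ordinary blocking recv() can follow, and the socket stays in blocking mode throughout.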
For reference, the peak throughput with back-null on the previous code was only 7,800 auths/sec (with 8 client threads). With this patch it's 11,140 auths/sec. In both cases the throughput declines as more client threads are used. (Compare that to 35,553 auths/sec on the same machine running Linux, with no drop in throughput all the way up to hundreds or thousands of connections.)
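The relative numbers are easy to lose in the prose, so here is the arithmetic as a small C sketch. pct_change() and speedup() are hypothetical helper names, and only the figures quoted above are used. Going from 7,800 to 11,140 auths/sec is roughly a 42.8% gain, and the Linux figure of 35,553 auths/sec is still about 3.19 times the patched Windows peak.

```c
/* Hypothetical helper: percentage change from `before` to `after`. */
static double pct_change(double before, double after)
{
    return (after - before) / before * 100.0;
}

/* Hypothetical helper: how many times larger rate `a` is than rate `b`. */
static double speedup(double a, double b)
{
    return a / b;
}
```

pct_change(7800, 11140) evaluates to about 42.82, and speedup(35553, 11140) to about 3.19.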
Peak throughput on the new code with back-hdb is 7,972 auths/sec (with 12 client threads).
Oops, that was the preliminary result. The final rate was 8,030 auths/sec.
With the previous code it was 6,252 auths/sec (with 8 client threads). (The 7,972 figure is also after setting processor affinities for the threads, forcing the listener to use core #0 and forcing the worker threads to use cores #1-7. Without that tweak, the peak is only 7,717/sec.)
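The post doesn't say how the affinities were applied. One possible way is to build a mask with one bit per logical processor. This sketch assumes the Win32 SetThreadAffinityMask() API and uses hypothetical helper names core_range_mask() and pin_current_thread(). Under that scheme, core #0 for the listener is mask 0x1 and cores #1-7 for the workers are 0xFE.

```c
#include <stdint.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: mask with bits first..last set (inclusive),
 * one bit per logical processor; bits past 63 are ignored. */
static uint64_t core_range_mask(unsigned first, unsigned last)
{
    uint64_t mask = 0;
    for (unsigned c = first; c <= last && c < 64; c++)
        mask |= (uint64_t)1 << c;
    return mask;
}

#ifdef _WIN32
/* Hypothetical helper: pin the calling thread to the cores in `mask`.
 * SetThreadAffinityMask returns 0 on failure. */
static int pin_current_thread(uint64_t mask)
{
    return SetThreadAffinityMask(GetCurrentThread(), (DWORD_PTR)mask) != 0;
}
#endif
```

Under these assumptions, each worker thread would call pin_current_thread(core_range_mask(1, 7)) at startup, and the listener would use core_range_mask(0, 0).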
I forgot to note that this is using an experimental build of gcc 4.3.0 (because earlier versions don't really support the Win64 ABI) and all optimization is turned off (due to some nasty bugs that make gcc 4.3.0's optimizer unusable). We're tracking the bug on the mingw-w64 mailing list; hopefully we'll have a fix soon.
This is also using BerkeleyDB 4.6.21. The 1M-entry DB loads in about 8 minutes here (vs 3 minutes on Linux), and I doubt that the optimizer is going to make up a significant chunk of that difference. I.e., there are multiple aspects of this OS (Windows Server 2003 SP2 Enterprise Edition x86_64) that are much slower than Linux: not just the connection handling or disk I/O, but also mutexes, thread scheduling, etc.
E.g., this search command against the Linux OpenLDAP build:

  time ./ldapsearch -x -H ldap://sihu -D dc=example,dc=com -w "secret" -LLL -b ou=people,dc=example,dc=com -E pr=1000/noprompt 1.1 > dn1

  real    0m17.766s
  user    0m5.337s
  sys     0m7.831s

Got this result against Windows OpenLDAP:

  time ./ldapsearch -x -H ldap://sihu:9000 -D dc=example,dc=com -w "secret" -LLL -b ou=people,dc=example,dc=com -E pr=1000/noprompt 1.1 > dn1

  real    0m36.553s
  user    0m5.612s
  sys     0m4.541s
This is with the DB fully cached, so there's no disk I/O, and the number of network roundtrips is identical in both cases. (I guess I should measure that again on Linux without the optimizer, to make the numbers fairer.)
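One way to read these timings assumes ldapsearch is effectively single-threaded and the client host is otherwise idle. Under that assumption, wall-clock time minus the client's own user and sys time roughly approximates how long the client sat waiting on the network and the server. The blocked_seconds() helper below is hypothetical and the estimate is rough. It gives about 4.6s of waiting against the Linux server versus about 26.4s against the Windows server, roughly 5.7 times as long, while the client's own CPU time (13.2s vs 10.2s) is of the same order in both runs.

```c
/* Hypothetical helper: approximate time a single-threaded client spent
 * blocked (network + server), i.e. wall-clock time minus its own CPU time.
 * Assumes the client host is otherwise idle. */
static double blocked_seconds(double real, double user, double sys)
{
    return real - (user + sys);
}
```

blocked_seconds(17.766, 5.337, 7.831) is about 4.60 and blocked_seconds(36.553, 5.612, 4.541) is about 26.40.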