David Boreham wrote:
With this patch, a slapd that occupies 6.8GB on a system with 8GB of RAM can run continuously without swapping, delivering a sustained 11,500 authentications per second. Without the patch, swapping starts when the process hits the 4.5GB mark (because over 3GB of buffer cache is in use), and performance drops to only *hundreds* of authentications per second.
This is interesting. Did you test performance under other workloads? The reason I ask is that every time I've tried O_DIRECT in the past, performance suffered (significantly) whenever I/O was actually being done. I suspect this is due to reduced concurrency, because the application must block in cases where it wouldn't have when using OS buffering. Other database products that I keep track of (e.g. PostgreSQL) report similar findings.
Testing with swappiness=0 actually did turn out faster, by a tiny margin. Peak throughput was 11609 auths/second @ 160 client threads with swappiness=0, vs 11567/sec @ 140 client threads with O_DIRECT. Peak process size was also slightly smaller without O_DIRECT.
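To put the "tiny margin" in numbers, here is a quick sketch that computes the relative difference between the two peak figures quoted above. It uses only those two measurements; the function and variable names are illustrative and not part of any slapd tooling.

```python
def relative_gain_pct(baseline, candidate):
    """Percentage by which candidate exceeds baseline."""
    return (candidate - baseline) / baseline * 100.0

# Peak throughput figures from the test runs described above (auths/second).
o_direct_peak = 11567       # O_DIRECT patch, 140 client threads
swappiness0_peak = 11609    # swappiness=0, no O_DIRECT, 160 client threads

gain_pct = relative_gain_pct(o_direct_peak, swappiness0_peak)
print(f"swappiness=0 is ahead by {gain_pct:.2f}%")
```

The two peaks were reached at different client thread counts, so this compares best case against best case rather than like-for-like load.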
I think the difference is so small because the caches were already at a 99% hit rate; very few requests would actually need to do I/O. But in those cases where the data wasn't in the slapd or the BDB cache, it had a chance of being in the fs buffer cache, thus the higher throughput without O_DIRECT.
At this point I'm going to forget about the O_DIRECT patch.