On Mon, Oct 20, 2014 at 1:53 PM, Howard Chu <hyc@symas.com> wrote:
then it would be possible to make a direct comparison (against the figures you just sent), e.g. against the 32-thread case: 32 readers, 2 writers. 32 readers, 4 writers. 32 readers, 8 writers, and so on. keeping the total number of threads (write plus read) at or below the total number of cores avoids any unnecessary context-switching.
We can do that by running two instances of the benchmark program concurrently: one doing a read-only job with a fixed number of threads (32) and one doing a write-only job with an increasing number of threads.
ohh, ok - great. saves a job doing some programming at least.
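The run matrix described above (a fixed 32 readers, with the writer count doubling while readers plus writers stay at or below the core count) could be driven by something like the sketch below. It is only an illustration: the function names `writer_counts` and `run_pair` are hypothetical, and the commands passed to `run_pair` stand in for whatever the real read-only and write-only benchmark invocations are.

```python
import subprocess


def writer_counts(readers, cores):
    """Yield doubling writer counts (2, 4, 8, ...) while readers + writers <= cores."""
    writers = 2
    while readers + writers <= cores:
        yield writers
        writers *= 2


def run_pair(read_cmd, write_cmd):
    """Start both jobs concurrently, wait for both, and return their exit codes."""
    reader = subprocess.Popen(read_cmd)  # read-only job, fixed thread count
    writer = subprocess.Popen(write_cmd)  # write-only job, varying thread count
    return reader.wait(), writer.wait()
```

Assuming a 64-core machine (an assumption, not a figure from this thread), `list(writer_counts(32, 64))` gives the writer counts to test, and each one would be handed to `run_pair` along with the two real command lines.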
the hypothesis being tested is that the writers' overall performance remains the same, since only one may perform writes at a time.
i know it sounds silly to do that: it sounds so obvious that it really should not make any difference, given that no matter how many writers there are, all but one of them will always be doing absolutely nothing, and the context switching when one finishes should also be negligible. but i know there's something wrong and i'd like to help find out what it is.
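The hypothesis above can be modelled in a few lines: a fixed amount of write work is split across N writer threads that all share one lock, so at most one writer is ever inside the critical section. This is a toy model using Python's standard `threading.Lock`, not LMDB's actual writer mutex, and the name `run_writers` is hypothetical; it shows the serialization, not real timings.

```python
import threading


def run_writers(num_writers, total_writes):
    """Split total_writes across num_writers threads sharing a single lock.

    Returns (writes completed, max writers ever inside the lock at once).
    """
    lock = threading.Lock()
    state = {"done": 0, "inside": 0, "max_inside": 0}
    per_writer = total_writes // num_writers

    def writer():
        for _ in range(per_writer):
            with lock:  # only one writer may hold this at a time
                state["inside"] += 1
                state["max_inside"] = max(state["max_inside"], state["inside"])
                state["done"] += 1  # stands in for the actual write
                state["inside"] -= 1

    threads = [threading.Thread(target=writer) for _ in range(num_writers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state["done"], state["max_inside"]
```

Whatever the writer count, the same total work passes through the lock one write at a time. Any slowdown measured as writers are added would therefore have to come from the cost of handing the lock between threads, which is the part the real benchmark is meant to expose.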
My experience from benchmarking OpenLDAP over the years is that mutexes scale only up to a point. When you have threads grabbing the same mutex from across socket boundaries, things go into the toilet. There's no fix for this; that's the nature of inter-socket communication.
argh. ok. so... actually.... accidentally, the design where i used a single LMDB (one env) shared amongst (20 to 30) processes using db_open to create (10 or so) databases would mitigate that... taking a quick look at mdb.c, the mutex lock is taken on the env, not on the database...
sooo compared to the previous design there would only be a 20/30-to-1 mutex contention whereas previously there were *10 sets* of 20 or 30 to 1 mutexes all competing... and if mutexes use sockets underneath that would explain why the inter-process communication (which also used sockets) was so dreadful.
huh, how about that.
do you happen to have access to a straight 8-core SMP system, or is it relatively easy to turn off the NUMA architecture?
l.