i just looked at this and i have a sneaking suspicion that you may be
running into the same problem that i encountered when accidentally
opening 10 LMDBs 20 times by forking 30 processes *after* opening the
10 LMDBs... (and also forgetting to close them in the parent).
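to illustrate the fork-after-open mistake in a self-contained way, here is a minimal sketch using only the python stdlib (os.fork plus mmap) rather than the lmdb binding itself — the file name, size, and process count are made up; the point is that each child opens its own mapping *after* the fork, instead of inheriting a handle opened in the parent:

```python
import mmap
import os
import tempfile

# create a small backing file to stand in for the lmdb data file
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"\0" * 4096)

# safe pattern: fork first, then each child opens its own mmap.
# the unsafe pattern (which caused the problem described above) is
# opening the map in the parent and letting every child inherit it.
pids = []
for _ in range(4):
    pid = os.fork()
    if pid == 0:
        # child: open the mapping post-fork, do the work, exit
        with open(path, "r+b") as f:
            m = mmap.mmap(f.fileno(), 4096)
            m[:5] = b"hello"
            m.close()
        os._exit(0)
    pids.append(pid)

# parent: reap all children (and note it never opened the map itself)
for pid in pids:
    os.waitpid(pid, 0)
```

this mirrors the fix: close (or never open) the environment in the parent, and let each worker open it fresh after forking.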
what i found was that when i reduced the number of LMDBs to 3 or
below, the loadavg on the multi-core system was absolutely fine
(around 3 to 4, which is acceptable).
adding one more LMDB (remember that's 31 extra file handles opened to
a shm-mmap'd file) increased the loadavg to 6 or 7. by the time that
was up to 10 LMDBs, the loadavg had jumped, completely unreasonably,
to over 30. i could log in over ssh - that was not a problem. editing
an existing file was ok (opening it), but trying to create new files
left applications (such as vim) blocked so badly that i often could
not even press ctrl-z to background the task, and had to kill the
entire ssh session.
in each test run the amount of work being done was actually relatively small.
basically i suspect a severe bug in the linux kernel: these extreme
circumstances (16 or 32 processes all accessing the same mmap'd file,
for example) have simply never been encountered, so the bug has never
surfaced.
can i make the suggestion that, whilst i am aware that it is generally
not recommended in production environments to run more processes than
there are cores, you try running 128, 256 and even 512 processes all
hitting that 64-core system, and monitor its I/O usage (iostat) and
loadavg whilst doing so?
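the suggested stress test can be sketched with the python stdlib alone (os.fork, mmap, os.getloadavg) — the file size, iteration count, and process count here are placeholders; on the real 64-core box you would crank nprocs up through 128/256/512 and watch iostat alongside:

```python
import mmap
import os
import tempfile


def stress(path, nprocs=8, iters=20000):
    """fork nprocs children that all hammer the same mmap'd file,
    wait for them, then return the 1-minute loadavg afterwards."""
    pids = []
    for _ in range(nprocs):
        pid = os.fork()
        if pid == 0:
            # child: open its own mapping post-fork and write to it
            with open(path, "r+b") as f:
                m = mmap.mmap(f.fileno(), 4096)
                for i in range(iters):
                    m[i % 4096] = i % 256
                m.close()
            os._exit(0)
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)
    return os.getloadavg()[0]


# small backing file standing in for the shared lmdb data file
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"\0" * 4096)

load1 = stress(path)
```

the interesting signal is how load1 behaves as nprocs climbs well past the core count: roughly linear degradation would be healthy; the cliff described above would not.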
the hypothesis to test is that the performance, which should degrade
reasonably linearly with the ratio of processes to cores, instead
drops like a lead balloon.