Matthew Moskewicz wrote:
warnings: new to list, first post, lmdb noob.
i'm a caffe user: https://github.com/BVLC/caffe
in one use case, caffe sequentially streams through >100GB lmdbs at a rate of ~30MB/s, in blocks of about 40MB. however, if multiple caffe processes read the same lmdb (opened with MDB_RDONLY), read performance becomes the bottleneck (i.e. the processes become IO bound), even though the disk has sufficient read bandwidth (say ~180MB/s). some of the relevant caffe lmdb code is here:
https://github.com/BVLC/caffe/blob/master/src/caffe/util/db.cpp
however, if i *both*:
- (1) run blockdev --setra 65536 --setfra 65536 /dev/sdwhatever
- (2) modify lmdb to call posix_madvise(env->me_map, env->me_mapsize, POSIX_MADV_SEQUENTIAL);
then i can get >1 reader to run without being IO limited.
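To make (2) concrete outside of lmdb, here is a minimal POSIX sketch, assuming a plain file stands in for the lmdb data file. `scan_sequential` is a hypothetical helper (not part of lmdb or caffe): it maps the file read-only, issues the same POSIX_MADV_SEQUENTIAL hint over the whole mapping, and then reads it front to back in fixed-size chunks.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: map `path` read-only, hint sequential access with
 * posix_madvise(), then touch every byte in `chunk`-sized steps.
 * Returns a byte sum (so the reads are not optimized away), or UINT64_MAX
 * on error. Empty files are reported as errors, since mmap of length 0 fails. */
uint64_t scan_sequential(const char *path, size_t chunk)
{
    assert(chunk > 0);
    int fd = open(path, O_RDONLY);
    if (fd < 0) return UINT64_MAX;
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size == 0) { close(fd); return UINT64_MAX; }
    size_t len = (size_t)st.st_size;
    void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the fd is closed */
    if (map == MAP_FAILED) return UINT64_MAX;

    /* The hint from (2). It is advisory only; on Linux it typically enables
     * more aggressive readahead, and pages may be freed soon after access. */
    (void)posix_madvise(map, len, POSIX_MADV_SEQUENTIAL);

    const unsigned char *p = map;
    uint64_t sum = 0;
    for (size_t off = 0; off < len; off += chunk) {
        size_t end = off + chunk < len ? off + chunk : len;
        for (size_t i = off; i < end; i++) sum += p[i];
    }
    munmap(map, len);
    return sum;
}
```

For scale, blockdev's --setra value is counted in 512-byte sectors, so the 65536 in (1) asks for 32MiB of readahead on that device.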
This is quite timing-dependent - if you start your multiple readers at exactly the same time and they run at exactly the same speed, then they will all be using the same cached pages and all of the readers can run at the full bandwidth of the disk. If they're staggered or not running in lockstep, then you'll only get partial performance.
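One way to see why timing matters is a toy model (an assumption for illustration, not how the Linux page cache actually behaves): the cache holds only the `cap` most recently loaded pages with FIFO eviction, and two readers scan the same pages, one `lag` steps behind the other. `simulate_two_readers` is a hypothetical name; it counts disk loads only and ignores seek costs and readahead.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model (assumption): the cache holds the `cap` most recently loaded
 * pages, FIFO eviction (hits do not refresh a page). Reader A scans pages
 * 0..n-1; reader B scans the same pages `lag` steps behind. Each step,
 * A reads first, then B. Returns total disk loads, or -1 on alloc failure. */
long simulate_two_readers(long n, long cap, long lag)
{
    assert(n > 0 && cap > 0 && lag >= 0);
    long *loaded_at = malloc((size_t)n * sizeof *loaded_at);
    if (!loaded_at) return -1;
    for (long i = 0; i < n; i++) loaded_at[i] = -1; /* never loaded */

    long loads = 0; /* disk loads so far; doubles as the load sequence number */
    for (long t = 0; t < n + lag; t++) {
        long pages[2] = { t, t - lag }; /* A's page, then B's page */
        for (int r = 0; r < 2; r++) {
            long p = pages[r];
            if (p < 0 || p >= n) continue; /* this reader is idle this step */
            /* cached iff loaded within the last `cap` loads */
            int hit = loaded_at[p] >= 0 && loaded_at[p] >= loads - cap;
            if (!hit) loaded_at[p] = loads++;
        }
    }
    free(loaded_at);
    return loads;
}
```

In this model the trailing reader is free while `lag < cap` (total loads equal `n`) and doubles the disk traffic once `lag >= cap` (total loads equal `2n`). Where the real cutoff lies depends on cache size, readahead and eviction policy.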
for (2), see https://github.com/moskewcz/scratch/tree/lmdb_seq_read_opt
similarly, i see the same effect with a sequential-read microbenchmark designed to model the caffe reads (code here: https://github.com/moskewcz/boda/blob/master/src/lmdbif.cc).
with one reader, i get 180MB/s. with two readers but neither (1) nor (2) above, each gets ~30MB/s. with (1) and (2) enabled and two readers, each gets ~90MB/s.
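Putting the microbenchmark numbers side by side (plain arithmetic on the figures above, taking the single-reader 180MB/s as the disk's ceiling):

```latex
\begin{aligned}
\text{2 readers, no tweaks:} &\quad 2 \times 30\ \text{MB/s} = 60\ \text{MB/s} = \tfrac{1}{3} \times 180\ \text{MB/s} \\
\text{2 readers, (1) and (2):} &\quad 2 \times 90\ \text{MB/s} = 180\ \text{MB/s} \approx \text{full disk bandwidth}
\end{aligned}
```

So without the tweaks, two concurrent readers together get only a third of what one reader achieves alone; with them, the pair together roughly saturates the disk.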
The other point to note is that sequential reads in LMDB won't remain truly sequential (as seen by the storage device) after a few rounds of inserts/deletes/updates. Once any element of seek/random I/O gets in here, your madvise will be useless.
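To illustrate the point above, suppose you had a trace of the physical page numbers a cursor touches, in key order (the input is hypothetical; how to extract such a trace from lmdb is not covered here). Counting the places where the next page is not the immediately following one gives a rough measure of how many seeks a "sequential" scan really costs. Because lmdb's copy-on-write updates write modified pages to new locations, often reusing freed pages, this count tends to grow as the db is updated. `count_page_discontinuities` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Given a (hypothetical) trace of physical page numbers in the order a
 * cursor visits them, count transitions that do not go to the very next
 * page. Each such break is, roughly, a point where the device must seek. */
size_t count_page_discontinuities(const unsigned long *pgnos, size_t n)
{
    assert(pgnos != NULL || n == 0);
    size_t breaks = 0;
    for (size_t i = 1; i < n; i++)
        if (pgnos[i] != pgnos[i - 1] + 1)
            breaks++;
    return breaks;
}
```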
any advice?
mwm
PS: backstory (skippable): caffe originally used LevelDB to get better read performance when sequentially loading sets of ~1M 227x227x3 raw images (~200GB of data). processing typically takes ~2 hours for a data set this size, which means a read BW need of 30MB/s or so. it's not really clear if/why LevelDB was used, aside from the fact that the caffe author was a google intern when he wrote it; anecdotally, i think the claim is that reading the raw .jpgs had perf. issues, although it's unclear exactly what they were or why. i guess it was the usual story of not getting sequential reads without LevelDB. they switched to lmdb a while back.
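A quick check of the backstory numbers (raw pixel bytes only, decimal units; anything stored beyond the raw pixels is not counted):

```latex
\begin{aligned}
227 \times 227 \times 3 &= 154{,}587\ \text{bytes per image} \\
10^{6} \times 154{,}587\ \text{bytes} &\approx 155\ \text{GB} \\
\frac{200\ \text{GB}}{2\ \text{h}} = \frac{200 \times 10^{9}\ \text{B}}{7200\ \text{s}} &\approx 27.8\ \text{MB/s}
\end{aligned}
```

So the ~200GB figure sits somewhat above the raw pixel total, and ~28MB/s matches the "30MB/s or so" estimate.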