Hi, all
I am a caffe user. In my use case, I read from a ~300GB lmdb sequentially: each element is read once and not accessed again until every other element in the db has been read and the pass loops around. It seems that lmdb keeps every page it reads in the page cache. This becomes a problem because the dataset is read in fairly quickly: within an hour, nearly 60% of RAM is devoted to my lmdb page cache. At that point the system runs out of free memory and starts evicting the page frames of other users' processes, many of which have not been accessed in the last hour, in preference to the lmdb page frames mapped at the beginning of the run. This behavior is entirely understandable, but it causes extremely severe thrashing and an unresponsive system, since those other users' page frames come back into use very soon. Does my diagnosis of the situation seem reasonable?
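For reference, my read loop is essentially the following (a sketch; error handling omitted, and env/dbi are opened elsewhere):

#include <lmdb.h>

/* One full sequential pass over the db: a single read-only txn,
 * a cursor walking every record front to back. */
void one_pass(MDB_env *env, MDB_dbi dbi)
{
    MDB_txn *txn;
    MDB_cursor *cur;
    MDB_val key, data;

    mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);
    mdb_cursor_open(txn, dbi, &cur);
    while (mdb_cursor_get(cur, &key, &data, MDB_NEXT) == MDB_SUCCESS)
        ;  /* consume data.mv_data; this record is not touched again
            * until the next pass */
    mdb_cursor_close(cur);
    mdb_txn_abort(txn);
}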
Since many caffe use cases involve a similar sequential read of a dataset much larger than available RAM, many other caffe users report the same issue. They all describe their systems becoming unresponsive, presumably due to the same thrashing:
"training is freezing for multiple hours" https://github.com/BVLC/caffe/issues/1412
"Caffe memory increases with time(iterations?)" https://github.com/BVLC/caffe/issues/1377
"Random freezes" https://github.com/BVLC/caffe/issues/1807
Is there some option I missed that can inform lmdb that a certain read-only transaction is going to be purely sequential, so it shouldn't bother caching already-read elements? If not, is there a plan to add such a feature?
Or is there an option to limit the maximum amount of memory a single lmdb transaction will use for caching?
Or is there some other possible solution to this problem?
-----
I have been using a hack based on this fork
https://github.com/raingo/lmdb-fork/commit/091ff5e8be35c2f2336e37c0db4c392fa...
to avoid this issue. However, I would love to know if there is any less hacky way to solve this problem.
-----
I have seen this thread
http://www.openldap.org/lists/openldap-devel/201502/msg00052.html
but it looks like it is for multiple readers.
-----
Any advice on this will be much appreciated!
Brian wrote:
> Hi, all
> I am a caffe user. In my use case, I read from a ~300GB lmdb sequentially: each element is read once and not accessed again until every other element in the db has been read and the pass loops around. It seems that lmdb keeps every page it reads in the page cache. This becomes a problem because the dataset is read in fairly quickly: within an hour, nearly 60% of RAM is devoted to my lmdb page cache. At that point the system runs out of free memory and starts evicting the page frames of other users' processes, many of which have not been accessed in the last hour, in preference to the lmdb page frames mapped at the beginning of the run. This behavior is entirely understandable, but it causes extremely severe thrashing and an unresponsive system, since those other users' page frames come back into use very soon. Does my diagnosis of the situation seem reasonable?
If you're on Linux, you need to set /proc/sys/vm/swappiness to zero.
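E.g. "sysctl -w vm.swappiness=0" as root, or persistently via a line in /etc/sysctl.conf. Doing the same from a program is just a write to procfs (untested sketch, Linux-only, needs root):

#include <stdio.h>

/* Equivalent of "sysctl -w vm.swappiness=0": tell the kernel to avoid
 * swapping out anonymous pages in favor of dropping file cache. */
int main(void)
{
    FILE *f = fopen("/proc/sys/vm/swappiness", "w");
    if (!f) {
        perror("/proc/sys/vm/swappiness");
        return 1;
    }
    fputs("0\n", f);
    fclose(f);
    return 0;
}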
> Since many caffe use cases involve a similar sequential read of a dataset much larger than available RAM, many other caffe users report the same issue. They all describe their systems becoming unresponsive, presumably due to the same thrashing:
> "training is freezing for multiple hours" https://github.com/BVLC/caffe/issues/1412
Nothing here seems relevant to LMDB.
"Caffe memory increases with time(iterations?)" https://github.com/BVLC/caffe/issues/1377
The poster says his problem occurred with both LMDB and LevelDB.
"Random freezes" https://github.com/BVLC/caffe/issues/1807
A comment says the same problem occurred with LevelDB.
All of those issues appear to be Caffe-specific, not LMDB-specific.
> Is there some option I missed that can inform lmdb that a certain read-only transaction is going to be purely sequential, so it shouldn't bother caching already-read elements? If not, is there a plan to add such a feature?
Not at present.
> Or is there an option to limit the maximum amount of memory a single lmdb transaction will use for caching?
No, nor will there ever be.
> Or is there some other possible solution to this problem?
Read up on how to tune your OS's memory subsystem. There will never be any cache-tuning options in LMDB itself. LMDB relies entirely on the OS cache and it's your responsibility to know how to configure your OS.
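That said, an application is free to pass such hints to the kernel itself through the ordinary POSIX interfaces, with no LMDB changes. A sketch, assuming Linux (posix_fadvise is purely advisory; in particular POSIX_FADV_DONTNEED may not release pages that are still mapped into a running process):

#define _XOPEN_SOURCE 600
#include <sys/types.h>
#include <fcntl.h>
#include <lmdb.h>

/* Hint that the data file will be read sequentially, so the kernel
 * can read ahead more aggressively. */
int advise_sequential(MDB_env *env)
{
    mdb_filehandle_t fd;
    int rc = mdb_env_get_fd(env, &fd);
    if (rc)
        return rc;
    return posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
}

/* Call periodically during the scan: suggest that the kernel drop
 * cached pages in [0, bytes_read), which this pass will not revisit. */
int drop_behind(MDB_env *env, off_t bytes_read)
{
    mdb_filehandle_t fd;
    int rc = mdb_env_get_fd(env, &fd);
    if (rc)
        return rc;
    return posix_fadvise(fd, 0, bytes_read, POSIX_FADV_DONTNEED);
}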
> I have been using a hack based on this fork
> https://github.com/raingo/lmdb-fork/commit/091ff5e8be35c2f2336e37c0db4c392fa...
> to avoid this issue. However, I would love to know if there is any less hacky way to solve this problem.
That's ridiculously bad. Using MAP_PRIVATE means LMDB pages will be backed by swap space - this will consume double the resources that it would normally use. There's a reason we only use MAP_SHARED.
> I have seen this thread
> http://www.openldap.org/lists/openldap-devel/201502/msg00052.html
> but it looks like it is for multiple readers.
It would apply to the single-reader case as well. The main point of that thread is to tell the kernel to use a larger readahead value. Again, the free-behind behavior is automatic if your OS is properly tuned, and the kernel's default read-ahead is already 64KB so I don't see much benefit there.
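For anyone who wants to experiment with a larger value anyway, the knob is per block device. A sketch, assuming the db lives on a hypothetical /dev/sda (run as root; same effect as "blockdev --setra 32768 /dev/sda"):

#include <stdio.h>

/* Raise readahead on the device holding the db to 16MB
 * (the sysfs value is in KB). */
int main(void)
{
    FILE *f = fopen("/sys/block/sda/queue/read_ahead_kb", "w");
    if (!f) {
        perror("/sys/block/sda/queue/read_ahead_kb");
        return 1;
    }
    fputs("16384\n", f);
    fclose(f);
    return 0;
}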
> Any advice on this will be much appreciated!