https://bugs.openldap.org/show_bug.cgi?id=9027
--- Comment #4 from Howard Chu <hyc@openldap.org> ---
(In reply to doug from comment #3)
Hi! I would like to voice my support for this feature, and explain how this API would help me.
Sorry but I don't find any of these arguments to be compelling.
First of all, a bit of background: I maintain C++ bindings for LMDB (https://github.com/hoytech/lmdbxx) and have contributed to the Perl bindings and others. I have published two interface layers that use LMDB, https://github.com/hoytech/quadrable and https://github.com/hoytech/rasgueadb, and, as my colleagues would attest, I am a relentless LMDB advocate.
Application 1: Testing zero-copy
That previous link, RasgueaDB, is an indexing and query layer for LMDB. It uses flatbuffers for its data format, which allows zero-copy access to fields. Since there are several layers involved, I wanted to make sure that no copying was happening, and also to add a test to the test-suite to make sure this remains the case.
To do this, I have a library called assert_zerocopy: https://github.com/hoytech/hoytech-cpp/blob/master/hoytech/assert_zerocopy.h (and a Perl equivalent here: https://metacpan.org/pod/Test::ZeroCopy )
However, in order to verify the returned value is actually a zero-copy reference, I need the pointer to the memory map (and the map's size). I have added a terrible hack to the lmdbxx bindings to retrieve this:
https://github.com/hoytech/lmdbxx/blob/08eddafcc4613c7fc8ebd88f5db87c7d7bfb9f52/lmdb%2B%2B.h#L1115-L1118
But this is obviously not a good approach (as mentioned by Nic in this thread), and it would be much better if there were an API to retrieve this value (like there is for me_mapsize).
No. The only thing you need to do to assert that no copying has occurred is to also retrieve the record through the standard LMDB API and compare the value pointers. If they are the same, then no copying has occurred.
At any rate, this sounds like only a debugging feature, with no actual production use.
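A minimal sketch of that pointer comparison, for illustration only. To compile without lmdb.h it uses a hypothetical `struct val` that mirrors the two fields of LMDB's MDB_val (mv_size, mv_data); the helper names `is_same_record` and `is_within_record` are assumptions, not part of LMDB or of assert_zerocopy. The second helper covers the case where a layer such as flatbuffers hands back a pointer to a field inside the record rather than the record itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for LMDB's MDB_val (same two fields),
 * so this sketch builds without lmdb.h. */
struct val {
    size_t mv_size;
    void  *mv_data;
};

/* True when the layered API returned exactly the buffer that the
 * plain LMDB lookup returned: same address and same length. */
static bool is_same_record(const struct val *from_lmdb,
                           const struct val *from_layer)
{
    assert(from_lmdb != NULL && from_layer != NULL);
    return from_lmdb->mv_data == from_layer->mv_data &&
           from_lmdb->mv_size == from_layer->mv_size;
}

/* True when [p, p + len) lies entirely inside the record, i.e. a
 * field pointer still refers to LMDB's memory and not to a copy. */
static bool is_within_record(const struct val *rec,
                             const void *p, size_t len)
{
    uintptr_t base = (uintptr_t)rec->mv_data;
    uintptr_t addr = (uintptr_t)p;

    if (addr < base || len > rec->mv_size)
        return false;
    return addr - base <= rec->mv_size - len;
}
```

In a real test, the first argument would be filled in by mdb_get() inside the same transaction, and the second would come from the layer under test.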
Application 2: Virtual memory controls
In general I agree that mlock()ing large files like databases is counter-productive. However, there are other controls that can be applied to the virtual memory of an application.
For example, rsync attempts to preserve the filesystem cache state so that rsync invocations have as small an impact as possible on the cache (since it's a shared resource): https://insights.oetiker.ch/linux/fadvise/
Another example: before restarting a server, Instagram uses my utility vmtouch to snapshot the virtual memory state (basically the set of "hot" files). After rebooting, the state is restored before the server is added back to the active pool of servers.
I have generalised this with my application/library vmprobe: https://vmprobe.com/filesystem-cache
If I had access to me_map within my application, I would be able to integrate libvmprobe more easily and portably. For instance, this would allow me to take a snapshot of which pages are resident in memory, perform some large read query that touches many pages, then restore the original VM page residency set after the query completes.
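As a rough illustration of the "snapshot which pages are resident" step, the sketch below counts resident pages in a mapped range using mincore(). It assumes Linux with glibc (where the vector argument is `unsigned char *`), and it assumes the caller already has a page-aligned address and length for the mapping, which is exactly what is at issue in this ticket. The helper name `count_resident` is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Count how many pages of [addr, addr + len) are resident in memory.
 * addr must be page-aligned. Returns (size_t)-1 on error. */
static size_t count_resident(void *addr, size_t len)
{
    assert(addr != NULL);

    size_t page   = (size_t)sysconf(_SC_PAGESIZE);
    size_t npages = (len + page - 1) / page;

    /* mincore() writes one byte per page into vec. */
    unsigned char *vec = malloc(npages ? npages : 1);
    if (vec == NULL)
        return (size_t)-1;

    size_t resident = (size_t)-1;
    if (mincore(addr, len, vec) == 0) {
        resident = 0;
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;   /* low bit set: page is resident */
    }

    free(vec);
    return resident;
}
```

A full snapshot would keep the vector instead of summing it, so the same set of pages could be touched again later to restore the residency state.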
Another variant of this: if you have multiple customers who use the same server (at different times), then you can restore a customer's previous VM state when they log in. If you know approximately which pages will be accessed in advance, sequentially pre-paging them in can have significant performance benefits. I did some experiments with Postgres pre-paging, which you can see here: https://vmprobe.com/database-speedup
If you want control over the address that is mmap'd, just use MDB_FIXEDMAP and pass in your desired address. The actions you're talking about here are highly system-dependent; you may as well just read /proc/<PID>/maps yourself and work from there. Again, this is a pretty niche case.
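A sketch of the /proc approach, under stated assumptions: Linux only, a 64-bit `unsigned long`, and an environment opened without MDB_NOSUBDIR so that the data file is named data.mdb. The helper names `parse_maps_line` and `find_mapping` and the `struct map_range` type are hypothetical. Each line of /proc/<PID>/maps has the form `start-end perms offset dev inode pathname`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct map_range {
    unsigned long start;   /* first byte of the mapping     */
    unsigned long end;     /* one past the last mapped byte */
};

/* Parse one line of /proc/<PID>/maps. Returns true and fills *out
 * when the mapping's pathname ends with the given suffix. */
static bool parse_maps_line(const char *line, const char *suffix,
                            struct map_range *out)
{
    assert(line != NULL && suffix != NULL && out != NULL);

    unsigned long start, end;
    char path[4096] = "";

    /* Skip perms, offset, dev and inode; the rest is the pathname. */
    int n = sscanf(line, "%lx-%lx %*s %*s %*s %*s %4095[^\n]",
                   &start, &end, path);
    if (n < 3)
        return false;      /* anonymous mapping or malformed line */

    size_t plen = strlen(path), slen = strlen(suffix);
    if (plen < slen || strcmp(path + plen - slen, suffix) != 0)
        return false;

    out->start = start;
    out->end   = end;
    return true;
}

/* Scan this process's own maps for the first matching mapping. */
static bool find_mapping(const char *suffix, struct map_range *out)
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (f == NULL)
        return false;

    char line[8192];
    bool found = false;
    while (!found && fgets(line, sizeof line, f) != NULL)
        found = parse_maps_line(line, suffix, out);

    fclose(f);
    return found;
}
```

For example, `find_mapping("/data.mdb", &r)` would report the address range of the first mapping whose path ends in /data.mdb. A process with several environments open would need to match on the full path instead.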
Conclusion
Although certainly a user could abuse me_map if it were exposed, one of the reasons why I enjoy LMDB so much is that it generally assumes its users know what they are doing and doesn't compromise on flexibility or performance for the sake of users who don't. For example, although it is not recommended to use MDB_WRITEMAP in most cases, it nevertheless exists.
As explained above, I have several use-cases for me_map that I feel are legitimate. I am already accessing it in a hacky way but naturally would prefer a more portable and future-proof API.
If this API is added, the documentation should of course make clear that accessing or modifying (if MDB_WRITEMAP) data through me_map is not supported in any way, and regular use-cases have no need for this value and should stick to the supported APIs.
Thank you!