This post outlines a few changes to LMDB I had to do to make it work in a specific use case. I’d like to see those changes upstream, but I understand that they may be/are not relevant for e.g. OpenLDAP. The use case is multiple databases on disks with long running large write transactions.
1. Option to not use custom memory allocator/page pool
LMDB has a custom malloc() implementation that re-uses pages (me_dpages). I understand that this improves the performance at bit (depending on the malloc implementation). But there should at least be the option to not do that (for many reasons). I would even make not using it the default.
2. Large transactions and spilling
In a large write transaction, it will use a lot of memory per default (512MiB) which won’t get freed when the transaction commits (see 1.). If one has a lot of databases it uses a lot of memory that never gets freed.
Alternatively, one can use MDB_WRITEMAP, but (i) per default Linux isn’t tuned to delay writing pages to disk and (ii) before commit LMDB has to remove a dirty bit, so each page is written twice.
Both problems would be fixed by making when pages get spilled configurable (mt_dirty_room as MDB_IDL_UM_MAX currently) and reducing the default non-spill memory amount for at least the MDB_WRITEMAP case. If this memory amount is low mt_spill_pgs gets sorted often so maybe this needs to be converted to a different data structure (e.g. red-black tree).
3. LMDB causes crashes if database is corrupted
If the database is corrupted it can cause the application to crash. I have fixed those cases when they (randomly) occurred. Properly fixing this would probably be best done with some fuzzing.
4. Allow LMDB to reside on a device
I used dm-cache to improve LMDB read performance. It needed a bit of adjustment to get the correct size of the device via ioctl BLKGETSIZE64.
--
I’ve fixed those issues w.r.t. my application. If there is interest in any of those application specific changes, I’ll clean them up and post them.