Thanks! Looks like my specific issues are well on the radar.
On 22.10.2020 20:40 Howard Chu wrote:
martin@urbackup.org wrote:
This post outlines a few changes I had to make to LMDB to get it working in a specific use case. I’d like to see those changes upstream, but I understand that they may not be relevant for e.g. OpenLDAP. The use case is multiple databases on disks with long-running, large write transactions.
- Option to not use custom memory allocator/page pool
LMDB has a custom malloc() implementation that re-uses pages (me_dpages). I understand that this improves the performance a bit (depending on the malloc implementation). But there should at least be the option to not do that (for many reasons). I would even make not using it the default.
Not going to happen. But maybe it would be reasonable to allow configuring a limit on how many pages it keeps hanging around, before actually using libc free() on them.
A limit would work.
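To make the limit idea concrete, here is a minimal sketch of a page pool with a cap on cached pages. It is not LMDB's code: page_pool_t, pool_alloc(), pool_release() and max_cached are made-up names for illustration, and the real me_dpages handling differs. Pages are kept for reuse up to max_cached; anything beyond that goes straight back to libc free().

```c
#include <stdlib.h>

/* Hypothetical names throughout; this only illustrates the "limit" idea. */
typedef struct pool_page {
    struct pool_page *next;   /* link used while the page sits in the cache */
} pool_page_t;

typedef struct {
    pool_page_t *free_list;   /* cached pages available for reuse */
    size_t       cached;      /* number of pages currently cached */
    size_t       max_cached;  /* configurable limit on cached pages */
    size_t       page_size;   /* size of each allocation in bytes */
} page_pool_t;

static void pool_init(page_pool_t *p, size_t page_size, size_t max_cached)
{
    p->free_list  = NULL;
    p->cached     = 0;
    p->max_cached = max_cached;
    /* A page must be able to hold the free-list link. */
    p->page_size  = page_size < sizeof(pool_page_t) ? sizeof(pool_page_t)
                                                    : page_size;
}

/* Reuse a cached page if there is one, otherwise fall back to malloc(). */
static void *pool_alloc(page_pool_t *p)
{
    if (p->free_list) {
        pool_page_t *pg = p->free_list;
        p->free_list = pg->next;
        p->cached--;
        return pg;
    }
    return malloc(p->page_size);
}

/* Cache the page while under the limit, otherwise really free it. */
static void pool_release(page_pool_t *p, void *mem)
{
    if (!mem)
        return;
    if (p->cached < p->max_cached) {
        pool_page_t *pg = mem;
        pg->next = p->free_list;
        p->free_list = pg;
        p->cached++;
    } else {
        free(mem);
    }
}

/* Return all cached pages to libc. */
static void pool_destroy(page_pool_t *p)
{
    while (p->free_list) {
        pool_page_t *pg = p->free_list;
        p->free_list = pg->next;
        free(pg);
    }
    p->cached = 0;
}
```

With max_cached set to 0 this degenerates to plain malloc()/free(), which would also cover the "option to not use the pool" case.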
- Large transactions and spilling
In a large write transaction, it will use a lot of memory by default (512MiB) which won’t get freed when the transaction commits (see 1.). If one has a lot of databases it uses a lot of memory that never gets freed.
Alternatively, one can use MDB_WRITEMAP, but (i) by default Linux isn’t tuned to delay writing pages to disk and (ii) before commit LMDB has to remove a dirty bit, so each page is written twice.
There is no more dirty bit in LMDB 1.0, and this double-write no longer happens.
That'd improve the MDB_WRITEMAP case. The general problem there is that tuning of the write-back behaviour is system wide (e.g. vm.dirty_expire_centisecs on Linux) and mostly not in control of the application, so if one wants to be sure that writeback happens only once sufficient changes have accumulated, one needs to not use MDB_WRITEMAP. In that case it would be great to be able to control the memory usage/write-back behaviour by configuring the amount at which it spills, as well.
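What I mean by a configurable spill amount is roughly the following. This is only a sketch with invented names (dirty_tracker_t, spill_limit, tracker_mark_dirty()); it does not reflect how LMDB tracks its dirty list internally. The application sets a page count, and the spill callback runs once that many dirty pages have accumulated.

```c
#include <stddef.h>

/* Hypothetical callback: writes out npages dirty pages, returns 0 on success. */
typedef int (*spill_fn)(void *ctx, size_t npages);

typedef struct {
    size_t   dirty_pages;  /* dirty pages accumulated since the last spill */
    size_t   spill_limit;  /* configurable threshold in pages; 0 disables spilling */
    spill_fn spill;        /* invoked when the threshold is reached */
    void    *ctx;          /* opaque pointer handed to the callback */
} dirty_tracker_t;

/* Record one newly dirtied page and spill if the limit has been reached. */
static int tracker_mark_dirty(dirty_tracker_t *t)
{
    t->dirty_pages++;
    if (t->spill_limit != 0 && t->dirty_pages >= t->spill_limit) {
        int rc = t->spill(t->ctx, t->dirty_pages);
        if (rc != 0)
            return rc;      /* keep the count so the caller can retry or abort */
        t->dirty_pages = 0;
    }
    return 0;
}

/* Example callback for experiments: only counts how often a spill happened. */
static int count_spills(void *ctx, size_t npages)
{
    (void)npages;
    ++*(size_t *)ctx;
    return 0;
}
```

The point is that the threshold is per environment and in the application's hands, unlike the system-wide writeback sysctls.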
- LMDB causes crashes if database is corrupted
You can enable per-page checksums in LMDB 1.0, in which case you'll just get an error code if a page is corrupted (and the checksum fails to match). The DB will still be unusable if anything is corrupted.
That would fix the problem properly. Does it check that it is the correct transaction as well (e.g. by putting a transid into the page like btrfs)? Returning wrong results or MDB_CORRUPTED is something my application can handle (but not crashes obviously).
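To illustrate the question, this is the kind of check I have in mind. It is a sketch only: the page layout, the FNV-1a checksum and the names cpage_t, page_seal() and page_verify() are my assumptions, not the LMDB 1.0 format. The reference to a page records the transaction id it expects, so a stale page with a valid checksum is still detected, and both failure modes come back as error codes rather than crashes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes for this sketch. */
enum { PAGE_OK = 0, PAGE_BAD_CHECKSUM = -1, PAGE_WRONG_TXN = -2 };

#define PAGE_PAYLOAD 4080

/* Hypothetical page layout: the header carries txnid and checksum. */
typedef struct {
    uint64_t      txnid;     /* transaction that wrote this page */
    uint32_t      checksum;  /* covers txnid and payload */
    uint32_t      reserved;
    unsigned char payload[PAGE_PAYLOAD];
} cpage_t;

/* 32-bit FNV-1a over a byte range, continuing from hash value h. */
static uint32_t fnv1a(uint32_t h, const void *data, size_t len)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 16777619u;
    }
    return h;
}

static uint32_t page_checksum(const cpage_t *pg)
{
    uint32_t h = 2166136261u;                 /* FNV offset basis */
    h = fnv1a(h, &pg->txnid, sizeof pg->txnid);
    return fnv1a(h, pg->payload, sizeof pg->payload);
}

/* Stamp the page with its transaction id and checksum before writing it. */
static void page_seal(cpage_t *pg, uint64_t txnid)
{
    pg->txnid = txnid;
    pg->checksum = page_checksum(pg);
}

/* Verify a page after reading; expected_txnid comes from the referencing page. */
static int page_verify(const cpage_t *pg, uint64_t expected_txnid)
{
    if (page_checksum(pg) != pg->checksum)
        return PAGE_BAD_CHECKSUM;             /* bit rot or torn write */
    if (pg->txnid != expected_txnid)
        return PAGE_WRONG_TXN;                /* intact, but stale or misdirected */
    return PAGE_OK;
}
```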