martin@urbackup.org wrote:
This post outlines a few changes to LMDB I had to make to get it working for a specific use case. I’d like to see those changes upstream, but I understand that they may not be relevant for e.g. OpenLDAP. The use case is multiple databases on disks with long-running, large write transactions.
- Option to not use custom memory allocator/page pool
LMDB has a custom malloc() implementation that re-uses pages (me_dpages). I understand that this improves performance a bit (depending on the malloc implementation). But there should at least be the option to not do that (for many reasons). I would even make not using it the default.
Not going to happen. But maybe it would be reasonable to allow configuring a limit on how many pages it keeps hanging around, before actually using libc free() on them.
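To make the idea of a configurable limit concrete, here is a minimal sketch of a free-page cache with a cap. It is not LMDB's code: `page_pool`, `max_cached`, `SKETCH_PAGE_SIZE` and the `pool_*` functions are all hypothetical names made up for illustration, and the real me_dpages handling may look quite different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096   /* assumed page size, for illustration only */

/* Hypothetical free-page cache; not LMDB's actual data structure. */
typedef struct page {
    struct page *next;          /* link used only while the page is cached */
} page;

typedef struct page_pool {
    page  *free_list;           /* singly linked list of cached pages */
    size_t cached;              /* number of pages currently on free_list */
    size_t max_cached;          /* hypothetical limit; 0 disables caching */
} page_pool;

static void pool_init(page_pool *pool, size_t max_cached)
{
    assert(pool != NULL);
    pool->free_list = NULL;
    pool->cached = 0;
    pool->max_cached = max_cached;
}

/* Reuse a cached page if one is available, otherwise ask libc. */
static void *pool_alloc(page_pool *pool)
{
    page *p = pool->free_list;
    if (p != NULL) {
        pool->free_list = p->next;
        pool->cached--;
        return p;
    }
    return malloc(SKETCH_PAGE_SIZE);
}

/* Keep the page for reuse while under the limit, else really free() it. */
static void pool_release(page_pool *pool, void *mem)
{
    if (mem == NULL)
        return;
    if (pool->cached < pool->max_cached) {
        page *p = mem;
        p->next = pool->free_list;
        pool->free_list = p;
        pool->cached++;
    } else {
        free(mem);
    }
}

/* Return every cached page to libc. */
static void pool_destroy(page_pool *pool)
{
    while (pool->free_list != NULL) {
        page *p = pool->free_list;
        pool->free_list = p->next;
        free(p);
    }
    pool->cached = 0;
}
```

With `max_cached` set to 0 every released page goes straight back to free(), which would give the "no page pool" behaviour asked for above without removing the reuse path for everyone else.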
- Large transactions and spilling
In a large write transaction, it will use a lot of memory by default (512MiB), which won’t get freed when the transaction commits (see 1.). With a lot of databases, that adds up to a lot of memory that never gets freed.
Alternatively, one can use MDB_WRITEMAP, but (i) by default Linux isn’t tuned to delay writing pages to disk and (ii) before commit LMDB has to remove a dirty bit, so each page is written twice.
There is no more dirty bit in LMDB 1.0, and this double-write no longer happens.
- LMDB causes crashes if database is corrupted
You can enable per-page checksums in LMDB 1.0, in which case you'll just get an error code if a page is corrupted (and the checksum fails to match). The DB will still be unusable if anything is corrupted.
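As a rough illustration of "error code instead of crash", the sketch below seals a page with a checksum and verifies it on read. Everything in it is an assumption for the example: the layout (checksum in the first 4 bytes), the FNV-1a hash, and the names `page_seal`, `page_verify` and `SKETCH_BAD_CHECKSUM` are hypothetical and are not LMDB 1.0's on-disk format or API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE    4096   /* assumed page size, for illustration only */
#define SKETCH_OK           0
#define SKETCH_BAD_CHECKSUM (-1)   /* hypothetical error code */

/* 32-bit FNV-1a, used here only as a stand-in checksum function. */
static uint32_t fnv1a32(const unsigned char *data, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) {
        h ^= data[i];
        h *= 16777619u;
    }
    return h;
}

/* Store the checksum of the payload in the first 4 bytes of the page. */
static void page_seal(unsigned char *pg)
{
    assert(pg != NULL);
    uint32_t sum = fnv1a32(pg + sizeof(uint32_t),
                           SKETCH_PAGE_SIZE - sizeof(uint32_t));
    memcpy(pg, &sum, sizeof(uint32_t));
}

/* Recompute the checksum on read; report a mismatch as an error code. */
static int page_verify(const unsigned char *pg)
{
    assert(pg != NULL);
    uint32_t stored;
    memcpy(&stored, pg, sizeof(uint32_t));
    uint32_t actual = fnv1a32(pg + sizeof(uint32_t),
                              SKETCH_PAGE_SIZE - sizeof(uint32_t));
    return stored == actual ? SKETCH_OK : SKETCH_BAD_CHECKSUM;
}
```

The caller sees the mismatch as a return value and can bail out cleanly, rather than following a garbage pointer out of a corrupted page.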
- Allow LMDB to reside on a device
LMDB 1.0 supports storage on raw devices.