Alec Matusis wrote:
We have an environment with no flags that contains a database with no flags. The database is append-only: no deletions or modifications. It is written using a single RW transaction, in the absence of any RO transactions. We observe that when we commit and recreate the RW transaction every 2000 insertion ops, the data.mdb file on disk is 2x larger than when committing every 64000 insertion ops. The mdb_copy -c utility shrinks the large file (2k ops per commit) to almost the same size as the 64k-per-commit one. mdb_stat -e on data.mdb shows that with more commits and a bigger file, we have more pages used, in the same proportion.
In production we will have several large DBs (>1TB) on an NVMe card, and we do not have the 2x space for periodic mdb_copy -c compactions (and we cannot stop the writing process). We also need to commit every 2000 write ops, because there will be short-lived RO transactions that need to see the DB updates every 2000 writes.
1. Why is the file size on disk dependent on the commit frequency? (I suppose because with less frequent commits it can allocate data between pages more efficiently)?
LMDB does copy-on-write. Every time you start a new transaction, any page you modify must be copied first. If you do many operations in the same transaction, pages that transaction has already copied can be modified again as-is, instead of needing to be copied again.
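To make the effect of commit frequency concrete, here is a toy model in C. It is not LMDB code and does not predict file size: it only counts page writes for in-order inserts, assuming a tree of fixed height where each page is written once per transaction that touches it. The function name, the leaf capacity, and the fanout are all made up for illustration, and the model ignores LMDB's reuse of freed pages.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LEVELS 16

/*
 * Toy copy-on-write model (hypothetical, not part of LMDB).
 * Keys arrive in sorted order. Level 0 is the leaf level; each higher
 * level is a branch level. A page counts as "written" the first time a
 * given transaction touches it; later touches in the same transaction
 * are free.
 */
unsigned long count_page_writes(unsigned long n_inserts,
                                unsigned long per_commit,
                                unsigned long leaf_cap,
                                unsigned long fanout)
{
    unsigned long span[MAX_LEVELS];      /* entries covered by one page per level */
    unsigned long last_page[MAX_LEVELS]; /* page touched by the previous insert */
    unsigned long writes = 0, last_txn = 0, i;
    int levels, l;

    assert(n_inserts > 0 && per_commit > 0 && leaf_cap > 0 && fanout >= 2);

    /* Add branch levels until a single root page covers every entry. */
    span[0] = leaf_cap;
    levels = 1;
    while (span[levels - 1] < n_inserts && levels < MAX_LEVELS) {
        span[levels] = span[levels - 1] * fanout;
        levels++;
    }

    for (i = 0; i < n_inserts; i++) {
        unsigned long txn = i / per_commit;
        for (l = 0; l < levels; l++) {
            unsigned long page = i / span[l];
            /* New transaction or new page: this page must be written. */
            if (i == 0 || txn != last_txn || page != last_page[l])
                writes++;
            last_page[l] = page;
        }
        last_txn = txn;
    }
    return writes;
}
```

With 8 inserts, 2 entries per leaf and a fanout of 2, the model gives 7 page writes when everything goes in one transaction, 12 when committing every 2 inserts, and 24 when committing after every insert. Every commit forces the whole root-to-leaf path to be written again.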
2. What can we do to reduce data.mdb, if we must commit frequently? Can we use any environment, transaction or db flags, or anything else?
If it is truly, strictly append-only use, which means every newly inserted key is greater than all existing keys, then you should use the MDB_APPEND flag. That will cut growth by half.
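Whether MDB_APPEND applies depends on every new key sorting after the current last key. A minimal sketch of that check in C, assuming the database uses LMDB's default byte-wise key ordering (no MDB_INTEGERKEY, no MDB_REVERSEKEY, no custom comparator); the helper names cmp_bytes and can_append are hypothetical, not LMDB functions:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Byte-wise comparison: compare the common prefix with memcmp, and if
 * the prefixes are equal, the shorter key sorts first. This is assumed
 * to match the default ordering for plain byte-string keys.
 */
static int cmp_bytes(const void *a, size_t a_len, const void *b, size_t b_len)
{
    size_t n = a_len < b_len ? a_len : b_len;
    int d = memcmp(a, b, n);
    if (d != 0)
        return d;
    return (a_len > b_len) - (a_len < b_len);
}

/* Returns 1 if `next` sorts strictly after `last`, else 0. */
int can_append(const void *last, size_t last_len,
               const void *next, size_t next_len)
{
    assert(last != NULL && next != NULL);
    return cmp_bytes(last, last_len, next, next_len) < 0;
}
```

If the check holds for every insert, the key can be written with mdb_put(txn, dbi, &key, &data, MDB_APPEND). If a key is out of order, LMDB rejects the put with MDB_KEYEXIST, so in practice the return code of mdb_put can serve as the check. For keys whose length varies (31 to 36 bytes here), note the length tie-break: "abc" sorts before "abcd".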
We are on Linux 5.4.0 / ext4 fs. The DB that grows 2x faster with more frequent commits has a bytearray key -> u32 val structure (the bytearray key is between 31 and 36 bytes). Another DB that has the reverse u32 key -> bytearray structure only grows 10% larger in the more-frequent-commit regime.