Re: data.mdb grows faster with more frequent commits

10 Mar 2021


      Alec Matusis wrote:
...
We have an environment with no flags that contains a database with no flags. The database is append only, no deletions or modifications. It is written using a
single RW transaction, in the absence of any RO transactions. We observe that when we commit and recreate the RW transaction every 2000 insertion ops, the
data.mdb file size on disk is 2x larger than when committing every 64000 insertion ops. The mdb_copy -c utility shrinks the large 2k ops commit file to almost
the same file size as the 64k commit one. mdb_stat -e on the data.mdb shows that when we have more commits and a bigger file, we have more pages used, in the same
proportion.
In production we will have several large DBs (>1TB) on an NVMe card and we do not have the 2x space for periodic mdb_copy -c compactions (and we cannot
stop the writing process). We also need to commit every 2000 write ops, because there will be short-lived RO transactions that need to see the DB updates every
2000 writes.
1.  Why is the file size on disk dependent on the commit frequency? (I suppose because with less frequent commits it can allocate data between pages more
efficiently)?
LMDB does copy-on-write. Every time you start a new transaction, any page you modify must be copied first.
If you do many operations in the same transaction, the modified pages can be reused as-is, instead of needing
to be copied again.
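
As a rough illustration of the point above, here is a toy model that counts how many page copies a write pattern triggers at a given commit interval. It is only a sketch: the function name and the "one logical page touched per operation" simplification are assumptions made for illustration, and it counts copies only. It does not model LMDB's reuse of freed pages, so it is not a prediction of data.mdb size.

```python
def count_page_copies(page_touches, ops_per_commit):
    """Count copy-on-write page copies for a sequence of writes.

    page_touches   -- list of logical page ids, one per write operation
                      (a simplification: real writes also touch branch pages)
    ops_per_commit -- number of write operations per transaction

    Within one transaction a page is copied the first time it is modified
    and then reused as-is, so each transaction costs one copy per distinct
    page it touches.
    """
    copies = 0
    for start in range(0, len(page_touches), ops_per_commit):
        dirty_pages = set(page_touches[start:start + ops_per_commit])
        copies += len(dirty_pages)
    return copies


# 100 writes spread round-robin over 10 logical pages.
touches = [i % 10 for i in range(100)]
frequent = count_page_copies(touches, ops_per_commit=10)   # 10 transactions
rare = count_page_copies(touches, ops_per_commit=100)      # 1 transaction
print(frequent, rare)
```

In this toy pattern every transaction touches all 10 pages, so committing ten times as often produces ten times as many copies. How strongly the effect shows up in practice depends on how the inserted keys are distributed across pages.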
...
2.  What can we do to reduce data.mdb, if we must commit frequently? Can we use any environment, transaction or db flags, or anything else?
If it is truly, strictly append-only use, which means every newly inserted key is greater than all
existing keys, then you should use the MDB_APPEND flag. That will cut growth by half.
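
A minimal sketch of what this could look like from Python, assuming the third-party py-lmdb binding, where MDB_APPEND is exposed as the `append=True` argument of `Transaction.put`. The helper names `seq_key` and `append_in_batches`, the 8-byte counter key, and the default of 2000 operations per commit are illustrative assumptions, not part of the original setup. The sketch only applies if keys really do arrive in strictly ascending order.

```python
import struct
from itertools import islice


def seq_key(n):
    """Hypothetical key encoding: 8-byte big-endian counter.

    Big-endian packing makes byte-wise order match numeric order, which
    is the ordering requirement for appending.
    """
    return struct.pack(">Q", n)


def append_in_batches(env, items, ops_per_commit=2000):
    """Write (key, value) byte pairs with append=True, committing per batch.

    env   -- an open environment object (e.g. from lmdb.open(...))
    items -- iterable of (key, value) pairs in strictly ascending key order
    Returns the number of pairs written.
    """
    it = iter(items)
    written = 0
    while True:
        batch = list(islice(it, ops_per_commit))
        if not batch:
            return written
        # The context manager commits on normal exit and aborts on error.
        with env.begin(write=True) as txn:
            for key, value in batch:
                if not txn.put(key, value, append=True):
                    raise ValueError(f"key {key!r} is not greater than the last key")
                written += 1
```

With py-lmdb installed, a call might look like `append_in_batches(lmdb.open(path, map_size=1 << 30), pairs)`, where `path` and `pairs` are supplied by the caller. The explicit check on the return value is a defensive choice: a key that is out of order should stop the load rather than be silently skipped.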
...
We are on Linux 5.4.0 / ext4 fs. The DB that grows 2x faster with more frequent commits has bytearr key -> u32 val structure (the bytearray key is between 31
and 36 bytes). Another DB that has a reverse u32 key -> bytearr structure only grows 10% larger in the more frequent commits regime.
-- 
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/
