A follow-up question:
We have an environment with one database with MDB_INTEGERKEY | MDB_DUPSORT
flags.
The keys and values are never deleted or modified (append-only).
There's a single RW transaction on the environment, that is writing the
data. It also needs to read some prior data in order to write new data. It
is committed and then recreated after every 1000 to 3000 mdb_cursor_put()
ops. All keys and values in the mdb_cursor_put ops are about the same size.
I am printing and plotting mdb_stat after every commit.
I have observed a sudden Free Pages jump as shown on the top graph here:
https://imgur.com/a/RZi1Vjn The ratio of Free/Used pages has similar jump on
the second graph. The number of Used pages(computed as
MDB_envinfo.me_last_pgno - free_pages or equivalently " Number of pages
used" - " Free pages" from mdb_stat -fe utility) just grown monotonously
on
the third graph, as expected.
We commit about the same data each time(same size * same number of put ops).
What might cause this jump?
We have other databases, but they do not have this behavior- the ratio of
free/used pages is more or less constant after each commit, oscillating
around 0.001.
-----Original Message-----
From: Howard Chu [mailto:hyc@symas.com]
Sent: Wednesday, March 10, 2021 8:12 AM
To: Alec Matusis <matusis(a)matusis.com>; openldap-technical(a)openldap.org
Subject: Re: data.mdb grows faster with more frequent commits
Alec Matusis wrote:
We have an environment with no flags that contains a database with no
flags. The database is append only, no deletions or modifications. It
is written using a single RW transaction, in the absence of any RO
transactions. We observe that when we commit and recreate the RW
transaction every 2000 insertion ops, the data.mdb file size on disk is 2x
larger
than when committing every 64000 insertion ops. The mdb_copy c
utility shrinks the large 2k ops commit file to almost the same file size as
the 64k commit one. mdb_stat e on the data.mdb shows that when we have
more commits and bigger file, we have more pages used by the same
proportion.
In production we will have several large DBs (>1TB) on an NVMe card
and we do not have the 2x space for periodic mdb_copy c
compactifications (and we cannot stop the writing process). We also
need to commit every 2000 write ops, because there will be short-lived
RO transactions that need to see the DB updates every
2000 writes.
1. Why is the file size on disk dependent on the commit frequency? (I
suppose because with less frequent commits it can allocate data between
pages more
efficiently)?
LMDB does copy-on-write. Every time you start a new transaction, any page
you modify must be copied first.
If you do many operations in the same transaction, the modified pages can be
reused as-is, instead of needing to be copied again.
2. What can we do to reduce data.mdb, if we must commit frequently?
Can
we use any environment, transaction or db flags, or anything else?
If it is truly, strictly append-only use, which means every newly inserted
key is greater than all existing keys, then you should use the MDB_APPEND
flag. That will cut growth by half.
We are on Linux 5.4.0 / ext4 fs. The DB that grows 2x faster with
more
frequent commits has bytearr key -> u32 val structure (the byterarray key
is
between 31 and 36 bytes). Another DB that has a reverse u32 key ->
bytearr structure oonly grows 10% larger in the more frequent commits
regime.
--
-- Howard Chu
CTO, Symas Corp.
http://www.symas.com
Director, Highland Sun
http://highlandsun.com/hyc/
Chief Architect, OpenLDAP
http://www.openldap.org/project/