Re: large write amplification

5 May 2015

Леонид Юрьев (Leonid Yuriev) wrote:
...
Hi, Xinxin.
I will try to answer briefly, without going into details:

To ensure readers are never blocked by a writer, LMDB provides a
snapshot of data, indexes and directory for each completed
transaction.

Most db pages (those not changed by a particular transaction) are
"shared" between such snapshots. But any change to the data itself,
and its reflection in the B-tree indexes (including a particular
table, free-db, main-db and so forth), requires new pages to be
allocated and written to disk.
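A minimal sketch of the copy-on-write behavior described above, assuming a toy in-memory B+tree. The names `Page`, `build_tree`, `lookup` and `cow_update` are hypothetical, not LMDB's API, and the sketch never splits pages. Updating one key copies only the pages on the root-to-leaf path; every other page object is shared by the old and new roots, so a reader still holding the old root keeps a consistent snapshot.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Page:
    """Immutable page. A leaf has keys/values; a branch has separator keys and children."""
    keys: Tuple[int, ...]
    children: Tuple["Page", ...] = ()
    values: Tuple[str, ...] = ()

    @property
    def is_leaf(self) -> bool:
        return not self.children


def min_key(page: Page) -> int:
    """Smallest key stored under this page (first key of the leftmost leaf)."""
    while not page.is_leaf:
        page = page.children[0]
    return page.keys[0]


def build_tree(keys: List[int], leaf_size: int = 4, fanout: int = 4) -> Page:
    """Bulk-load a balanced tree from non-empty sorted keys; each value is 'v<key>'."""
    level = [
        Page(keys=tuple(chunk), values=tuple(f"v{k}" for k in chunk))
        for chunk in (keys[i:i + leaf_size] for i in range(0, len(keys), leaf_size))
    ]
    while len(level) > 1:
        groups = [level[i:i + fanout] for i in range(0, len(level), fanout)]
        # Separator i is the smallest key under child i + 1.
        level = [Page(keys=tuple(min_key(c) for c in g[1:]), children=tuple(g)) for g in groups]
    return level[0]


def lookup(page: Page, key: int) -> Optional[str]:
    """Descend from a root (i.e. a snapshot) to the leaf that may hold key."""
    while not page.is_leaf:
        page = page.children[bisect_right(page.keys, key)]
    i = bisect_left(page.keys, key)
    return page.values[i] if i < len(page.keys) and page.keys[i] == key else None


def cow_update(page: Page, key: int, value: str) -> Tuple[Page, int]:
    """Return (new_page, pages_written). Only the root-to-leaf path is copied;
    all other pages are shared with the old snapshot. No page splits (sketch)."""
    if page.is_leaf:
        i = bisect_left(page.keys, key)
        if i < len(page.keys) and page.keys[i] == key:
            keys, values = page.keys, page.values[:i] + (value,) + page.values[i + 1:]
        else:
            keys = page.keys[:i] + (key,) + page.keys[i:]
            values = page.values[:i] + (value,) + page.values[i:]
        return Page(keys=keys, values=values), 1
    i = bisect_right(page.keys, key)
    new_child, written = cow_update(page.children[i], key, value)
    children = page.children[:i] + (new_child,) + page.children[i + 1:]
    return Page(keys=page.keys, children=children), written + 1


if __name__ == "__main__":
    old_root = build_tree(list(range(64)))  # 16 leaves, 4 branches, 1 root: height 3
    new_root, written = cow_update(old_root, 10, "changed")
    print(written)                                     # 3
    print(lookup(old_root, 10), lookup(new_root, 10))  # v10 changed
```

One changed record costs one new page per tree level, while the 18 untouched pages of this 21-page tree are reused as-is by the new snapshot.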

In a large db, a small "one-byte" change may dirty a lot of
db pages (usually 4K each). For example, one add/del/mod operation in
an LDAP db of a few GB requires about 50-100 page-level IOPS.
Correct, up to this last point. The degree of amplification is greatly 
overstated.
See http://symas.com/mdb/ondisk/
The number of pages touched depends on the height of the B+tree, which
is O(log N) in the number of records. Even a tree of multiple terabytes
is unlikely to reach beyond a height of 5.
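To see why the height stays small, here is a minimal sketch of the O(log N) estimate, assuming completely full pages. The values `records_per_leaf=20` (roughly 200-byte records in a 4 KiB page) and `branch_fanout=200` are illustrative assumptions, not LMDB constants; real fanout depends on key size and page fill.

```python
def btree_height(n_records: int, records_per_leaf: int, branch_fanout: int) -> int:
    """Number of levels in a B+tree holding n_records, counting the leaf level.

    Assumes full pages: each leaf holds records_per_leaf records and each
    branch page points to branch_fanout children (both illustrative values).
    """
    pages = max(1, -(-n_records // records_per_leaf))  # ceiling division: leaf pages
    height = 1
    while pages > 1:
        pages = -(-pages // branch_fanout)  # pages needed one level up
        height += 1
    return height


if __name__ == "__main__":
    for n in (10**6, 10**9, 10**12):
        print(n, btree_height(n, records_per_leaf=20, branch_fanout=200))
```

With these parameters the tree has 4, 5 and 6 levels for 10^6, 10^9 and 10^12 records: each thousandfold increase in data adds only about one level.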
The minimum write amplification may be on the order of 8 pages for a
trivial write. But it also tends to be the maximum.
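The page count above can be sketched as simple arithmetic, assuming that a small commit copies the full root-to-leaf path of every tree it modifies (the user table, the main DB that records named-DB roots, and the free DB, as listed earlier in the thread) and then writes one meta page. The tree heights in the example are hypothetical, chosen to show how a total of about 8 pages arises, not measured from a real database.

```python
from typing import Dict

PAGE_SIZE = 4096  # bytes; assumed, matching the ~4K pages mentioned in the thread


def pages_per_commit(tree_heights: Dict[str, int], meta_pages: int = 1) -> int:
    """Rough count of pages written by one small commit.

    Each B+tree the commit modifies has its whole root-to-leaf path copied,
    so it contributes its height; the new roots are then published via a meta page.
    """
    return sum(tree_heights.values()) + meta_pages


def bytes_per_commit(tree_heights: Dict[str, int], meta_pages: int = 1,
                     page_size: int = PAGE_SIZE) -> int:
    """Same estimate expressed in bytes written."""
    return pages_per_commit(tree_heights, meta_pages) * page_size


if __name__ == "__main__":
    # Hypothetical heights: user table, main DB, free DB.
    heights = {"table": 4, "main": 1, "free": 2}
    print(pages_per_commit(heights))  # 8
    print(bytes_per_commit(heights))  # 32768
```

Because each term is a tree height, which grows only logarithmically, the total stays nearly flat as the database grows; raising the table's height from 4 to 5 adds a single page.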
...
Leonid.
P.S.
For high-load use cases I made a few changes in our fork of OpenLDAP/LMDB.
One of these features we call "LIFO reclaiming".
It gives us a 10-50x performance boost, especially by taking advantage of
the write-back cache of the storage subsystem.
We now use it in our production (telco) environment.
However, it is currently not safe in all cases; see
https://github.com/ReOpen/ReOpenLDAP/issues/2 and
https://github.com/ReOpen/ReOpenLDAP/issues/1.
The LIFO approach inherently breaks the safety guarantees of the LMDB 
concurrency design, as I have already explained.
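As background for the reclaiming discussion, here is a hypothetical model (not LMDB's code; `reclaimable` and `allocate` are illustrative names) of the rule that decides which freed pages a writer may reuse: pages freed by a transaction stay off-limits while any reader might still hold a snapshot that contains them. The sketch consumes eligible records oldest-first, i.e. in FIFO order. It models reader visibility only and does not attempt to reproduce the LIFO variant or the failure cases in the linked issues.

```python
from typing import Dict, List


def reclaimable(freed: Dict[int, List[int]], reader_snapshots: List[int],
                current_txn: int) -> List[int]:
    """Txn ids whose freed pages no live snapshot can still reference.

    A page freed by txn T was part of snapshot T-1, so a reader on any snapshot
    older than T may still see it. With no readers, the writer's own base
    snapshot (current_txn - 1) is the oldest one in use. The strict `<` is
    deliberately conservative.
    """
    oldest = min(reader_snapshots, default=current_txn - 1)
    return sorted(t for t in freed if t < oldest)


def allocate(freed: Dict[int, List[int]], reader_snapshots: List[int],
             current_txn: int, n_pages: int) -> List[int]:
    """Pick up to n_pages reusable page numbers, oldest freeing txn first."""
    out: List[int] = []
    for t in reclaimable(freed, reader_snapshots, current_txn):
        for pgno in freed[t]:
            if len(out) == n_pages:
                return out
            out.append(pgno)
    return out


if __name__ == "__main__":
    # Hypothetical state: txns 3, 5 and 7 freed these page numbers.
    freed = {3: [10, 11], 5: [20], 7: [30, 31]}
    print(reclaimable(freed, reader_snapshots=[6], current_txn=9))  # [3, 5]
    print(allocate(freed, [6], 9, n_pages=2))                       # [10, 11]
    print(reclaimable(freed, reader_snapshots=[], current_txn=9))   # [3, 5, 7]
```

A single long-lived reader (snapshot 2, say) pins every record from txn 2 onward, which is why old readers make the free list grow.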
-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/
