Re: large write amplification

4 May 2015


Hi, Xinxin.
I will try to answer briefly, without going into details:
- So that readers are never blocked by a writer, LMDB provides a
snapshot of the data, indexes and directory for each completed
transaction.
- Most db-pages (those not changed by a particular transaction) are
"shared" between such snapshots. But any change to the data itself,
and its reflection in the btree-indexes (including the particular
table, free-db, main-db and so forth), requires new pages to be used
and written to disk.
- In a large db, a small "one-byte" change may make a lot of db-pages
"dirty" (usually 4K each). For example, one add/del/mod operation in
an LDAP db a few GB in size requires about 50-100 page-level I/O
operations.
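
To illustrate the copy-on-write behaviour described in the points above, here is a minimal Python sketch of path copying in a toy tree. It is not LMDB code: the `Page` class and the `build` and `cow_update` helpers are hypothetical names, and the depth and fanout are arbitrary. It only shows that changing one leaf forces a new copy of every page on the path from the root, while all other pages stay shared with the old snapshot.

```python
class Page:
    """Toy page: a branch holds child pages, a leaf holds a value."""
    def __init__(self, children=None, value=None):
        self.children = children  # list of Page for a branch, None for a leaf
        self.value = value


def build(depth, fanout):
    """Build a full tree with `depth` branch levels above the leaves."""
    if depth == 0:
        return Page(value=0)
    return Page(children=[build(depth - 1, fanout) for _ in range(fanout)])


def cow_update(page, path, value):
    """Return (new_page, pages_copied) without modifying the old tree.

    `path` is the list of child indexes leading from `page` to the leaf.
    """
    if not path:
        return Page(value=value), 1  # new leaf page
    idx = path[0]
    new_child, copied = cow_update(page.children[idx], path[1:], value)
    children = list(page.children)   # copy the branch page...
    children[idx] = new_child        # ...and point it at the new child
    return Page(children=children), copied + 1


old_root = build(depth=3, fanout=4)
new_root, copied = cow_update(old_root, [1, 2, 3], 42)
print("pages copied for one change:", copied)
```

In this toy tree a single value change copies one page per level. In a real database the same thing happens in each btree the transaction touches, which is how a small change can add up to many written pages.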
Leonid.
P.S.
For high-load use cases I made a few changes in our fork of OpenLDAP/LMDB.
One of these features we call "LIFO reclaiming".
It gives us a 10-50 times performance boost, especially by taking
advantage of the write-back cache of the storage subsystem.
We currently use it in our production (telco) environment.
But it is not yet safe for all cases, see
https://github.com/ReOpen/ReOpenLDAP/issues/2 and
https://github.com/ReOpen/ReOpenLDAP/issues/1.
2015-05-04 5:31 GMT+03:00 Shu, Xinxin xinxin.shu@intel.com:
...
Hi list,
Recently I ran micro tests of LMDB on a DC3700 (200GB), using the bench code at https://github.com/hyc/leveldb/tree/benches. I tested fillrandsync mode, collected iostat data, and found that write amplification is large.
For the fillrandsync case:
IOPS: 1020 ops/sec
The iostat data shows that w/s on that SSD is 8093, avgqu-sz is ~1, and await time is about 0.16 ms, so the write amplification is ~8, which seems large to me. Can someone help explain why write amplification is so large? Thanks.
Cheers,
xinxin
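
The ratio quoted in the message above can be reproduced with a short calculation. This is only a sketch: the function name `write_amplification` is hypothetical, and it assumes that each benchmark operation is one logical write and that the iostat w/s figure counts the resulting device write requests.

```python
def write_amplification(device_writes_per_sec, app_ops_per_sec):
    """Device write requests issued per application-level operation."""
    return device_writes_per_sec / app_ops_per_sec


# Figures taken from the message: 8093 w/s on the SSD, 1020 ops/sec.
ratio = write_amplification(8093, 1020)
print(f"write amplification: {ratio:.2f}")
```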
