Re: large write amplification

7 May 2015

      Shu, Xinxin wrote:
...
For the overwrite case, I checked the height of the B-tree of my LMDB
database. The height is 4, so a "one byte" update should touch 4 pages,
plus one meta page update. The write amplification should then be 5 rather
than ~9. Let me know if I missed something?
There are two Btrees to update - the user data and the freeDB data.
Please read http://symas.com/mdb/#pubs rather than spending time asking 
questions that are already fully documented.
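As a rough illustration of that accounting (a sketch, not LMDB's actual bookkeeping): if each single-key commit rewrites one root-to-leaf path in the user-data tree, one in the freeDB tree, and one meta page, the expected page count is the two heights plus one. The height of 4 comes from the question above; the freeDB heights tried here are guesses, and the function name is hypothetical. Overflow pages and anything the filesystem adds are ignored.

```python
def pages_per_commit(main_height, freedb_height, meta_pages=1):
    """Rough count of pages written by one single-key copy-on-write commit."""
    return main_height + freedb_height + meta_pages


# Height 4 is the value reported in the question; freeDB heights are assumed.
for freedb_height in (1, 2, 3):
    print(freedb_height, pages_per_commit(4, freedb_height))
```

With the freeDB tree left out (height 0) this gives the 5 pages expected in the question; counting it pushes the estimate toward the observed figure.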
...
By the way, how can I get the degree of the B-tree of an LMDB database?
There is no such thing. Or, it is entirely variable.
...
Cheers,
xinxin
-----Original Message-----
From: Леонид Юрьев [mailto:leo@yuriev.ru]
Sent: Tuesday, May 05, 2015 6:16 PM
To: Shu, Xinxin
Cc: openldap-technical@openldap.org
Subject: Re: large write amplification
Hm, ANY change needs a btree update.
Suppose we have an item key=K, data=A.
Then overwrite A with B, so now key=K, data=B.
This is simply a "one byte" change, but several db pages need to be cloned and updated:

the page that contains data=B and the neighboring records.
the btree page that holds a pointer/reference to the page containing data=B and the neighboring records.
all pages on the leaf-to-root path in the btree above that new btree page, which holds a pointer/reference to the page containing data=B and the neighboring records.
...
new root pages of mainDB and freeDB.
a pointer to the new root in the meta page, that lay in the house that Jack built ;)
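
The list above can be sketched as a toy copy-on-write tree. This is not LMDB code: `Page`, `cow_update` and `lookup` are hypothetical names, the tree has only three levels, and the freeDB and the meta page are left out. It only shows that changing one value clones every page on the root-to-leaf path, while the other pages stay shared with the old snapshot.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Page:
    keys: list                                    # separators (branch) or record keys (leaf)
    children: list = field(default_factory=list)  # empty for a leaf page
    values: list = field(default_factory=list)    # used only by leaf pages


def lookup(page, key):
    """Walk from `page` down to the leaf holding `key` and return its value."""
    while page.children:
        page = page.children[bisect.bisect_right(page.keys, key)]
    return page.values[page.keys.index(key)]


def cow_update(page, key, value, cloned):
    """Return a modified copy of `page`; the original page is never touched."""
    if not page.children:  # leaf: copy the records and change one value
        new = Page(list(page.keys), [], list(page.values))
        new.values[new.keys.index(key)] = value
    else:  # branch: copy the page and repoint one child at its new copy
        i = bisect.bisect_right(page.keys, key)
        new = Page(list(page.keys), list(page.children))
        new.children[i] = cow_update(page.children[i], key, value, cloned)
    cloned.append(new)
    return new


leaves = [Page([1, 2], [], ["a", "b"]), Page([5, 6], [], ["c", "d"]),
          Page([10, 11], [], ["e", "f"]), Page([15, 16], [], ["g", "h"])]
root = Page([10], [Page([5], leaves[:2]), Page([15], leaves[2:])])

cloned = []
new_root = cow_update(root, 11, "F", cloned)

print(len(cloned))                               # pages cloned for one change
print(lookup(root, 11), lookup(new_root, 11))    # old snapshot vs new snapshot
print(new_root.children[0] is root.children[0])  # untouched subtree is shared
```

The number of cloned pages equals the height of the toy tree, and a reader still holding the old root keeps seeing the old value.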

So, by design, LMDB is optimized for high-load reads, not for writes.
Leonid.
2015-05-05 10:26 GMT+03:00 Shu, Xinxin xinxin.shu@intel.com:
...
Hi leonid,
Thanks for your reply. I observed another scenario: I also tested
"overwrite mode". I slightly modified the source code to change the default
behavior (set dbflags_ = SYNC, so data is flushed to disk once a transaction
is committed) and also collected iostat data. The overwrite rate is ~521
ops/sec, but iostat shows that w/s is ~4666, so the write amplification
is ~9. To my understanding, overwriting an existing value does not adjust
the btree, so why is the write amplification so large? Could you help explain?
thanks
Cheers,
xinxin
-----Original Message-----
From: Леонид Юрьев [mailto:leo@yuriev.ru]
Sent: Monday, May 04, 2015 6:59 PM
To: Shu, Xinxin
Cc: openldap-technical@openldap.org
Subject: Re: large write amplification
Hi, Xinxin.
I will try to answer briefly, without details:

So that readers are never blocked by a writer, LMDB provides a snapshot of the data, indexes and directory for each completed transaction.

Most db pages (those not changed by a particular transaction) are "shared" between such snapshots. But any change to the data itself, and its reflection in the btree indexes (including a particular table, free-db, main-db and so forth), requires new pages to be used and written to disk.

In a large db, a small "one-byte" change may dirty a lot of db pages (usually 4K each). For example, one add/del/mod operation in an LDAP db a few GB in size requires about 50-100 page-level IOPS.

Leonid.
P.S.
For high-load use cases I made a few changes in our fork of OpenLDAP/LMDB.
One of these features we called "LIFO reclaiming".
It gives us a 10-50 times performance boost, especially by taking advantage of the write-back cache of the storage subsystem.
We now use it in our production (telco) environment.
But currently it is not safe for all cases, see
https://github.com/ReOpen/ReOpenLDAP/issues/2 and https://github.com/ReOpen/ReOpenLDAP/issues/1.
2015-05-04 5:31 GMT+03:00 Shu, Xinxin xinxin.shu@intel.com:
...
Hi list,
Recently I ran micro tests on LMDB on a DC3700 (200GB), using the bench
code https://github.com/hyc/leveldb/tree/benches . I tested fillrandsync mode, collected iostat data, and found that the write amplification is large. For the fillrandsync case:
IOPS: 1020 ops/sec
iostat data shows that w/s on that SSD is 8093, avgqu-sz is ~1, and
await time is about 0.16 ms, so the write amplification is ~8, which
seems large to me. Can someone help explain why the write amplification
is so large? Thanks
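
For reference, the ratio being discussed is simply device writes per second divided by application operations per second. A minimal sketch using the figures reported in this thread (the fillrandsync run here and the overwrite run from the follow-up message); the function name is hypothetical, and treating each iostat write request as one page is an assumption:

```python
def write_amplification(device_writes_per_sec, app_ops_per_sec):
    """Device write requests issued per application-level operation."""
    return device_writes_per_sec / app_ops_per_sec


# (iostat w/s, benchmark ops/sec) as reported in the thread
runs = {"fillrandsync": (8093, 1020), "overwrite": (4666, 521)}
for name, (w_s, ops) in runs.items():
    print(f"{name}: {write_amplification(w_s, ops):.1f}")
```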
Cheers,
xinxin
-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/
