On 06. aug. 2016 17:38, bentrask@comcast.net wrote:
Transaction commits are one of the few bottlenecks in MDB, because it has to fsync twice, sequentially.
I think MDB could support mixed low and high durability transactions in the same database by adding per-page checksums and a third root page. The idea is that when committing a low-durability transaction, no fsyncs are performed. (...)
Yes and no. We can get rid of fsyncs, but not that way. Checksumming each page isn't enough: we must also know it's the right version of the page and not e.g. a similar page from a previous aborted transaction. To commit a branch or meta page, we'd need to scan its children and checksum the page headers (thus including their checksums) of each. Expensive.
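To make that cost concrete, here is a rough sketch, not LMDB code: page_hdr is a made-up stand-in for the on-disk page header, and fetch_page(), child_pgno() and crc32_update() are assumed helpers. The point is that for a branch or meta page's checksum to vouch for the exact versions of its children, it has to cover each child's header (page number, txnid and the child's own checksum), so checksumming a branch page means re-reading every child it references.

#include <stdint.h>
#include <stddef.h>

typedef struct page_hdr {
    uint64_t pgno;      /* page number */
    uint64_t txnid;     /* transaction that last wrote this page */
    uint32_t checksum;  /* checksum of the page contents */
    uint16_t nchildren; /* branch pages: number of children */
} page_hdr;

/* assumed helpers, not real LMDB API */
extern const page_hdr *fetch_page(uint64_t pgno);
extern uint64_t child_pgno(const page_hdr *branch, unsigned i);
extern uint32_t crc32_update(uint32_t crc, const void *buf, size_t len);

/* Checksum a branch page so that it also vouches for the exact
 * versions of its children.  Note the extra read of every child
 * page header on each commit of the branch page. */
static uint32_t branch_checksum(const page_hdr *branch,
                                const void *body, size_t bodylen)
{
    uint32_t crc = crc32_update(0, body, bodylen);
    for (unsigned i = 0; i < branch->nchildren; i++) {
        const page_hdr *child = fetch_page(child_pgno(branch, i));
        /* folding in child->checksum chains the whole subtree together */
        crc = crc32_update(crc, child, sizeof *child);
    }
    return crc;
}

And the same walk has to happen all the way up to the meta page, which is why it gets expensive.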
IIRC there are three things we can do:
- Use and fsync a WAL (write-ahead log) instead of the database pages. That can be cheaper because it writes one contiguous region instead of a lot of random-access pages. Requires recovery after a crash. (A rough sketch of such a commit follows after this list.)
- Volatile metapages which mdb_env_open() _always_ throws away if no other environment is already open. They are lost if the application crashes/exits without doing a final checkpoint.
- Improve that a bit: put them in a shared memory region, since that won't survive a system crash (unlike if we put them in the lockfile). That way they'll survive an application crash, provided something does a checkpoint before the next system crash. (See the second sketch below.)
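Rough sketch of the WAL option, just to show the shape of a commit -- wal_fd, the record format and the dirty-page list are all invented, and a real log record would also carry a length and checksum so recovery can detect a torn tail. A commit becomes one contiguous append plus a single fsync of the log, with the random-access page writes (and replay into the main file) deferred to a checkpoint or to recovery.

#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define PAGE_SIZE 4096

struct dirty_page {
    uint64_t pgno;
    unsigned char data[PAGE_SIZE];
};

/* Append all dirty pages of a transaction to the log and fsync once.
 * Returns 0 on success, -1 on error. */
static int wal_commit(int wal_fd, uint64_t txnid,
                      const struct dirty_page *pages, size_t npages)
{
    for (size_t i = 0; i < npages; i++) {
        unsigned char rec[sizeof txnid + sizeof pages[i].pgno + PAGE_SIZE];
        memcpy(rec, &txnid, sizeof txnid);
        memcpy(rec + sizeof txnid, &pages[i].pgno, sizeof pages[i].pgno);
        memcpy(rec + sizeof txnid + sizeof pages[i].pgno,
               pages[i].data, PAGE_SIZE);
        if (write(wal_fd, rec, sizeof rec) != (ssize_t)sizeof rec)
            return -1;          /* sequential append, no seeking */
    }
    return fsync(wal_fd);       /* one fsync per commit instead of two */
}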
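And a sketch of the shared-memory variant; every name here (VOLATILE_SHM, volatile_meta, etc.) is invented, not anything LMDB has. POSIX shared memory objects have kernel persistence: the region vanishes on reboot but survives an application crash, so a later process can pick the volatile meta up and checkpoint it, while a fresh boot simply finds nothing and falls back to the last fsync'd metapage.

#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

#define VOLATILE_SHM "/mdb-volatile-meta"   /* invented name */

struct volatile_meta {
    uint64_t txnid;      /* last committed but un-fsync'd transaction */
    uint64_t root_pgno;  /* root page of that transaction's tree */
};

/* Map (creating if needed) the shared region holding the volatile meta. */
static struct volatile_meta *volatile_meta_open(void)
{
    int fd = shm_open(VOLATILE_SHM, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, sizeof(struct volatile_meta)) < 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, sizeof(struct volatile_meta),
                   PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    return p == MAP_FAILED ? NULL : p;
}

/* Low-durability commit: publish the new root here with no fsync at all.
 * A later checkpoint copies it into a real metapage and fsyncs. */
static void volatile_meta_publish(struct volatile_meta *vm,
                                  uint64_t txnid, uint64_t root_pgno)
{
    vm->root_pgno = root_pgno;
    vm->txnid = txnid;   /* a real version would order/fence these writes */
}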
We've discussed these sometimes and there are caveats for some of them which I don't quite remember. One issue is that a "system crash" isn't the only thing which can lose unsynced pages; another is unmounting and re-mounting the disk (e.g. a USB disk).