Hey,
Generally the majority can be wrong: assume you have a network failure in a three-node MMR configuration, and you update one node while the other two are unreachable. When communication resumes, do you expect the change on the one node to be reverted to the majority's state, or should the majority be updated from the one node, which has the more recent data?
Indeed. In syncrepl, "voting" is irrelevant. Any provider node a client can reach will accept changes. When connectivity is restored, all nodes bring each other up to date. With majority-based voting you would lose any writes made to the minority node, which leaves you with an unresolvable inconsistency: data is removed even though the clients believe it was written.
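The convergence behavior described above can be sketched as a toy model: every write carries a change sequence number (CSN), and on reconnect each node keeps the newest value per entry. This is a deliberately simplified illustration, not OpenLDAP's actual syncrepl algorithm; the entry names and CSN values are made up.

```python
# Toy model of convergence after a network partition, loosely modeled on
# syncrepl's "newest change wins" behavior. Not the real algorithm.

def merge(replicas):
    """Bring all replicas up to date: for each entry, the newest CSN wins."""
    merged = {}
    for rep in replicas:
        for entry, (csn, value) in rep.items():
            if entry not in merged or csn > merged[entry][0]:
                merged[entry] = (csn, value)
    return merged

# Three-node MMR: nodes B and C were unreachable while A took a write.
node_a = {"uid=rick": (20240101120500, "new-mail")}   # updated during partition
node_b = {"uid=rick": (20240101120000, "old-mail")}
node_c = {"uid=rick": (20240101120000, "old-mail")}

# When connectivity resumes, the minority node's newer write propagates
# to the majority rather than being reverted.
state = merge([node_a, node_b, node_c])
assert state["uid=rick"][1] == "new-mail"
```

The point of the sketch is the direction of repair: the "minority" write wins because it is newer, not because of any vote.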
This turns out to be a matter of choice -- if I went for majority voting at all, it would only be with confirmation from a majority that the transaction succeeded, which is what BerkeleyDB's replication can provide.
What I'm hearing here is that this "formal" approach adds latency without gaining much in practice -- only the *certainty* that the data has been stored with the durability level that replication promises. That certainty costs a delay on every write, and it only pays off when lightning strikes just after a write to a single master.
Interestingly, OpenStack Swift takes the same approach -- commit a write based on local storage, then replicate later.
back-hdb and back-bdb both use BerkeleyDB. BerkeleyDB is now deprecated/obsolete, and LMDB (back-mdb) is the default backend.
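For a new installation, a back-mdb database stanza looks roughly like this (directive names as documented in slapd-mdb(5); the suffix, rootdn, directory, and size values are placeholders to adapt):

```
# Minimal back-mdb stanza for slapd.conf -- values are examples only.
database    mdb
suffix      "dc=example,dc=com"
rootdn      "cn=admin,dc=example,dc=com"
directory   /var/lib/ldap
maxsize     1073741824          # LMDB map size in bytes; set an upper bound
index       objectClass eq
```

Unlike BDB, LMDB needs an explicit maximum map size up front, which is the main tuning knob to be aware of when switching.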
I'm preparing new installations, so I suppose I will get to see it as the default.
BDB's replication is page-oriented, so it would consume far more network resources than syncrepl. We have never recommended its use.
That was indeed a design consideration I was weighing. The trade-off recommended here is clear and makes sense -- after all, I don't flush to disk after every write either.
Thanks, -Rick