On 08/08/2016 05:41 AM, Hallvard Breien Furuseth wrote:
> A transaction must not reuse data pages visible in the last snapshot
> known to be durable, since that's how far back LMDB may need to revert
> after abnormal termination. Like a crash after MDB_NOMETASYNC may do.
>
> Sync the data pages from a txn, write the metapage, eventually sync
> that metapage, wait out any older read-only transactions, and *then*
> you can reuse the pages the txn freed. Not before. So when you don't
> sync, or a read-only txn won't die, LMDB degenerates to append-only.
>
> ...except if you sync the metapage and exit, next LMDB run may not
> know you synced it and must assume the metapage isn't yet durable.
> So it might not reuse pages visible to the _previous_ durable
> metapage, until it syncs. I'm rather losing track at this point,
> but I think it may mean twice as many not-yet-usable pages as one
> might expect.

Concretely: say the current write transaction is number 10, and a
long-lived reader is on number 7. Currently, LMDB will be unable to reuse
any pages used in transactions 7+ until the reader ends.

Now say a third, durable root is added. For the sake of argument, no
checksums are used and, in the event of a crash, only the last durable
state is recovered. Say the durable transaction is number 2. Pages used
in transaction 2 need to be preserved, obviously. Pages used in
transactions 7+ still need to be preserved for the slow reader. But pages
from transactions 3-6 can be reused.
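
To make the example concrete, here is a rough Python model of it. It is
only a sketch under the same simplifications as above: each transaction's
pages are treated as a unit, and the names must_preserve and
reusable_txns are made up for illustration, not LMDB functions.

```python
def must_preserve(txn, durable, oldest_reader):
    """Return True if the pages of snapshot `txn` must be kept."""
    # The last durable snapshot is what recovery falls back to after a crash.
    if txn == durable:
        return True
    # Everything from the oldest reader's snapshot onward stays pinned,
    # as in the example (7+ while the reader is on 7).
    return txn >= oldest_reader


def reusable_txns(durable, oldest_reader, current):
    """List the snapshots from `durable` to `current` that can be reclaimed."""
    return [t for t in range(durable, current + 1)
            if not must_preserve(t, durable, oldest_reader)]


# The numbers from the example: durable root at 2, reader on 7, writer on 10.
print(reusable_txns(durable=2, oldest_reader=7, current=10))  # [3, 4, 5, 6]
```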

Note that the last durable transaction is controlled purely by the
single writer, so tracking it is actually easier than tracking which
readers are where.

If a crash happens before a durable root is fully synced, then there
should be a second, older durable root that hasn't been reused yet. In
that case LMDB recovers the way it does currently.
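
As a rough illustration of that fallback, assuming there are two durable
root slots and that recovery has some way to tell whether a slot was
fully written (the `complete` flag below is a stand-in for that, and
DurableRoot and pick_recovery_root are hypothetical names, not LMDB
code):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DurableRoot:
    txn: int        # transaction number this root points at
    complete: bool  # assumption: recovery can tell the sync finished


def pick_recovery_root(a: DurableRoot, b: DurableRoot) -> Optional[DurableRoot]:
    """Pick the newest durable root that was fully synced, if any."""
    candidates = [r for r in (a, b) if r.complete]
    return max(candidates, key=lambda r: r.txn, default=None)


# Crash while syncing the root for txn 9: fall back to the older root at 2.
older = DurableRoot(txn=2, complete=True)
newer = DurableRoot(txn=9, complete=False)
print(pick_recovery_root(older, newer).txn)  # 2
```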

Does this make sense? Thanks for bearing with me.