David Barbour wrote:
I wonder if an intermediate variation on this idea might be easier to implement:
We can describe a subset of MDB_RDONLY transactions as 'long running', e.g. when copying the database. Normal read-only transactions might then be called 'ephemeral'. We might compute long-running transactions heuristically, e.g. ((currentTxnID - rdOnlyTxnId) > 100) and/or we might allow developers to mark them.
Interesting. I think, rather than picking an arbitrary age, we could simply look for the largest gap between two reader txnIDs and work from there. As long as the gap is >2 then there should be some pages to recover.
When deciding which transactions to free, we compute three numbers with a one-pass scan of the reader lock table:
txnO = the oldest active transaction txnE = the oldest active ephemeral transaction txnL = the newest active long-running transaction
Normally, LMDB just computes txnO, and conservatively assumes that readers might be operating anywhere between txnO and the current transaction. Emmanuel's proposal for precise analysis is somewhat daunting. But I think we can use this simple three-point analysis to find some sweet spots between precise and conservative.
We know by definition that txnO <= txnL, and txnO <= txnE.
Under normal conditions, i.e. assuming that long-running transactions are relatively infrequent, we also know that txnL < txnE will be true more frequently than not. Conservatively, we must protect the ranges from [txnO,txnL] and [txnE,currentTxnId]. But in the common case, there will be a non-empty range of transactions (txnL,txnE) that can be GC'd. And of course anything below txnO can still be GC'd.
I'm not sure whether or not we can easily identify which pages are used only within that gap, and may thus be collected. I haven't studied that aspect of LMDB yet. But supposing that aspect is addressed, this scheme should greatly mitigate the impact of infrequent long-running read-only transactions. A lot more space from recent write transactions would be reusable.
Yes, finding that gap is the tricky part now. I'm guessing we'll need a freeDB format change to have enough info to do it, though maybe not. Currently we record that in txn X, some list of pageIDs was freed. In LMDB 1.0 we will also have per-page txnID stamps, to tell us when a page was written.
Preserving the range from [txnO,txnL] pretty much means we cannot reuse any pages up to txnL's free list. I.e., the page's stamp must be > txnL, it must have been written at a point no txn in [txnO,txnL] could have seen it. In [txnL+1,txnE] we can reuse all the free pages whose stamp is
txnL. Perhaps we don't need a freeDB change.