Le 5/5/14 1:46 PM, Hallvard Breien Furuseth a écrit :
On 05/02/2014 11:45 AM, Emmanuel Lécharny wrote:
I don't know what's inside the Free List in LMDB, but I can only assume it's closed to what we have in Mavibot. Let me explain what we do (or will do !) in Mavibot, and what data structure we use for that.
Thanks.
(...) I don't know how it fits with LMDB implementation, I just wanted to give you an insight on what we are working on. Hope it helps.
LMDB's freeDB is simpler: A mapping {ID of transaction which freed the pages => [page numbers]}.
We used a tree because we need to be able to find fast the pages copied by a given transaction for a specific BTree. If we are to look for any older transaction belonging to the same btree, {ID, page numbers} is not enough.
A write transaction may only use page numbers from freeDB records older than the oldest snapshot.
Yes, this is the current state. In Mavibot, we distinguish the elements in the copiedPagesBTree with the pages in teh freePages list (which is a simple linked list). The rational is that it's less costly to grap a page from a simple linked list than from a b-tree, assuming that some pages will be freed without having to be put into the copiedPagesBtree : for instance, the root page can always be released if there is no read transaction using the N-1 version, or the pages used by the b-tree headers (same reason).
At this point, those 'optimisation' are quite pulled out of thin air, I must admit.
It scans the reader table for that: Each read-only txn has a slot where it puts its snapshot's txnID.
W edo have the same thing in memory.
Also the datafile has metapages for the last 2 snapshots.
We have two pointers in the RecordManager header (RMH), one on the current BtreeOfBtrees and CopiedPagesBtree, and one for the previous version of those guys.
We always keep a track to the previous pointers until we have done the cleanup, then we rewrite the RecordManager header with only the current version. If we have a crash before the first RMH write, we restart with the current version (which is the version before we updated whatever btrees). If the crash occurs in the middle of the two RMH writes, then at startup, we discover that we have the previous and current pointer, so we know that we have to restart the cleanup Otherwise, we are safe with the new version. In any case, we are in a state we can restart with no breakage, except that we may have lost the latest modification, and in a minimal time.
I do think that LMDB and Mavibot are quite similar in the base algorithm (there is not much room for many smart and different approaches here ;), but the implementation *is* different, due to the fact that we are in Java, when LMDB is in C. Typically, serialization/deserialization is a real burden for us, so is the cache. We do'nt use MemoryMappedFile atm, but we could.
There are no concurrent write txns.