Hi Howard,
On Mon, Mar 14, 2016 at 6:10 PM, Howard Chu <hyc@symas.com> wrote:
> Sounds like ext4 or your SSD is messing with you. The only way you could wind up with your original state, 3 sub-DBs with 1 record each, based on the processing you described, is if the original DB pages representing that state were still recorded in the file. There's no way that those pages would not have already been reused by LMDB after 2210167 transactions had been written to the DB, let alone the 4132159 transactions of your post-reboot file.
> So either the SSD has remapped pages out from under you, or the ext4 journal has decided to give you back an older version of the file. In either case I doubt that any of your real data is still accessible through any standard filesystem APIs.
Thank you for the very quick response. LMDB has been rock solid in this use for some time now, so I'm also inclined to look at ext4 first, and I will need to go through its long list of mount options [1] again. This particular filesystem was mounted with 'noatime,nodev,nosuid,noexec'; since no data= option was given, journaling should default to 'data=ordered'.
If anyone on the list has informed views on ext4 mount options to avoid or embrace when using LMDB, I would certainly love to hear them.
As for the present problem, further analysis suggests why each of those sub-DBs contains a lone key: those three keys (one per sub-DB) are the defaults the program inserts when it initializes a new LMDB file. It inserts them only if the database appears to be empty, i.e., if it has zero entries.
So perhaps, after the reboot, the DB file looked empty to LMDB (either the main DB or the three sub-DBs), and the program then inserted and committed those default keys. That is more plausible than a resurrection of the exact initial state from months ago, whose original pages would indeed be long gone, as you noted.
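One way to probe this hypothesis is to look at LMDB's two meta pages (pages 0 and 1) directly. Commits alternate between them, and LMDB treats the one with the larger txnid as current, so comparing the two shows what the last two commits recorded for the main DB. The bundled mdb_stat tool (mdb_stat -e) reports the current environment state through the public API; the rough sketch below only adds a side-by-side view of both meta pages. It assumes the internal on-disk layout of LMDB 0.9.x on a 64-bit host, read on the same kind of host (LMDB files are in native byte order). Those offsets mirror private structs in mdb.c, not a public API, so they should be checked against the liblmdb source actually in use. The names meta_info, read_meta and pick_meta are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ASSUMED on-disk layout: LMDB 0.9.x, 64-bit host, native byte order.
 * These offsets mirror internal structs in mdb.c and are NOT a public API. */
#define PAGEHDRSZ   16u                          /* pgno(8) pad(2) flags(2) lower/upper(4) */
#define OFF_FLAGS   10u                          /* mp_flags in the page header */
#define P_META      0x08u
#define MDB_MAGIC   0xBEEFC0DEu
#define OFF_MAGIC   (PAGEHDRSZ + 0u)             /* MDB_meta.mm_magic */
#define OFF_MAIN_DB (PAGEHDRSZ + 24u + 48u)      /* MDB_meta.mm_dbs[MAIN_DBI] */
#define OFF_TXNID   (PAGEHDRSZ + 24u + 96u + 8u) /* MDB_meta.mm_txnid */

typedef struct {            /* hypothetical summary of one meta page */
    uint64_t txnid;         /* transaction that wrote this meta page */
    uint64_t main_entries;  /* records in the main DB (named sub-DBs are stored here) */
    uint64_t main_root;     /* root page of the main DB's B+tree */
    uint16_t main_depth;    /* B+tree depth of the main DB */
} meta_info;

static uint16_t rd16(const unsigned char *p) { uint16_t v; memcpy(&v, p, sizeof v); return v; }
static uint32_t rd32(const unsigned char *p) { uint32_t v; memcpy(&v, p, sizeof v); return v; }
static uint64_t rd64(const unsigned char *p) { uint64_t v; memcpy(&v, p, sizeof v); return v; }

/* Decode one page as a meta page; returns -1 if it does not look like one. */
static int read_meta(const unsigned char *page, meta_info *out)
{
    assert(page != NULL && out != NULL);
    if (!(rd16(page + OFF_FLAGS) & P_META) || rd32(page + OFF_MAGIC) != MDB_MAGIC)
        return -1;
    out->txnid        = rd64(page + OFF_TXNID);
    out->main_depth   = rd16(page + OFF_MAIN_DB + 6u);  /* MDB_db.md_depth */
    out->main_entries = rd64(page + OFF_MAIN_DB + 32u); /* MDB_db.md_entries */
    out->main_root    = rd64(page + OFF_MAIN_DB + 40u); /* MDB_db.md_root */
    return 0;
}

/* The meta page with the larger txnid is current (ties go to page 0).
 * Returns 0 or 1, or -1 if neither page is a valid meta page.
 * psize is the environment's page size. */
static int pick_meta(const unsigned char *file, size_t psize, meta_info *out)
{
    meta_info m[2];
    int ok0 = read_meta(file, &m[0]) == 0;
    int ok1 = read_meta(file + psize, &m[1]) == 0;
    int best;
    if (ok0 && ok1)
        best = m[1].txnid > m[0].txnid;
    else if (ok0 || ok1)
        best = ok1;
    else
        return -1;
    *out = m[best];
    return best;
}
```

In use, one would read the first two pages of data.mdb (e.g. with fread, using the page size that mdb_stat -e also reports) into a buffer, then print both meta_info structs: txnid, entry count and root page for each of the last two commits.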
I don't suppose there is much hope of finding the previous B+tree root(s) in the .mdb file and attempting to recover data from any still-reachable leaves?
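LMDB's public API only walks pages reachable from the current root, so any attempt along these lines would mean reading data.mdb directly. As far as I know, LMDB does not scrub freed pages, so stale leaves can linger until they are reused. A rough first step might be to scan every page and list those that still carry a plausible header, i.e. whose stored page number matches their position, classified by their flags. The sketch below assumes the LMDB 0.9.x page header on a 64-bit host in native byte order (mp_pgno, mp_pad, mp_flags, then lower/upper or an overflow page count) and the flag values from mdb.c. This is internal layout, not an API, and scan_pages and page_tally are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ASSUMED: LMDB 0.9.x page header on a 64-bit host, native byte order:
 * mp_pgno (8 bytes) | mp_pad (2) | mp_flags (2) | lower/upper or mp_pages (4) */
#define OFF_PGNO   0u
#define OFF_FLAGS  10u
#define OFF_PAGES  12u
#define P_BRANCH   0x01u
#define P_LEAF     0x02u
#define P_OVERFLOW 0x04u
#define P_META     0x08u

typedef struct {              /* hypothetical tally of a page scan */
    size_t branch, leaf, overflow, meta, other;
} page_tally;

/* Walk an in-memory copy of data.mdb. Pages whose stored page number
 * matches their position are classified by mp_flags; leaf page numbers
 * are written to leaves[] (up to max_leaves) as candidates for manual
 * record extraction. Returns the total number of leaf pages found. */
static size_t scan_pages(const unsigned char *file, size_t len, size_t psize,
                         page_tally *t, uint64_t *leaves, size_t max_leaves)
{
    assert(file != NULL && t != NULL && psize > 0);
    memset(t, 0, sizeof *t);
    size_t npages = len / psize, nleaf = 0;
    for (size_t i = 0; i < npages; i++) {
        const unsigned char *pg = file + i * psize;
        uint64_t pgno;
        uint16_t flags;
        memcpy(&pgno, pg + OFF_PGNO, sizeof pgno);
        memcpy(&flags, pg + OFF_FLAGS, sizeof flags);
        if (pgno != i) {              /* never written here, or damaged */
            t->other++;
            continue;
        }
        if (flags & P_META) {
            t->meta++;
        } else if (flags & P_BRANCH) {
            t->branch++;
        } else if (flags & P_LEAF) {
            if (leaves != NULL && nleaf < max_leaves)
                leaves[nleaf] = pgno;
            nleaf++;
            t->leaf++;
        } else if (flags & P_OVERFLOW) {
            uint32_t span;            /* mp_pages: pages in this overflow run */
            memcpy(&span, pg + OFF_PAGES, sizeof span);
            t->overflow++;
            if (span > 1)
                i += span - 1;        /* skip header-less continuation pages */
        } else {
            t->other++;
        }
    }
    return nleaf;
}
```

A leaf that passes this check may belong to any earlier transaction, so its records could be stale or mutually inconsistent; decoding them would still require LMDB's node layout (MDB_node in mdb.c).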
> If this filesystem is only being used to store LMDB data, you should use ext2 (or some other non-journaling filesystem of your choice). If all your txns are being committed synchronously, you should consider using a raw block device instead of a filesystem. (Code for this is experimental and not yet released. It's slower than using a filesystem with asynchronous transactions, but several times faster than any filesystem for synchronous transactions.)
Thanks for those tips; I will evaluate LMDB on ext2. I would also be very interested in helping test the new code for LMDB on a raw block device. Might it become available in a Git branch at some point?
[1] https://www.kernel.org/doc/Documentation/filesystems/ext4.txt