Arto Bendiken wrote:
Here's a curious case that I had not previously encountered with LMDB:
- There was a power reset of a virtual machine with an active LMDB writer process (standalone use, not OpenLDAP) on an LMDB file containing three sub-DBs.
- After rebooting, the previously populated LMDB file (~7 GB in size) appears mostly empty, including when examined with mdb_stat or mdb_dump. "Mostly empty" means that each of the three sub-DBs now has only one K/V entry, instead of the 7M+ they used to have. In addition, the main DB now indicates six entries instead of the expected three (one per sub-DB).
- mdb_copy (with or without -c) does not remedy the situation; it produces a mostly (logically) empty database.
This is with LMDB release 0.9.18 running on Ubuntu 14.04.4 (kernel 3.13.0-79-generic) on an ext4 partition (noatime,nodev,nosuid,noexec) on Intel SSD storage in SW RAID-1 configuration.
As mentioned, the LMDB file had three sub-DBs, each with 7M+ entries (as of the last backup). No new sub-DBs are created after the database is first initialized. After initial creation, these three sub-DBs only ever get appended to with new key/value pairs; no code ever deletes or modifies key/value pairs in them. The writer code inserts new entries one at a time, commits the LMDB transaction, and syncs to disk.
I've enclosed mdb_stat output from before/after (before being from a backup, after which numerous more writes had been done). I've also included mdb_dump output of the main DB and three sub-DBs.
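For anyone wanting to automate the same before/after comparison, here is a rough Python sketch. It assumes the text shape that mdb_stat -a prints ("Status of <name>" followed by indented fields including "Entries:"). The sample strings and the sub-DB name db_a are made up for illustration and are not the attached output. Since the workload is append-only, any DB whose entry count went down is suspect.

```python
import re

def parse_mdb_stat(text):
    """Return {db_name: entry_count} from `mdb_stat -a` style output."""
    counts = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        header = re.match(r"Status of (.+)$", stripped)
        if header:
            current = header.group(1)
            continue
        entries = re.match(r"Entries:\s*(\d+)$", stripped)
        if entries and current is not None:
            counts[current] = int(entries.group(1))
    return counts

def shrunken_dbs(before, after):
    """Names of DBs whose entry count dropped (or that vanished)."""
    return sorted(name for name, n in before.items() if after.get(name, 0) < n)

# Made-up sample text in the assumed mdb_stat shape (hypothetical values).
BEFORE = """Status of Main DB
  Tree depth: 1
  Entries: 3
Status of db_a
  Tree depth: 4
  Entries: 7000000
"""
AFTER = """Status of Main DB
  Tree depth: 1
  Entries: 6
Status of db_a
  Tree depth: 1
  Entries: 1
"""

if __name__ == "__main__":
    print(shrunken_dbs(parse_mdb_stat(BEFORE), parse_mdb_stat(AFTER)))
```

Note that a growing count (such as the main DB going from three to six) is not flagged by this check and would need a separate rule.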
The mdb_dump output for the sub-DBs indicates that they each now contain only a single entry (instead of 7M+), that entry being in each case the first key/value pair that was ever inserted into that sub-DB (ages ago).
The mdb_dump output for the main DB is baffling--instead of the three expected entries, or the six that mdb_stat indicates after the reboot, the output includes a multitude of entries--some 2,590. (I've omitted most of them in the attached, but can provide a copy privately.)
What are my options for recovering an LMDB database in this state, to the extent possible? Has anyone else experienced a similar scenario?
Sounds like ext4 or your SSD is messing with you. The only way you could wind up with your original state, 3 sub-DBs with 1 record each, based on the processing you described, is if the original DB pages representing that state were still recorded in the file. There's no way that those pages would not have already been reused by LMDB, after 2210167 transactions had been written to the DB. Much less the 4132159 transactions of your post-reboot file.
So either the SSD has remapped pages out from under you, or the ext4 journal has decided to give you back an older version of the file. In either case I doubt that any of your real data is still accessible thru any standard filesystem APIs.
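One way to see which snapshot the file currently presents is to read the two meta pages at the start of the data file and compare their transaction IDs; LMDB uses whichever meta page carries the higher one. The Python sketch below does that under several assumptions that are not part of any public API: the field layout is taken from the 0.9.x mdb.c source as built for a 64-bit little-endian machine, the page header is 16 bytes, and the page size is 4096. Treat it as a diagnostic aid only, and run it against a copy of the file.

```python
import struct
import sys

MDB_MAGIC = 0xBEEFC0DE
PAGE_HEADER = 16  # assumed page header size on a 64-bit build
# Assumed MDB_meta layout: magic, version, address, mapsize,
# free-DB record (48 bytes), main-DB record (48 bytes), last_pg, txnid.
META_FMT = "<IIQQ48s48sQQ"
# Assumed MDB_db layout: pad, flags, depth, branch, leaf, overflow, entries, root.
DB_FMT = "<IHHQQQQQ"

def read_meta(buf, page_no, page_size=4096):
    """Decode one meta page (0 or 1) from the start of an LMDB data file."""
    off = page_no * page_size + PAGE_HEADER
    (magic, _version, _addr, _mapsize,
     _free_db, main_db, last_pg, txnid) = struct.unpack_from(META_FMT, buf, off)
    if magic != MDB_MAGIC:
        raise ValueError("meta page %d: unexpected magic 0x%08X" % (page_no, magic))
    _pad, _flags, depth, _br, _lf, _ov, entries, root = struct.unpack(DB_FMT, main_db)
    return {"page": page_no, "txnid": txnid, "last_pg": last_pg,
            "main_depth": depth, "main_entries": entries, "main_root": root}

def newest_meta(buf, page_size=4096):
    """Return the meta page with the higher txnid, the one LMDB would use."""
    metas = [read_meta(buf, n, page_size) for n in (0, 1)]
    return max(metas, key=lambda m: m["txnid"])

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python read_meta.py /path/to/copy/of/data.mdb
    with open(sys.argv[1], "rb") as f:
        head = f.read(2 * 4096)
    for n in (0, 1):
        print(read_meta(head, n))
    print("in use:", newest_meta(head)["page"])
```

If both meta pages show transaction IDs far below what the writer had reached before the power reset, that would be consistent with the file having been rolled back underneath LMDB rather than damaged by it.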
If this filesystem is only being used to store LMDB data, you should use ext2 (or some other non-journaling filesystem of your choice). If all your txns are being committed synchronously, you should consider using a raw block device instead of a filesystem. (Code for this is experimental and not yet released. It's slower than using a filesystem when using asynchronous transactions, but several times faster than any filesystem for synchronous transactions.)
Thanks, Arto