Here's a curious case that I had not previously encountered with LMDB:
1. There was a power reset of a virtual machine with an active LMDB writer process (standalone use, not OpenLDAP) on an LMDB file containing three sub-DBs.
2. After rebooting, the previously-populated LMDB file (~7 GB in size) appears mostly empty, including when examined with mdb_stat or mdb_dump. Mostly empty meaning that each of three sub-DBs now has only one K/V entry, instead of 7M+ as they used to. In addition, the main DB now indicates six entries instead of the expected three (for the sub-DBs).
3. mdb_copy (with or without -c) does not remedy the situation, producing a mostly (logically) empty database.
This is with LMDB release 0.9.18 running on Ubuntu 14.04.4 (kernel 3.13.0-79-generic) on an ext4 partition (noatime,nodev,nosuid,noexec) on Intel SSD storage in SW RAID-1 configuration.
As mentioned, the LMDB file had three sub-DBs, each with 7M+ entries (as of last backup). No new sub-DBs are created after the database is first initialized. After initial creation, these three sub-DBs only ever get appended to with new key/value pairs; no code ever deletes or modifies key/value pairs in them. The writer code inserts new entries one at a time, commits the LMDB transaction, and syncs to disk.
I've enclosed mdb_stat output from before/after (before being from a backup, after which numerous more writes had been done). I've also included mdb_dump output of the main DB and three sub-DBs.
The mdb_dump output for the sub-DBs indicates that they each now contain only a single entry (instead of 7M+), that entry being in each case the first key/value pair that was ever inserted into that sub-DB (ages ago).
The mdb_dump output for the main DB is baffling--instead of the three expected entries, or the six that mdb_stat indicates after the reboot, the output includes a multitude of entries--some 2,590. (I've omitted most of them in the attached, but can provide a copy privately.)
What are my options for recovering an LMDB database in this state, to the extent possible? Has anyone else experienced a similar scenario?
Thanks, Arto
Arto Bendiken wrote:
> Here's a curious case that I had not previously encountered with LMDB:
> - There was a power reset of a virtual machine with an active LMDB writer process (standalone use, not OpenLDAP) on an LMDB file containing three sub-DBs.
> - After rebooting, the previously-populated LMDB file (~7 GB in size) appears mostly empty, including when examined with mdb_stat or mdb_dump. Mostly empty meaning that each of three sub-DBs now has only one K/V entry, instead of 7M+ as they used to. In addition, the main DB now indicates six entries instead of the expected three (for the sub-DBs).
> - mdb_copy (with or without -c) does not remedy the situation, producing a mostly (logically) empty database.
> This is with LMDB release 0.9.18 running on Ubuntu 14.04.4 (kernel 3.13.0-79-generic) on an ext4 partition (noatime,nodev,nosuid,noexec) on Intel SSD storage in SW RAID-1 configuration.
> As mentioned, the LMDB file had three sub-DBs, each with 7M+ entries (as of last backup). No new sub-DBs are created after the database is first initialized. After initial creation, these three sub-DBs only ever get appended to with new key/value pairs; no code ever deletes or modifies key/value pairs in them. The writer code inserts new entries one at a time, commits the LMDB transaction, and syncs to disk.
> I've enclosed mdb_stat output from before/after (before being from a backup, after which numerous more writes had been done). I've also included mdb_dump output of the main DB and three sub-DBs.
> The mdb_dump output for the sub-DBs indicates that they each now contain only a single entry (instead of 7M+), that entry being in each case the first key/value pair that was ever inserted into that sub-DB (ages ago).
> The mdb_dump output for the main DB is baffling--instead of the three expected entries, or the six that mdb_stat indicates after the reboot, the output includes a multitude of entries--some 2,590. (I've omitted most of them in the attached, but can provide a copy privately.)
> What are my options for recovering an LMDB database in this state, to the extent possible? Has anyone else experienced a similar scenario?
Sounds like ext4 or your SSD is messing with you. The only way you could wind up with your original state, 3 sub-DBs with 1 record each, based on the processing you described, is if the original DB pages representing that state were still recorded in the file. There's no way that those pages would not have already been reused by LMDB, after 2210167 transactions had been written to the DB. Much less the 4132159 transactions of your post-reboot file.
So either the SSD has remapped pages out from under you, or the ext4 journal has decided to give you back an older version of the file. In either case I doubt that any of your real data is still accessible thru any standard filesystem APIs.
If this filesystem is only being used to store LMDB data, you should use ext2 (or some other non-journaling filesystem of your choice). If all your txns are being committed synchronously, you should consider using a raw block device instead of a filesystem. (Code for this is experimental and not yet released. It's slower than using a filesystem, when using asynch transactions, but several times faster than any other filesystem for synch transactions.)
Hi Howard,
On Mon, Mar 14, 2016 at 6:10 PM, Howard Chu <hyc@symas.com> wrote:
> Sounds like ext4 or your SSD is messing with you. The only way you could wind up with your original state, 3 sub-DBs with 1 record each, based on the processing you described, is if the original DB pages representing that state were still recorded in the file. There's no way that those pages would not have already been reused by LMDB, after 2210167 transactions had been written to the DB. Much less the 4132159 transactions of your post-reboot file.
> So either the SSD has remapped pages out from under you, or the ext4 journal has decided to give you back an older version of the file. In either case I doubt that any of your real data is still accessible thru any standard filesystem APIs.
Thank you for the very quick response. LMDB has been rock solid in this use for some time now, so I'd also be inclined to look to ext4 first, and will need to examine their laundry list of mount options [1] again. This particular file system was mounted 'noatime,nodev,nosuid,noexec', which means it should default to 'data=ordered' for the journaling.
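To check which options are actually in effect, rather than relying on the documented defaults, one could read them back from /proc/mounts. The following is a minimal sketch in Python, assuming the usual whitespace-separated /proc/mounts layout (device, mount point, type, options); the mount point `/srv/lmdb` and the function names are hypothetical. If the kernel does not list a `data=` option, the helper returns None instead of guessing the default.

```python
def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point from /proc/mounts-style text."""
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Fields: device, mount point, fs type, options, dump, pass
        if len(fields) >= 4 and fields[1] == mount_point:
            found = fields[3].split(",")  # a later mount shadows an earlier one
    return found


def journal_mode(options):
    """Return the value of the data= option, or None if it is not listed."""
    for opt in options or []:
        if opt.startswith("data="):
            return opt.split("=", 1)[1]
    return None


# Hypothetical sample; on a real system use open("/proc/mounts").read()
sample = (
    "/dev/md0 / ext4 rw,relatime,data=ordered 0 0\n"
    "/dev/md1 /srv/lmdb ext4 rw,nosuid,nodev,noexec,noatime,data=ordered 0 0\n"
)
opts = mount_options(sample, "/srv/lmdb")
print(opts, journal_mode(opts))
```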
In case anyone on the list has informed views regarding ext4 mount options to avoid or embrace when using LMDB, I would certainly love to hear them.
As for the present problem, from further analysis what I suspect to be the reason those sub-DBs each contain their lone keys is that those three keys (one in each sub-DB) are the defaults inserted when the program initializes a new LMDB file to write to. Those three keys are inserted if the database is believed to be empty, i.e., if it has zero entries.
So possibly the post-reboot state of the DB file looked empty to LMDB, either for the main DB or for the three sub-DBs, and the program then proceeded to insert and commit those default keys. That's more plausible than a resurrection of the exact initial state (from months ago) where the original pages would indeed be long gone, as you noted.
I don't suppose there is much hope of finding the previous B+tree root(s) in the .mdb file and attempting to recover data from any still-reachable leaves?
> If this filesystem is only being used to store LMDB data, you should use ext2 (or some other non-journaling filesystem of your choice). If all your txns are being committed synchronously, you should consider using a raw block device instead of a filesystem. (Code for this is experimental and not yet released. It's slower than using a filesystem, when using asynch transactions, but several times faster than any other filesystem for synch transactions.)
Thanks for those tips, will evaluate LMDB on ext2. Would also be most interested to help test the new code for LMDB on a raw block device. Might that perhaps be available in some Git branch going forward?
[1] https://www.kernel.org/doc/Documentation/filesystems/ext4.txt
Arto Bendiken wrote:
> As for the present problem, from further analysis what I suspect to be the reason those sub-DBs each contain their lone keys is that those three keys (one in each sub-DB) are the defaults inserted when the program initializes a new LMDB file to write to. Those three keys are inserted if the database is believed to be empty, i.e., if it has zero entries.
> So possibly the post-reboot state of the DB file looked empty to LMDB, either for the main DB or for the three sub-DBs, and the program then proceeded to insert and commit those default keys. That's more plausible than a resurrection of the exact initial state (from months ago) where the original pages would indeed be long gone, as you noted.
> I don't suppose there is much hope of finding the previous B+tree root(s) in the .mdb file and attempting to recover data from any still-reachable leaves?
If, as you suspect, these 3 entries are present because they were automatically reinserted, then no, you're not likely to be able to recover anything. If the DB file had not been modified from its actual crashed state, it would have been possible to access the previous transaction's meta page, which would very likely have pointed to complete, intact data. But when you wrote to the DB, you overwrote the previous meta page.
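For readers who want to look at the two meta pages of an untouched copy of a crashed file before anything writes to it, here is a rough Python sketch. It is not part of LMDB and is not a recovery tool. The field layout is an assumption based on a reading of mdb.c in the 0.9.x series for 64-bit little-endian builds (16-byte page header, then magic, version, address, mapsize, two 48-byte DB records, last page, txnid); it is not a documented stable format and should be checked against the source of the release in use. The 4096-byte page size and all function names are likewise assumptions.

```python
import struct

# ASSUMED layout (LMDB 0.9.x, 64-bit little-endian); verify against mdb.c.
PAGE_HDR = struct.Struct("<QHHHH")     # pgno, pad, flags, lower, upper
META_HEAD = struct.Struct("<IIQQ")     # magic, version, address, mapsize
DB_REC = struct.Struct("<IHHQQQQQ")    # pad, flags, depth, branch, leaf, overflow, entries, root
META_TAIL = struct.Struct("<QQ")       # last_pg, txnid
MDB_MAGIC = 0xBEEFC0DE
P_META = 0x08


def parse_meta(page):
    """Decode one meta page; return None if it does not look like one."""
    need = PAGE_HDR.size + META_HEAD.size + 2 * DB_REC.size + META_TAIL.size
    if len(page) < need:
        return None
    _pgno, _pad, flags, _lower, _upper = PAGE_HDR.unpack_from(page, 0)
    off = PAGE_HDR.size
    magic, version, _addr, mapsize = META_HEAD.unpack_from(page, off)
    if magic != MDB_MAGIC or not flags & P_META:
        return None
    off += META_HEAD.size
    free_db = DB_REC.unpack_from(page, off)                # free-list DB record
    main_db = DB_REC.unpack_from(page, off + DB_REC.size)  # main DB record
    off += 2 * DB_REC.size
    last_pg, txnid = META_TAIL.unpack_from(page, off)
    return {
        "version": version,
        "psize": free_db[0],          # page size is stored in the free DB's pad field
        "mapsize": mapsize,
        "main_entries": main_db[6],
        "main_root": main_db[7],
        "last_pg": last_pg,
        "txnid": txnid,
    }


def read_metas(path, psize=4096):
    """Read meta pages 0 and 1 from a file opened read-only."""
    with open(path, "rb") as f:
        data = f.read(2 * psize)
    return [parse_meta(data[i * psize:(i + 1) * psize]) for i in range(2)]


def previous_meta(metas):
    """Of two valid meta pages, return the one with the lower txnid."""
    valid = [m for m in metas if m is not None]
    return min(valid, key=lambda m: m["txnid"]) if len(valid) == 2 else None
```

Such a check only makes sense on a copy taken before any process opens the environment for writing, since the next committed write transaction replaces the older of the two meta pages.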
Hi Howard,
On Mon, Mar 14, 2016 at 7:24 PM, Howard Chu <hyc@symas.com> wrote:
> Arto Bendiken wrote:
>> As for the present problem, from further analysis what I suspect to be the reason those sub-DBs each contain their lone keys is that those three keys (one in each sub-DB) are the defaults inserted when the program initializes a new LMDB file to write to. Those three keys are inserted if the database is believed to be empty, i.e., if it has zero entries.
>> So possibly the post-reboot state of the DB file looked empty to LMDB, either for the main DB or for the three sub-DBs, and the program then proceeded to insert and commit those default keys. That's more plausible than a resurrection of the exact initial state (from months ago) where the original pages would indeed be long gone, as you noted.
>> I don't suppose there is much hope of finding the previous B+tree root(s) in the .mdb file and attempting to recover data from any still-reachable leaves?
> If, as you suspect, these 3 entries are present because they were automatically reinserted, then no, you're not likely to be able to recover anything. If the DB file had not been modified from its actual crashed state, it would have been possible to access the previous transaction's meta page, which would very likely have pointed to complete, intact data. But when you wrote to the DB, you overwrote the previous meta page.
Thanks for confirming, I suspected as much. Bummer. I'll go ahead and proceed with restoring backups in this case, and will review the program logic to be more cautious in re-initializing what looks like (based on file size, last transaction ID, or other heuristics) a potentially-recoverable .mdb file instead of a genuinely new and empty one.
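One way such a guard could look is sketched below in Python. It is only an illustration of the heuristic, not the program's actual logic: the thresholds and all names are hypothetical, and the inputs are assumed to come from the data file's size on disk, the last transaction ID reported by mdb_env_info() (me_last_txnid), and the entry count reported by mdb_stat() (ms_entries) for the sub-DB about to be seeded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvSnapshot:
    file_size: int    # bytes on disk of the .mdb data file
    last_txnid: int   # ID of the last committed transaction
    entries: int      # entry count of the sub-DB about to be seeded


# Hypothetical thresholds: a genuinely new file is tiny and has seen few txns.
FRESH_MAX_BYTES = 1 << 20
FRESH_MAX_TXNID = 8


def seed_decision(snap):
    """Return 'skip', 'seed', or 'refuse' for the default-key insertion."""
    if snap.entries > 0:
        return "skip"      # already populated, nothing to initialize
    if snap.file_size > FRESH_MAX_BYTES or snap.last_txnid > FRESH_MAX_TXNID:
        return "refuse"    # looks empty but has history: do not write to it
    return "seed"          # small file, no history: safe to insert defaults


print(seed_decision(EnvSnapshot(file_size=32768, last_txnid=0, entries=0)))
print(seed_decision(EnvSnapshot(file_size=7 * 2**30, last_txnid=4132159, entries=0)))
```

On "refuse", the program would stop without opening a write transaction, so that the file can be copied aside and examined first.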
Thanks for your help, Arto
openldap-technical@openldap.org