https://bugs.openldap.org/show_bug.cgi?id=9291
Issue ID: 9291
Summary: Detection of corrupted database files
Product: LMDB
Version: unspecified
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: normal
Priority: ---
Component: liblmdb
Assignee: bugs@openldap.org
Reporter: markus@objectbox.io
Target Milestone: ---
Let's assume we have to deal with a corrupted database for whatever reason (e.g. broken hardware or file system). Current behavior seems to be mostly undefined, which is understandable as it's not known what is broken (e.g. there are no checksums).
For example, I'm seeing a SIGBUS in mdb_page_touch because the cursor's top page (mp) is pointing to invalid memory (0x7f99cf004000) during a commit:
    mdb_page_touch mdb.c:2772
    mdb_page_search mdb.c:6595
    mdb_freelist_save mdb.c:3575
    mdb_txn_commit mdb.c:4060
Cursor data at that point: mc_snum = 1, mc_top = 0, mc_ki[0] = 0
A SIGBUS is troublesome as it crashes the process, and I wonder if there are other ways to detect such inconsistencies. If that were possible, there could be user-specific handling in place; e.g. a user might start a new database file.
This issue was reported by our users, who also provided DB files: https://github.com/objectbox/objectbox-java/issues/859
I did not find many consistency checks besides MDB_PAGE_NOTFOUND and MDB_CORRUPTED. Also, I think there is currently no way to thoroughly check a DB file (e.g. something like fsck for the DB file)?
My first idea other than checksums was to walk through the branch pages from the root and check if the referenced pages are within reasonable bounds. Also check the page content (e.g. nodes, flags). Additionally (optionally?), it should be possible to check that the key values are actually sorted.
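To make the idea concrete, here is a minimal sketch of such a walk. It deliberately uses a hypothetical, simplified in-memory page model (toy_page, toy_check_tree and the CHECK_* codes are made up for illustration) and not LMDB's real page layout or internals; a real check would have to read MDB_page headers and nodes from the map instead.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified page model. This is NOT LMDB's on-disk format. */
enum { TOY_BRANCH = 1, TOY_LEAF = 2 };
#define TOY_MAX_KEYS  4
#define TOY_MAX_DEPTH 32   /* assumed depth limit; also stops reference cycles */

typedef struct toy_page {
    unsigned    flags;                   /* TOY_BRANCH or TOY_LEAF */
    unsigned    nkeys;                   /* number of used slots */
    const char *keys[TOY_MAX_KEYS];      /* keys, expected in ascending order */
    size_t      children[TOY_MAX_KEYS];  /* child page numbers (branch only) */
} toy_page;

enum {
    CHECK_OK = 0,
    CHECK_BAD_PGNO,    /* page reference outside the known page range */
    CHECK_BAD_FLAGS,   /* page is neither a branch nor a leaf */
    CHECK_BAD_COUNT,   /* slot count exceeds what a page can hold */
    CHECK_UNSORTED,    /* keys within a page are not strictly ascending */
    CHECK_TOO_DEEP     /* tree deeper than the limit, possibly a cycle */
};

/* Recursively validate the subtree rooted at pgno; returns the first error. */
int toy_check_tree(const toy_page *pages, size_t npages, size_t pgno, int depth)
{
    if (pgno >= npages)
        return CHECK_BAD_PGNO;
    if (depth > TOY_MAX_DEPTH)
        return CHECK_TOO_DEEP;

    const toy_page *p = &pages[pgno];
    if (p->flags != TOY_BRANCH && p->flags != TOY_LEAF)
        return CHECK_BAD_FLAGS;
    if (p->nkeys > TOY_MAX_KEYS)
        return CHECK_BAD_COUNT;

    /* Optional, more expensive part: verify the key order inside the page. */
    for (unsigned i = 1; i < p->nkeys; i++)
        if (strcmp(p->keys[i - 1], p->keys[i]) >= 0)
            return CHECK_UNSORTED;

    if (p->flags == TOY_BRANCH) {
        for (unsigned i = 0; i < p->nkeys; i++) {
            int rc = toy_check_tree(pages, npages, p->children[i], depth + 1);
            if (rc != CHECK_OK)
                return rc;
        }
    }
    return CHECK_OK;
}
```

The point of the sketch is only the order of the checks: every page number is validated before the page is dereferenced, so a bad reference becomes an error code instead of a fault.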
So, it boils down to 3 points in summary:
1.) If there is no way to check the DB file for consistency yet(?), which approach do you think would make sense? There might be two modes: a thorough check of all data, and a quick check that does not take long and could e.g. be done when opening the DB. The goal is to avoid process crashes and let users handle the situation.
2.) In general, is it possible to add more consistency checks to regular DB operations?
3.) Could the particular situation (for which I provided the stack trace) be detected (e.g. is mc_ki[0] = 0 legal here?)
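As one possible example of a "quick check" at open time, the file size could be compared against the last used page number. This is only a sketch under assumptions: file_covers_last_page is a hypothetical helper, and I'm assuming the inputs would come from fstat() (file size), mdb_env_stat() (ms_psize) and mdb_env_info() (me_last_pgno). It would only catch truncated files, which is one common way to get a SIGBUS on a memory map; I don't know whether it covers the case reported above.

```c
#include <stdint.h>

/*
 * Hypothetical helper: returns 1 if a file of file_size bytes fully
 * contains pages 0..last_pgno of page_size bytes each, and 0 otherwise.
 */
int file_covers_last_page(uint64_t file_size, uint32_t page_size,
                          uint64_t last_pgno)
{
    if (page_size == 0)
        return 0;   /* a zero page size is itself a sign of corruption */

    /* Overflow-safe form of (last_pgno + 1) * page_size <= file_size. */
    return last_pgno < file_size / page_size;
}
```

A caller could map a 0 result to an error such as MDB_CORRUPTED instead of touching the missing pages later during a commit.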
I'd be happy to provide a patch if you provide some direction where you want to take that.