[Issue 9291] New: Detection of corrupted database files

List overview All Threads
Download

newer

older

[Issue 9085] Test suite is failing

[Issue 8784] SIGBUS in...

openldap-its＠openldap.org

15 Jul 2020 15 Jul '20

12:04 p.m.

https://bugs.openldap.org/show_bug.cgi?id=9291

Issue ID: 9291 Summary: Detection of corrupted database files Product: LMDB Version: unspecified Hardware: All OS: All Status: UNCONFIRMED Severity: normal Priority: --- Component: liblmdb Assignee: bugs@openldap.org Reporter: markus@objectbox.io Target Milestone: ---

Let's assume we have to deal with a corrupted database for whatever reason (e.g. broken hardware or file system). Current behavior seems to be mostly undefined, which is understandable as it's not known what is broken (e.g. there are no checksums).

For example, I'm seeing a SIGBUS in mdb_page_touch because the cursor's top page (mp) is pointing to invalid memory (0x7f99cf004000) during a commit: mdb_page_touch mdb.c:2772 mdb_page_search mdb.c:6595 mdb_freelist_save mdb.c:3575 mdb_txn_commit mdb.c:4060

Cursor data at that point: mc_snum = 1, mc_top = 0; myki[0] = 0

A SIGBUS is troublesome as it crashes the process, and I wonder if there are other ways to detect such inconsistencies. If that be possible there could be user-specific handling in place. E.g. a user might start a new database file.

This issue was reported by our users, which also provided DB files: https://github.com/objectbox/objectbox-java/issues/859

I did not find a lot of consistency checks besides MDB_PAGE_NOTFOUND and MDB_CORRUPTED. Also, I think there's no current way to thoroughly check a DB file (e.g. like fsck for the DB file)?

My first idea other than checksums was to walk through the branch pages from the root and check if the referenced pages are within reasonable bounds. Also check the page content (e.g. nodes, flags). Additionally (optionally?), it should be possible to check that the key values are actually sorted.

So, it boils down to 3 points in summary: 1.) If there no way to check the DB file for consistency yet(?), which approach do you think would make sense? There might be two modes; one for a through check through all data, and a quick check that does not take long and could be e.g. done when opening the DB. Goal is to avoid process crashes and let users handle the situation. 2.) In general, is it possible to add more consistency checks in regular DB operations? 3.) Could the the particular situation (for which I provided the stack trace) detected (e.g. is myki[0] = 0 legal here?)

I'd be happy to provide a patch if you provide some direction where you want to take that.

-- You are receiving this mail because: You are on the CC list for the issue.

Show replies by date

openldap-its＠openldap.org

15 Jul 15 Jul

2:50 p.m.

New subject: [Issue 9291] Detection of corrupted database files

https://bugs.openldap.org/show_bug.cgi?id=9291

--- Comment #1 from Howard Chu hyc@openldap.org --- (In reply to Markus from comment #0)

...

Let's assume we have to deal with a corrupted database for whatever reason (e.g. broken hardware or file system). Current behavior seems to be mostly undefined, which is understandable as it's not known what is broken (e.g. there are no checksums).

For example, I'm seeing a SIGBUS in mdb_page_touch because the cursor's top page (mp) is pointing to invalid memory (0x7f99cf004000) during a commit: mdb_page_touch mdb.c:2772 mdb_page_search mdb.c:6595 mdb_freelist_save mdb.c:3575 mdb_txn_commit mdb.c:4060

Cursor data at that point: mc_snum = 1, mc_top = 0; myki[0] = 0

A SIGBUS is troublesome as it crashes the process, and I wonder if there are other ways to detect such inconsistencies. If that be possible there could be user-specific handling in place. E.g. a user might start a new database file.

This issue was reported by our users, which also provided DB files: https://github.com/objectbox/objectbox-java/issues/859

I did not find a lot of consistency checks besides MDB_PAGE_NOTFOUND and MDB_CORRUPTED. Also, I think there's no current way to thoroughly check a DB file (e.g. like fsck for the DB file)?

My first idea other than checksums was to walk through the branch pages from the root and check if the referenced pages are within reasonable bounds. Also check the page content (e.g. nodes, flags). Additionally (optionally?), it should be possible to check that the key values are actually sorted.

So, it boils down to 3 points in summary: 1.) If there no way to check the DB file for consistency yet(?), which approach do you think would make sense? There might be two modes; one for a through check through all data, and a quick check that does not take long and could be e.g. done when opening the DB. Goal is to avoid process crashes and let users handle the situation. 2.) In general, is it possible to add more consistency checks in regular DB operations? 3.) Could the the particular situation (for which I provided the stack trace) detected (e.g. is myki[0] = 0 legal here?)

I'd be happy to provide a patch if you provide some direction where you want to take that.

Hooks for per-page checksums (using user-provided checksum function) are already present in the 1.0 branch. It is currently only available to commercial licensed LMDB users.

re: (2) pervasive consistency checks will probably not be implemented since they would break LMDB's #1 reason to exist - fast read performance. (Although, we already do have a few such checks when compiled with higher values of MDB_DEBUG. You always have the option of compiling with your own debug checks as well.) Ultimately, it's not our responsibility to detect broken hardware or filesystems, it is the OS's.

-- You are receiving this mail because: You are on the CC list for the issue.

openldap-its＠openldap.org

16 Jul 16 Jul

4:15 p.m.

New subject: [Issue 9291] Detection of corrupted database files

https://bugs.openldap.org/show_bug.cgi?id=9291

--- Comment #2 from Markus markus@objectbox.io --- (In reply to Howard Chu from comment #1)

...

re: (2) pervasive consistency checks will probably not be implemented since they would break LMDB's #1 reason to exist - fast read performance.

Just to clarify, (2) would be about simple checks that should have no measurable performance impact, e.g. "page number is not zero" where appropriate, not the expensive fsck-like ones from (1).

Would you say that the described situation could be detected efficiently? (Basically my question (3)...)

...

(Although, we already do have a few such checks when compiled with higher values of MDB_DEBUG.

Thanks for that hint, I just found mdb_audit when looking for MDB_DEBUG. On a first glance it may be something similar to what I had in mind with (1). Will have a closer look at that and other MDB_DEBUG related code.

...

You always have the option of compiling with your own debug checks as well.) Ultimately, it's not our responsibility to detect broken hardware or filesystems, it is the OS's.

This could be read as you suggest forking for those cases? Again, to clarify, does that include checks that have no measurable performance impact?

-- You are receiving this mail because: You are on the CC list for the issue.

openldap-its＠openldap.org

7:54 p.m.

New subject: [Issue 9291] Detection of corrupted database files

https://bugs.openldap.org/show_bug.cgi?id=9291

--- Comment #3 from Markus markus@objectbox.io --- I digged deeper and found the actual cause for the SIGBUS and it which might be simple to detect. In short, a meta page points to a page beyond the file.

We could detect this in mdb_env_map and thus avoid the SIGBUS like this (first two lines exist and are for context):

env->me_metas[0] = METADATA(p); env->me_metas[1] = (MDB_meta *)((char *)env->me_metas[0] + env->me_psize);

mdb_size_t fsize; if(mdb_fsize(env->me_fd, &fsize) == MDB_SUCCESS && fsize) { pgno_t maxpgno = fsize / env->me_psize; if(env->me_metas[0]->mm_dbs[FREE_DBI].md_root > maxpgno || env->me_metas[0]->mm_dbs[MAIN_DBI].md_root > maxpgno || env->me_metas[1]->mm_dbs[FREE_DBI].md_root > maxpgno || env->me_metas[1]->mm_dbs[MAIN_DBI].md_root > maxpgno) { return MDB_PAGE_NOTFOUND; } }

Would that basic consistency check work for you?

Some background: file size was 2232320 (545 pages) and thus this new code caught the bad root at page 554. On odd thing was that I was able to see only 500 pages in the debugger's memory view; page 501 was already inaccessible. Didn't proceed there however...

-- You are receiving this mail because: You are on the CC list for the issue.

openldap-its＠openldap.org

8:30 p.m.

New subject: [Issue 9291] Detection of corrupted database files

https://bugs.openldap.org/show_bug.cgi?id=9291

--- Comment #4 from Howard Chu hyc@openldap.org --- If there are any references to pages beyond the end of the file, that implies that an fsync didn't complete. Are you running with NOSYNC or NOMETASYNC?

On Windows one of the common symptoms we see after a system crash is that trailing pages of the DB file are all-zero. Again, this indicates an fsync that didn't complete (or an NTFS bug).

I think your proposed change is too specific, since it could be any page, not just the meta pages, that references beyond the end of the file.

-- You are receiving this mail because: You are on the CC list for the issue.

openldap-its＠openldap.org

8:58 p.m.

New subject: [Issue 9291] Detection of corrupted database files

https://bugs.openldap.org/show_bug.cgi?id=9291

--- Comment #5 from Markus markus@objectbox.io --- We don't use NOSYNC or NOMETASYNC. The OS is Android, and it's likely that some device manufacture did something wrong in a specific model. However, I believe the specifics of how this happened are secondary as it will happen again in a different situation.

...

I think your proposed change is too specific, since it could be any page, not just the meta pages, that references beyond the end of the file.

Yes. There's a trade-off. Checking the roots can be done simply and in constant time. I'm aware that this is not a thorough check, which would require a tree traversal. I think in scenarios that are mostly appending new data, the chance would be quite good to catch a broken fsync. Of course it would not catch cases were pages are reused.

PS.: I added P_INVALID handling meanwhile.

-- You are receiving this mail because: You are on the CC list for the issue.

openldap-its＠openldap.org

17 Jul 17 Jul

11:24 a.m.

New subject: [Issue 9291] Detection of corrupted database files

https://bugs.openldap.org/show_bug.cgi?id=9291

--- Comment #6 from Markus markus@objectbox.io --- (In reply to Markus from comment #5)

...

Yes. There's a trade-off. Checking the roots can be done simply and in constant time. ...

Maybe it makes sense to level this trade-off out a bit. Let's say a branch page can reference up to around 40 page numbers. Checking the 4 root pages with a total of around 160 page numbers still seems quite doable without noticeable performance impact.

Going down another level might get noticeable with 160 pages (640 KB), but still doable. Actually, going down could be limited to the more recent meta tree and thus cutting it by half to 80 pages. It would give us up to 3200 additional page numbers we could check for consistency. The performance penalty is at least partially compensated by the fact that this "warms up" (caches) the top of the tree for future access.

What do you think?

-- You are receiving this mail because: You are on the CC list for the issue.

openldap-its＠openldap.org

11:40 a.m.

New subject: [Issue 9291] Detection of corrupted database files

https://bugs.openldap.org/show_bug.cgi?id=9291

--- Comment #7 from Markus markus@objectbox.io --- PS.: the math is different if you have really short keys, e.g. 4 bytes. In that case a node in the page should be around 18 bytes (?; from what I saw: 8 bytes meta info, 4 bytes key, 6 bytes page no). So let's say 225 page refs per page. Thus, going down 2 pages by one level may result in up to 450 pages or 1.8 MB. That might be too much already. Solvable by limiting the max number of pages to look for, e.g. first 100 pages.

-- You are receiving this mail because: You are on the CC list for the issue.

openldap-its＠openldap.org

21 Jul 21 Jul

3:37 p.m.

New subject: [Issue 9291] Detection of corrupted database files

https://bugs.openldap.org/show_bug.cgi?id=9291

--- Comment #8 from Markus markus@objectbox.io --- To move on with this quickly, we'll put the described checks in our next release (with some improvements). If you are interested in this, please let me know, I'm happy to contribute this.

Related question: the mentioned checksum are limited to the contents of a page, right? Thus, it does not help with verifying the consistency of the tree (within reasonable effort), correct? To my understanding that would require some bottom-up checksuming, e.g. checksum of a parent would also consider the checksums of its children (similar to a Merkle tree). Since this would have severe performance implications, I assume you don't consider to support that (ever)?

-- You are receiving this mail because: You are on the CC list for the issue.

openldap-its＠openldap.org

4:16 p.m.

New subject: [Issue 9291] Detection of corrupted database files

https://bugs.openldap.org/show_bug.cgi?id=9291

--- Comment #9 from Howard Chu hyc@openldap.org --- (In reply to Markus from comment #8)

...

To move on with this quickly, we'll put the described checks in our next release (with some improvements). If you are interested in this, please let me know, I'm happy to contribute this.

Feel free to attach your patch to this ticket. Please read https://openldap.org/devel/contributing.html first.

...

Related question: the mentioned checksum are limited to the contents of a page, right?

Right.

...

Thus, it does not help with verifying the consistency of the tree (within reasonable effort), correct? To my understanding that would require some bottom-up checksuming, e.g. checksum of a parent would also consider the checksums of its children (similar to a Merkle tree). Since this would have severe performance implications, I assume you don't consider to support that (ever)?

Not for production code, no.

-- You are receiving this mail because: You are on the CC list for the issue.

openldap-its＠openldap.org

7:49 p.m.

New subject: [Issue 9291] Detection of corrupted database files

https://bugs.openldap.org/show_bug.cgi?id=9291

Markus markus@objectbox.io changed:

What |Removed |Added ---------------------------------------------------------------------------- CC| |markus@objectbox.io

--- Comment #10 from Markus markus@objectbox.io --- Created attachment 744 --> https://bugs.openldap.org/attachment.cgi?id=744&action=edit First part: validating database root page numbers

The attached file is derived from OpenLDAP Software. All of the modifications to OpenLDAP Software represented in the following patch(es) were developed by ObjectBox Ltd. ObjectBox Ltd. has not assigned rights and/or interest in this work to any party. I, Markus Junginger am authorized by ObjectBox Ltd., my employer, to release this work under the following terms.

ObjectBox Ltd. hereby places the following modifications to OpenLDAP Software (and only these modifications) into the public domain. Hence, these modifications may be freely used and/or redistributed for any purpose with or without attribution and/or other notice.

-- You are receiving this mail because: You are on the CC list for the issue.

openldap-its＠openldap.org

7:57 p.m.

New subject: [Issue 9291] Detection of corrupted database files

https://bugs.openldap.org/show_bug.cgi?id=9291

Quanah Gibson-Mount quanah@openldap.org changed:

What |Removed |Added ---------------------------------------------------------------------------- Keywords| |has_patch

-- You are receiving this mail because: You are on the CC list for the issue.

openldap-its＠openldap.org

8:14 p.m.

New subject: [Issue 9291] Detection of corrupted database files

https://bugs.openldap.org/show_bug.cgi?id=9291

--- Comment #11 from Markus markus@objectbox.io --- For the second part, I think a new function that can be called optionally after the env has been opened, makes sense. Something like this draft:

/** * Checks the database for inconsistencies, e.g. page references beyond the * end of file. * @param max_page_number the maximum number of pages to check. * This limits the time spent, e.g. for a quick check on opening. * Pass zero to check all pages (the entire database). */ int mdb_check_pages(MDB_cursor *cursor, mdb_size_t max_page_number);

Please let me know if that looks good to you and if you have any naming preferences (not so sure here, e.g. mdb_page_ function prefix would also be fitting but is only used in mdb.c).

-- You are receiving this mail because: You are on the CC list for the issue.

openldap-its＠openldap.org

8:34 p.m.

New subject: [Issue 9291] Detection of corrupted database files

https://bugs.openldap.org/show_bug.cgi?id=9291

--- Comment #12 from Howard Chu hyc@openldap.org --- (In reply to Markus from comment #11)

...

For the second part, I think a new function that can be called optionally after the env has been opened, makes sense. Something like this draft:

/**
Checks the database for inconsistencies, e.g. page references beyond the

end of file.

@param max_page_number the maximum number of pages to check.
   This limits the time spent, e.g. for a quick check on opening.
   Pass zero to check all pages (the entire database).
*/ int mdb_check_pages(MDB_cursor *cursor, mdb_size_t max_page_number);

Please let me know if that looks good to you and if you have any naming preferences (not so sure here, e.g. mdb_page_ function prefix would also be fitting but is only used in mdb.c).

You could simply add a check of pgno in mdb_page_get(), and check every page reference as they occur.

-- You are receiving this mail because: You are on the CC list for the issue.

openldap-its＠openldap.org

8:42 p.m.

New subject: [Issue 9291] Detection of corrupted database files

https://bugs.openldap.org/show_bug.cgi?id=9291

--- Comment #13 from Markus markus@objectbox.io --- (In reply to Howard Chu from comment #12)

...

You could simply add a check of pgno in mdb_page_get(), and check every page reference as they occur.

How would we know the maximum valid page number (e.g. maxpgno = fsize / env->me_psize)? I assumed we need mdb_fsize, which we obviously don't want to call too often. Happy to learn about a better alternative.

-- You are receiving this mail because: You are on the CC list for the issue.

openldap-its＠openldap.org

9:04 p.m.

New subject: [Issue 9291] Detection of corrupted database files

https://bugs.openldap.org/show_bug.cgi?id=9291

--- Comment #14 from Howard Chu hyc@openldap.org --- (In reply to Markus from comment #13)

...

(In reply to Howard Chu from comment #12)

...
You could simply add a check of pgno in mdb_page_get(), and check every page reference as they occur.

How would we know the maximum valid page number (e.g. maxpgno = fsize / env->me_psize)? I assumed we need mdb_fsize, which we obviously don't want to call too often. Happy to learn about a better alternative.

Use mm_last_pg from the meta page.

-- You are receiving this mail because: You are on the CC list for the issue.

openldap-its＠openldap.org

9:10 p.m.

New subject: [Issue 9291] Detection of corrupted database files

https://bugs.openldap.org/show_bug.cgi?id=9291

--- Comment #15 from Howard Chu hyc@openldap.org --- (In reply to Howard Chu from comment #14)

...

(In reply to Markus from comment #13)

...
(In reply to Howard Chu from comment #12)

...
You could simply add a check of pgno in mdb_page_get(), and check every page reference as they occur.

How would we know the maximum valid page number (e.g. maxpgno = fsize / env->me_psize)? I assumed we need mdb_fsize, which we obviously don't want to call too often. Happy to learn about a better alternative.

Use mm_last_pg from the meta page.

Actually, that's already been checked for you, you only need to compare to mt_next_pgno. And mdb_page_get already checks it. Though perhaps the check could be done earlier in that function.

-- You are receiving this mail because: You are on the CC list for the issue.

openldap-its＠openldap.org

9:35 p.m.

New subject: [Issue 9291] Detection of corrupted database files

https://bugs.openldap.org/show_bug.cgi?id=9291

--- Comment #16 from Markus markus@objectbox.io --- (In reply to Howard Chu from comment #15)

...

...
Use mm_last_pg from the meta page.

Actually, that's already been checked for you, you only need to compare to mt_next_pgno. And mdb_page_get already checks it. Though perhaps the check could be done earlier in that function.

Right, checking this earlier seems a good idea as the check might be skipped otherwise by one of the gotos?

1) What kept me away from using mm_last_pg was its comment "Actually the file may be shorter if the freeDB lists the final pages." So I wasn't sure if and when there are cases when mm_last_pg points beyond the file(?).

2) mt_next_pgno itself should be checked against the file size on opening. I think, I'll update the submitted patch.

3) A preemptive check to peek into a few pages might still make sense on opening, e.g. at that point one could easily switch to the previous snapshot. Also there might be additional checks possible, e.g. pages must not point to a parent, page numbers in branch pages must be 2 or greater, ...

-- You are receiving this mail because: You are on the CC list for the issue.

openldap-its＠openldap.org

9:49 p.m.

New subject: [Issue 9291] Detection of corrupted database files

https://bugs.openldap.org/show_bug.cgi?id=9291

Markus markus@objectbox.io changed:

--- Comment #17 from Markus markus@objectbox.io --- Created attachment 745 --> https://bugs.openldap.org/attachment.cgi?id=745&action=edit First part: validating database root page numbers (Updated)

Also check mm_last_pg against maxpgno

-- You are receiving this mail because: You are on the CC list for the issue.

openldap-its＠openldap.org

10:47 p.m.

New subject: [Issue 9291] Detection of corrupted database files

https://bugs.openldap.org/show_bug.cgi?id=9291

--- Comment #18 from Howard Chu hyc@openldap.org --- (In reply to Markus from comment #16)

...

(In reply to Howard Chu from comment #15)

...
...
Use mm_last_pg from the meta page.

Actually, that's already been checked for you, you only need to compare to mt_next_pgno. And mdb_page_get already checks it. Though perhaps the check could be done earlier in that function.

Right, checking this earlier seems a good idea as the check might be skipped otherwise by one of the gotos?

What kept me away from using mm_last_pg was its comment "Actually the

file may be shorter if the freeDB lists the final pages." So I wasn't sure if and when there are cases when mm_last_pg points beyond the file(?).

It's possible for some pages to be dirtied at the end of the file, but then get deleted/unused again before the txn commits. If nothing else in the txn tries to reuse them, they'll go onto the freeDB. If this happens, then technically we've allocated those pages, but since they never got flushed to the filesystem, the file may be shorter.

...

mt_next_pgno itself should be checked against the file size on opening. I

think, I'll update the submitted patch.

-- You are receiving this mail because: You are on the CC list for the issue.

openldap-its＠openldap.org

22 Jul 22 Jul

10 a.m.

New subject: [Issue 9291] Detection of corrupted database files

https://bugs.openldap.org/show_bug.cgi?id=9291

--- Comment #19 from Markus markus@objectbox.io --- (In reply to Howard Chu from comment #18)

...

It's possible for some pages to be dirtied at the end of the file, but then get deleted/unused again before the txn commits.

I was able to reproduce this scenario and I don't see how mm_last_pg could be used in testing. Did I miss something? So, unless you have a better idea, I guess I'm back to my initially proposed patch.

-- You are receiving this mail because: You are on the CC list for the issue.

openldap-its＠openldap.org

10:42 a.m.

New subject: [Issue 9291] Detection of corrupted database files

https://bugs.openldap.org/show_bug.cgi?id=9291

Markus markus@objectbox.io changed:

-- You are receiving this mail because: You are on the CC list for the issue.

openldap-its＠openldap.org

10:42 a.m.

New subject: [Issue 9291] Detection of corrupted database files

https://bugs.openldap.org/show_bug.cgi?id=9291

Markus markus@objectbox.io changed:

-- You are receiving this mail because: You are on the CC list for the issue.

openldap-its＠openldap.org

28 Jul 28 Jul

8:27 p.m.

New subject: [Issue 9291] Detection of corrupted database files

https://bugs.openldap.org/show_bug.cgi?id=9291

--- Comment #20 from Markus markus@objectbox.io --- Created attachment 750 --> https://bugs.openldap.org/attachment.cgi?id=750&action=edit Alternative: validating database root page numbers and sums

The additional sum check may catch additional corruptions e.g. if the last (corrupted) commit added a significant amount of pages (or messed up page counts).

-- You are receiving this mail because: You are on the CC list for the issue.

openldap-its＠openldap.org

8:34 p.m.

New subject: [Issue 9291] Detection of corrupted database files

https://bugs.openldap.org/show_bug.cgi?id=9291

--- Comment #21 from Markus markus@objectbox.io --- Fyi, we'll do a release on our side using the patch of the last attachment.

Would be great to have a rough indication where you want to go with this (if at all).

-- You are receiving this mail because: You are on the CC list for the issue.

openldap-its＠openldap.org

30 Jul 30 Jul

6:54 p.m.

New subject: [Issue 9291] Detection of corrupted database files

https://bugs.openldap.org/show_bug.cgi?id=9291

--- Comment #22 from Howard Chu hyc@openldap.org --- (In reply to Markus from comment #21)

...

Fyi, we'll do a release on our side using the patch of the last attachment.

Would be great to have a rough indication where you want to go with this (if at all).

The patch looks OK, still checking to see if there's any case it can possibly break.

-- You are receiving this mail because: You are on the CC list for the issue.

1798

Age (days ago)

1813

Last active (days ago)

openldap-bugs@openldap.org

25 comments

1 participants

tags (0)

participants (1)

openldap-its＠openldap.org