https://bugs.openldap.org/show_bug.cgi?id=10114
Issue ID: 10114
Summary: Crash in mdb_copy with stale transactions(?)
Product: LMDB
Version: 0.9.30
Hardware: x86_64
OS: Linux
Status: UNCONFIRMED
Keywords: needs_review
Severity: normal
Priority: ---
Component: liblmdb
Assignee: bugs@openldap.org
Reporter: zack+ldapbugs@owlfolio.org
Target Milestone: ---
I have an LMDB database which is damaged in some way. I'm not sure exactly how, but the application that created it (KDE baloo_file) crashes on startup while trying to read it, with a backtrace pointing inside liblmdb...
#0 __memcpy_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:837
#1 0x00007fbf1fa110b6 in mdb_page_touch (mc=mc@entry=0x7ffe8dc1adf0) at mdb.c:2502
#2 0x00007fbf1fa12c9c in mdb_cursor_touch (mc=mc@entry=0x7ffe8dc1adf0) at mdb.c:6563
#3 0x00007fbf1fa16228 in mdb_cursor_put (mc=mc@entry=0x7ffe8dc1adf0, key=key@entry=0x7ffe8dc1b1e0, data=data@entry=0x7ffe8dc1b1f0, flags=<optimized out>, flags@entry=0) at mdb.c:6697
#4 0x00007fbf1fa18d51 in mdb_put (txn=0x55986d167a70, dbi=<optimized out>, key=0x7ffe8dc1b1e0, data=0x7ffe8dc1b1f0, flags=0) at mdb.c:9076
#5 0x00007fbf1fcec44b in Baloo::PostingDB::put (this=this@entry=0x7ffe8dc1b2d0, term=..., list=...) at /usr/src/debug/kde-frameworks/baloo-5.110.0/baloo-5.110.0/src/engine/postingdb.cpp:66
If I try to mdb_dump the database (with nothing else trying to access it) I get
mdb_dump: index: MDB_BAD_TXN: Transaction must abort, has a child, or is invalid
That sounds like the sort of thing that ought to be cleared by mdb_copy -c, but instead that command also crashes inside __memcpy_avx_unaligned_erms. Backtrace:
#0 __memcpy_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:839
#1 0x0000555555557e67 in mdb_env_cwalk (my=my@entry=0x7fffffffdbc0, pg=pg@entry=0x7fffffffd988, flags=0) at mdb.c:9264
#2 0x0000555555557fdf in mdb_env_cwalk (my=my@entry=0x7fffffffdbc0, pg=pg@entry=0x7fffffffdb90, flags=flags@entry=0) at mdb.c:9306
#3 0x0000555555558523 in mdb_env_copyfd1 (env=0x55555556a2a0, fd=<optimized out>) at mdb.c:9469
#4 0x00005555555588c9 in mdb_env_copy2 (env=0x55555556a2a0, path=<optimized out>, flags=flags@entry=1) at mdb.c:9623
#5 0x0000555555558ea6 in main (argc=3, argv=0x7fffffffe008) at mdb_copy.c:74
I tried to poke at the offending data structure a little but I didn't immediately see what was wrong...
(gdb) frame 1
#1 0x0000555555557e67 in mdb_env_cwalk (my=my@entry=0x7fffffffdbc0, pg=pg@entry=0x7fffffffd988, flags=0) at mdb.c:9264
9264 mdb_page_copy(leaf, mp, my->mc_env->me_psize);
(gdb) p mp
$1 = (MDB_page *) 0x7fc008d32000
(gdb) p *mp
$2 = {mp_p = {p_pgno = 0x0606060606060606, p_next = 0x0606060606060606}, mp_pad = 1542, mp_flags = 1542, mp_pb = {pb = {pb_lower = 1542, pb_upper = 18832}, pb_pages = 1234175494}, mp_ptrs = 0x7fc008d32010}
... except that those values for p_pgno and p_next don't look terribly plausible to me.
The database file is, unfortunately, much too large to attach here (2.3G uncompressed, 383M compressed with xz -17) and also it's, well, a full-text index of everything I have on my computer, so I'd be hesitant to attach it even if it fit. I can make it available for private download if that would be helpful. I'm also happy to do other experiments.
I realize that crashes caused by database corruption can be very difficult to avoid, but I hope there might be some kind of easy defensive measure to take in this particular case, which could at least allow the application to fail cleanly rather than crashing.
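As an illustration of the kind of defensive check asked for above, here is a minimal sketch in C. It is not LMDB code: page_hdr, page_hdr_check and the HDR_* codes are hypothetical names, and the struct is only a simplified stand-in for the header fields visible in the gdb output. It assumes the caller already knows the highest page number in use and the page size, and it rejects a header whose page number is out of range or whose lower/upper offsets cannot fit in one page.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified page header. This is NOT LMDB's MDB_page;
 * it only mirrors the fields printed in the gdb session. */
typedef struct {
    uint64_t pgno;   /* page number stored in the header */
    uint16_t pad;
    uint16_t flags;
    uint16_t lower;  /* offset where free space begins */
    uint16_t upper;  /* offset where free space ends */
} page_hdr;

enum { HDR_OK = 0, HDR_BAD_PGNO = 1, HDR_BAD_BOUNDS = 2 };

/* Returns HDR_OK if the header looks plausible, or an HDR_BAD_* code.
 * last_pgno and psize are assumed to come from trusted metadata. */
static int page_hdr_check(const page_hdr *h, uint64_t last_pgno, size_t psize)
{
    assert(h != NULL);

    /* A page number beyond the last allocated page cannot be valid. */
    if (h->pgno > last_pgno)
        return HDR_BAD_PGNO;

    /* Free-space offsets must be ordered and lie within the page. */
    if (h->lower > h->upper || h->upper > psize)
        return HDR_BAD_BOUNDS;

    return HDR_OK;
}
```

With the values from the gdb output (page number 0x0606060606060606, lower 1542, upper 18832) and an assumed page size of 4096 bytes, this check returns HDR_BAD_PGNO, so a caller could report an error instead of copying from the page.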
--- Comment #1 from Howard Chu hyc@openldap.org ---
Crashes in Baloo have been reported before. Unfortunately we also need a debug build of Baloo to really know what's going on. My previous attempts to build one all failed, giving me a baloo binary that just SEGV'd on startup.
FYI, there is no evidence that LMDB has any relevant bugs causing self-corruption. My suspicion is that Baloo has tried to use a single write transaction from more than one thread, but we need to be able to debug that to know for sure.
If you're willing to send me your DB file privately I can try to dig at it and see what's wrong inside. You'd actually be the first person who's reported this Baloo issue to provide one.
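The suspicion in comment #1 rests on the LMDB rule that a write transaction must only be used by the thread that began it. As a hedged sketch of how an application could catch a violation of that rule early, the C fragment below records the owning thread when a transaction starts and checks it before each use. The names txn_guard, txn_guard_begin, txn_guard_ok and txn_guard_end are hypothetical and are not part of LMDB or Baloo; a real application would keep such a guard next to its own transaction handle.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical ownership guard, kept alongside a write transaction. */
typedef struct {
    pthread_t owner;  /* thread that began the transaction */
    bool active;      /* true between begin and end */
} txn_guard;

/* Call right after the write transaction is begun. */
static void txn_guard_begin(txn_guard *g)
{
    g->owner = pthread_self();
    g->active = true;
}

/* Call before every put/get/commit on the transaction.
 * Returns false if the transaction has ended or the calling
 * thread is not the one that began it. */
static bool txn_guard_ok(const txn_guard *g)
{
    return g->active && pthread_equal(g->owner, pthread_self()) != 0;
}

/* Call after commit or abort. */
static void txn_guard_end(txn_guard *g)
{
    g->active = false;
}
```

A caller would test txn_guard_ok() before touching the transaction and log or abort when it returns false, which turns silent cross-thread use into a visible error.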
--- Comment #2 from Howard Chu hyc@openldap.org ---
Some relevant history: https://bugs.kde.org/show_bug.cgi?id=389848
Howard Chu hyc@openldap.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE
--- Comment #3 from Howard Chu hyc@openldap.org ---
*** This issue has been marked as a duplicate of issue 9378 ***
Quanah Gibson-Mount quanah@openldap.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|needs_review                |
             Status|RESOLVED                    |VERIFIED