https://bugs.openldap.org/show_bug.cgi?id=10114
Issue ID: 10114
Summary: Crash in mdb_copy with stale transactions(?)
Product: LMDB
Version: 0.9.30
Hardware: x86_64
OS: Linux
Status: UNCONFIRMED
Keywords: needs_review
Severity: normal
Priority: ---
Component: liblmdb
Assignee: bugs@openldap.org
Reporter: zack+ldapbugs@owlfolio.org
Target Milestone: ---
I have an LMDB database which is damaged in some way. I'm not sure exactly how, but the application that created it (KDE baloo_file) crashes on startup while trying to read it, with a backtrace pointing inside liblmdb:
#0 __memcpy_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:837
#1 0x00007fbf1fa110b6 in mdb_page_touch (mc=mc@entry=0x7ffe8dc1adf0) at mdb.c:2502
#2 0x00007fbf1fa12c9c in mdb_cursor_touch (mc=mc@entry=0x7ffe8dc1adf0) at mdb.c:6563
#3 0x00007fbf1fa16228 in mdb_cursor_put (mc=mc@entry=0x7ffe8dc1adf0, key=key@entry=0x7ffe8dc1b1e0, data=data@entry=0x7ffe8dc1b1f0, flags=<optimized out>, flags@entry=0) at mdb.c:6697
#4 0x00007fbf1fa18d51 in mdb_put (txn=0x55986d167a70, dbi=<optimized out>, key=0x7ffe8dc1b1e0, data=0x7ffe8dc1b1f0, flags=0) at mdb.c:9076
#5 0x00007fbf1fcec44b in Baloo::PostingDB::put (this=this@entry=0x7ffe8dc1b2d0, term=..., list=...) at /usr/src/debug/kde-frameworks/baloo-5.110.0/baloo-5.110.0/src/engine/postingdb.cpp:66
If I try to mdb_dump the database (with nothing else trying to access it) I get
mdb_dump: index: MDB_BAD_TXN: Transaction must abort, has a child, or is invalid
That sounds like the sort of thing that ought to be cleared by mdb_copy -c, but instead that command also crashes inside __memcpy_avx_unaligned_erms. Backtrace:
#0 __memcpy_avx_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:839
#1 0x0000555555557e67 in mdb_env_cwalk (my=my@entry=0x7fffffffdbc0, pg=pg@entry=0x7fffffffd988, flags=0) at mdb.c:9264
#2 0x0000555555557fdf in mdb_env_cwalk (my=my@entry=0x7fffffffdbc0, pg=pg@entry=0x7fffffffdb90, flags=flags@entry=0) at mdb.c:9306
#3 0x0000555555558523 in mdb_env_copyfd1 (env=0x55555556a2a0, fd=<optimized out>) at mdb.c:9469
#4 0x00005555555588c9 in mdb_env_copy2 (env=0x55555556a2a0, path=<optimized out>, flags=flags@entry=1) at mdb.c:9623
#5 0x0000555555558ea6 in main (argc=3, argv=0x7fffffffe008) at mdb_copy.c:74
I tried to poke at the offending data structure a little but I didn't immediately see what was wrong...
(gdb) frame 1
#1 0x0000555555557e67 in mdb_env_cwalk (my=my@entry=0x7fffffffdbc0, pg=pg@entry=0x7fffffffd988, flags=0) at mdb.c:9264
9264 mdb_page_copy(leaf, mp, my->mc_env->me_psize);
(gdb) p mp
$1 = (MDB_page *) 0x7fc008d32000
(gdb) p *mp
$2 = {mp_p = {p_pgno = 0x0606060606060606, p_next = 0x0606060606060606}, mp_pad = 1542, mp_flags = 1542, mp_pb = {pb = {pb_lower = 1542, pb_upper = 18832}, pb_pages = 1234175494}, mp_ptrs = 0x7fc008d32010}
... except that those values for p_pgno and p_next don't look terribly plausible to me.
The database file is, unfortunately, much too large to attach here (2.3G uncompressed, 383M compressed with xz -17) and also it's, well, a full-text index of everything I have on my computer, so I'd be hesitant to attach it even if it fit. I can make it available for private download if that would be helpful. I'm also happy to do other experiments.
I realize that crashes caused by database corruption can be very difficult to avoid but I hope there might be some kind of easy defensive measure to take in this particular case which could at least allow the application to fail cleanly rather than crashing.
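As an illustration of the kind of defensive check meant here, the following is a minimal sketch in C and not a patch against mdb.c. The struct page_hdr, the function page_hdr_plausible, and the last_pgno and page_size parameters are all hypothetical names made up for this example; the struct only mirrors the header fields printed by gdb above. The 16-byte header size is inferred from the gdb output (mp_ptrs sits 0x10 bytes past mp), and a page size of 4096 is an assumption, since me_psize was not printed. The lower/upper comparison would only make sense for branch and leaf pages, because overflow pages reuse that space for pb_pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the header fields shown in the gdb dump.
 * This is NOT the real MDB_page definition from mdb.c. */
struct page_hdr {
    uint64_t pgno;   /* corresponds to p_pgno */
    uint16_t pad;    /* corresponds to mp_pad */
    uint16_t flags;  /* corresponds to mp_flags */
    uint16_t lower;  /* corresponds to pb_lower */
    uint16_t upper;  /* corresponds to pb_upper */
};

/* Assumed header size: in the dump, mp_ptrs is 0x10 bytes past mp. */
#define PAGE_HDR_SIZE 16u

/* Return true if the header of a branch or leaf page looks sane.
 * last_pgno is the highest page number the caller believes is in use;
 * page_size is the environment's page size. Both are supplied by the
 * caller, since this sketch does not read them from an environment. */
static bool page_hdr_plausible(const struct page_hdr *h,
                               uint64_t last_pgno, uint32_t page_size)
{
    assert(h != NULL);
    if (h->pgno > last_pgno)
        return false;              /* page number beyond end of data */
    if (h->lower < PAGE_HDR_SIZE)
        return false;              /* free space would start inside the header */
    if (h->lower > h->upper)
        return false;              /* free-space bounds are inverted */
    if (h->upper > page_size)
        return false;              /* free space would end past the page */
    return true;
}
```

With the values from the dump (p_pgno = 0x0606060606060606, pb_lower = 1542, pb_upper = 18832) and an assumed 4096-byte page, this check rejects the page on both the page number and the upper bound, so a caller could return an error code instead of handing the page to memcpy.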