https://bugs.openldap.org/show_bug.cgi?id=10180
Issue ID: 10180
Summary: mdb.c:2185: Assertion 'rc == 0' failed in mdb_page_dirty()
Product: LMDB
Version: unspecified
Hardware: All
OS: All
Status: UNCONFIRMED
Keywords: needs_review
Severity: normal
Priority: ---
Component: liblmdb
Assignee: bugs@openldap.org
Reporter: dominik@greysector.net
Target Milestone: ---
I'm experiencing a SIGABRT-induced crash in nheko, which uses lmdb for its database (~/.local/share/nheko/nheko/#hash#/{data,lock}.mdb).
The assertion triggering the SIGABRT seems to be happening here: https://git.openldap.org/openldap/openldap/-/blob/master/libraries/liblmdb/m...
The crash happens on every nheko startup since yesterday, so I assume something added to the database is triggering it.
Full backtrace from gdb:
...
(gdb) run
Starting program: /usr/bin/nheko
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff04006c0 (LWP 8281)]
[New Thread 0x7fffefa006c0 (LWP 8282)]
[New Thread 0x7fffee2006c0 (LWP 8283)]
[New Thread 0x7fffed8006c0 (LWP 8284)]
[New Thread 0x7fffece006c0 (LWP 8285)]
[New Thread 0x7fffe7e006c0 (LWP 8286)]
[Thread 0x7fffe7e006c0 (LWP 8286) exited]
[New Thread 0x7fffe7e006c0 (LWP 8287)]
[New Thread 0x7fffe74006c0 (LWP 8288)]
[Thread 0x7fffe7e006c0 (LWP 8287) exited]
[Thread 0x7fffe74006c0 (LWP 8288) exited]
[New Thread 0x7fffe74006c0 (LWP 8289)]
[New Thread 0x7fffe7e006c0 (LWP 8290)]
[Thread 0x7fffe74006c0 (LWP 8289) exited]
[Thread 0x7fffe7e006c0 (LWP 8290) exited]
[New Thread 0x7fffe7e006c0 (LWP 8292)]
[New Thread 0x7fffe74006c0 (LWP 8293)]
[New Thread 0x7fffe6a006c0 (LWP 8294)]
[New Thread 0x7fffe60006c0 (LWP 8295)]
[New Thread 0x7fffe56006c0 (LWP 8296)]
[New Thread 0x7fffe4c006c0 (LWP 8297)]
[New Thread 0x7fffdfe006c0 (LWP 8298)]
[New Thread 0x7fffdf4006c0 (LWP 8299)]
[New Thread 0x7fffd7e006c0 (LWP 8300)]
[New Thread 0x7fffd74006c0 (LWP 8301)]
[New Thread 0x7fffd6a006c0 (LWP 8302)]
[New Thread 0x7fffd60006c0 (LWP 8303)]
[Thread 0x7fffd60006c0 (LWP 8303) exited]
[Thread 0x7fffd6a006c0 (LWP 8302) exited]
[New Thread 0x7fffd6a006c0 (LWP 8304)]
[2024-02-16 11:52:33.040] [ui] [info] Restoring window size 1424x885
[2024-02-16 11:52:33.051] [ui] [info] WebRTC: initialised GStreamer 1.22.9
[New Thread 0x7fffd60006c0 (LWP 8305)]
[New Thread 0x7fffd52006c0 (LWP 8306)]
[New Thread 0x7fffcfe006c0 (LWP 8307)]
[New Thread 0x7fffcf4006c0 (LWP 8308)]
[New Thread 0x7fffcea006c0 (LWP 8309)]
[Thread 0x7fffcea006c0 (LWP 8309) exited]
[New Thread 0x7fffcea006c0 (LWP 8310)]
[Thread 0x7fffcea006c0 (LWP 8310) exited]
[2024-02-16 11:52:33.342] [ui] [info] Loaded jdenticon plugin.
[New Thread 0x7fffcea006c0 (LWP 8311)]
[New Thread 0x7fffce0006c0 (LWP 8312)]
[Thread 0x7fffce0006c0 (LWP 8312) exited]
[Thread 0x7fffcea006c0 (LWP 8311) exited]
[2024-02-16 11:52:33.557] [ui] [info] starting nheko 0.11.3
[New Thread 0x7fffcea006c0 (LWP 8313)]
[2024-02-16 11:52:33.558] [ui] [info] User already signed in, showing chat page
[2024-02-16 11:52:33.559] [ui] [info] Switching to chat page
[New Thread 0x7fffce0006c0 (LWP 8314)]
[2024-02-16 11:52:33.604] [qml] [warning] qrc:/qml/TopBar.qml:234:13: QML AbstractButton: Binding loop detected for property "implicitWidth" (qrc:/qml/TopBar.qml:234, )
[2024-02-16 11:52:33.604] [qml] [warning] qrc:/qml/TopBar.qml:241:30: QML EncryptionIndicator: Binding loop detected for property "sourceSize.height" (qrc:/qml/TopBar.qml:241, )
[2024-02-16 11:52:33.622] [ui] [info] Unity service available: true
[New Thread 0x7fffcca006c0 (LWP 8315)]
[New Thread 0x7fffbfe006c0 (LWP 8316)]
[New Thread 0x7fffbf4006c0 (LWP 8317)]
[2024-02-16 11:52:33.629] [qml] [warning] qrc:/qml/ChatPage.qml:105:17: QML RoomList: Binding loop detected for property "implicitWidth" (qrc:/qml/ChatPage.qml:105, )
[2024-02-16 11:52:33.629] [qml] [warning] qrc:/qml/ChatPage.qml:105:17: QML RoomList: Binding loop detected for property "implicitWidth" (qrc:/qml/ChatPage.qml:105, )
[New Thread 0x7fffbd2006c0 (LWP 8318)]
[2024-02-16 11:52:33.719] [db] [info] database ready
[2024-02-16 11:52:33.720] [db] [info] restoring state from cache
[2024-02-16 11:52:33.729] [db] [info] Restored 150 rooms from cache
[2024-02-16 11:52:33.770] [db] [info] Invalidating self verification status
[New Thread 0x7fffb7e006c0 (LWP 8319)]
[2024-02-16 11:52:33.775] [crypto] [info] ed25519 : <redacted>
[2024-02-16 11:52:33.775] [crypto] [info] curve25519: <redacted>
mdb.c:2185: Assertion 'rc == 0' failed in mdb_page_dirty()
[New Thread 0x7fffb74006c0 (LWP 8320)]
Thread 1 "nheko" received signal SIGABRT, Aborted.
0x00007ffff4eae834 in __pthread_kill_implementation () from /lib64/libc.so.6
(gdb) where
#0 0x00007ffff4eae834 in __pthread_kill_implementation () from /lib64/libc.so.6
#1 0x00007ffff4e5c8ee in raise () from /lib64/libc.so.6
#2 0x00007ffff4e448ff in abort () from /lib64/libc.so.6
#3 0x00007ffff780f56b in mdb_assert_fail.constprop.0 (env=0x5555572b3940, expr_txt=<optimized out>, func=<optimized out>, line=<optimized out>, file=0x7ffff781c9d0 "mdb.c") at /usr/src/debug/lmdb-0.9.32-1.fc39.x86_64/libraries/liblmdb/mdb.c:1588
#4 0x00007ffff780da59 in mdb_page_dirty (txn=<optimized out>, mp=<optimized out>) at /usr/src/debug/lmdb-0.9.32-1.fc39.x86_64/libraries/liblmdb/mdb.c:2172
#5 mdb_page_dirty (txn=0x555557a50220, mp=<optimized out>) at /usr/src/debug/lmdb-0.9.32-1.fc39.x86_64/libraries/liblmdb/mdb.c:2172
#6 0x00007ffff781be5e in mdb_page_alloc.isra.0 (num=1, mp=0x7fffffffb768, mc=<optimized out>) at /usr/src/debug/lmdb-0.9.32-1.fc39.x86_64/libraries/liblmdb/mdb.c:2366
#7 0x00007ffff7812e52 in mdb_page_touch (mc=mc@entry=0x7fffffffbcb0) at /usr/src/debug/lmdb-0.9.32-1.fc39.x86_64/libraries/liblmdb/mdb.c:2486
#8 0x00007ffff7814ac7 in mdb_cursor_touch (mc=mc@entry=0x7fffffffbcb0) at /usr/src/debug/lmdb-0.9.32-1.fc39.x86_64/libraries/liblmdb/mdb.c:6602
#9 0x00007ffff7817879 in _mdb_cursor_put (mc=mc@entry=0x7fffffffbcb0, key=key@entry=0x7fffffffc080, data=data@entry=0x7fffffffc090, flags=<optimized out>, flags@entry=0) at /usr/src/debug/lmdb-0.9.32-1.fc39.x86_64/libraries/liblmdb/mdb.c:6736
#10 0x00007ffff78188ee in mdb_put (txn=0x555557a50220, dbi=17, key=0x7fffffffc080, data=0x7fffffffc090, flags=0) at /usr/src/debug/lmdb-0.9.32-1.fc39.x86_64/libraries/liblmdb/mdb.c:9150
#11 0x00005555559f41c2 in lmdb::dbi_put (flags=0, data=0x7fffffffc090, key=0x7fffffffc080, dbi=<optimized out>, txn=<optimized out>) at /usr/include/lmdb++.h:787
#12 lmdb::dbi::put(MDB_txn*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, unsigned int) [clone .constprop.0] [clone .isra.0] (txn=<optimized out>, key=..., data=..., flags=0, this=<optimized out>) at /usr/include/lmdb++.h:1413
#13 0x00005555559521e5 in Cache::markUserKeysOutOfDate (this=this@entry=0x555557264f40, txn=..., db=..., user_ids=std::vector of length 1, capacity 1 = {...}, sync_token="s117629150_1_50402_53153589_3038507_9889_1030948_33315610_0_2131") at /usr/include/lmdb++.h:1187
#14 0x00005555559524d5 in Cache::markUserKeysOutOfDate (this=0x555557264f40, user_ids=std::vector of length 1, capacity 1 = {...}) at /usr/src/debug/nheko-0.11.3-5.fc39.x86_64/src/Cache.cpp:4580
#15 0x00005555559333ee in operator() (__closure=<optimized out>) at /usr/src/debug/nheko-0.11.3-5.fc39.x86_64/src/encryption/SelfVerificationStatus.cpp:32
#16 QtPrivate::FunctorCall<QtPrivate::IndexesList<>, QtPrivate::List<>, void, SelfVerificationStatus::SelfVerificationStatus(QObject*)::<lambda()> >::call (arg=<optimized out>, f=...) at /usr/include/qt5/QtCore/qobjectdefs_impl.h:146
#17 QtPrivate::Functor<SelfVerificationStatus::SelfVerificationStatus(QObject*)::<lambda()>, 0>::call<QtPrivate::List<>, void> (arg=<optimized out>, f=...) at /usr/include/qt5/QtCore/qobjectdefs_impl.h:256
#18 QtPrivate::QFunctorSlotObject<SelfVerificationStatus::SelfVerificationStatus(QObject*)::<lambda()>, 0, QtPrivate::List<>, void>::impl(int, QtPrivate::QSlotObjectBase *, QObject *, void **, bool *) (which=<optimized out>, this_=<optimized out>, r=<optimized out>, a=<optimized out>, ret=<optimized out>) at /usr/include/qt5/QtCore/qobjectdefs_impl.h:443
#19 0x00007ffff58e9151 in QtPrivate::QSlotObjectBase::call (a=0x7fffffffc970, r=<optimized out>, this=0x5555567a6ff0) at ../../include/QtCore/../../src/corelib/kernel/qobjectdefs_impl.h:398
#20 doActivate<false> (sender=0x555556030bc0, signal_index=5, argv=0x7fffffffc970) at kernel/qobject.cpp:3925
#21 0x00007ffff58e4077 in QMetaObject::activate (sender=sender@entry=0x555556030bc0, m=<optimized out>, local_signal_index=local_signal_index@entry=2, argv=argv@entry=0x0) at kernel/qobject.cpp:3985
#22 0x0000555555977046 in ChatPage::contentLoaded (this=0x555556030bc0) at /usr/src/debug/nheko-0.11.3-5.fc39.x86_64/redhat-linux-build/nheko_autogen/UVLADIE3JM/moc_ChatPage.cpp:820
#23 ChatPage::loadStateFromCache (this=0x555556030bc0) at /usr/src/debug/nheko-0.11.3-5.fc39.x86_64/src/ChatPage.cpp:600
#24 0x0000555555977919 in ChatPage::bootstrap(QString, QString, QString)::{lambda()#1}::operator()() const [clone .lto_priv.0] () at /usr/src/debug/nheko-0.11.3-5.fc39.x86_64/src/ChatPage.cpp:497
#25 0x00007ffff58e9151 in QtPrivate::QSlotObjectBase::call (a=0x7fffffffcf00, r=<optimized out>, this=0x5555572b4c50) at ../../include/QtCore/../../src/corelib/kernel/qobjectdefs_impl.h:398
#26 doActivate<false> (sender=0x555557264f40, signal_index=10, argv=0x7fffffffcf00) at kernel/qobject.cpp:3925
#27 0x00007ffff58e4077 in QMetaObject::activate (sender=sender@entry=0x555557264f40, m=<optimized out>, local_signal_index=local_signal_index@entry=7, argv=argv@entry=0x0) at kernel/qobject.cpp:3985
#28 0x000055555593873e in Cache::databaseReady (this=0x555557264f40) at /usr/src/debug/nheko-0.11.3-5.fc39.x86_64/redhat-linux-build/nheko_autogen/UVLADIE3JM/moc_Cache_p.cpp:275
#29 Cache::loadSecretsFromStore(std::vector<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool>, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool> > >, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)>, bool) (this=0x555557264f40, toLoad=std::vector of length 0, capacity 0, callback=..., databaseReadyOnFinished=true) at /usr/src/debug/nheko-0.11.3-5.fc39.x86_64/src/Cache.cpp:393
#30 0x000055555596e7f1 in operator() (__closure=<optimized out>) at /usr/src/debug/nheko-0.11.3-5.fc39.x86_64/src/Cache.cpp:457
#31 QtPrivate::FunctorCall<QtPrivate::IndexesList<>, QtPrivate::List<>, void, Cache::loadSecretsFromStore(std::vector<std::pair<std::__cxx11::basic_string<char>, bool> >, std::function<void(const std::__cxx11::basic_string<char>&, bool, const std::__cxx11::basic_string<char>&)>, bool)::<lambda(QKeychain::Job*)> mutable::<lambda()> >::call (arg=<optimized out>, f=...) at /usr/include/qt5/QtCore/qobjectdefs_impl.h:146
#32 QtPrivate::Functor<Cache::loadSecretsFromStore(std::vector<std::pair<std::__cxx11::basic_string<char>, bool> >, std::function<void(const std::__cxx11::basic_string<char>&, bool, const std::__cxx11::basic_string<char>&)>, bool)::<lambda(QKeychain::Job*)> mutable::<lambda()>, 0>::call<QtPrivate::List<>, void> (arg=<optimized out>, f=...) at /usr/include/qt5/QtCore/qobjectdefs_impl.h:256
#33 QtPrivate::QFunctorSlotObject<Cache::loadSecretsFromStore(std::vector<std::pair<std::__cxx11::basic_string<char>, bool> >, std::function<void(const std::__cxx11::basic_string<char>&, bool, const std::__cxx11::basic_string<char>&)>, bool)::<lambda(QKeychain::Job*)> mutable::<lambda()>, 0, QtPrivate::List<>, void>::impl(int, QtPrivate::QSlotObjectBase *, QObject *, void **, bool *) (which=<optimized out>, this_=<optimized out>, r=<optimized out>, a=<optimized out>, ret=<optimized out>) at /usr/include/qt5/QtCore/qobjectdefs_impl.h:443
#34 0x00007ffff58df9fb in QObject::event (this=0x555557264f40, e=0x7fffe8008f30) at kernel/qobject.cpp:1347
#35 0x00007ffff65aeb95 in QApplicationPrivate::notify_helper (this=<optimized out>, receiver=0x555557264f40, e=0x7fffe8008f30) at kernel/qapplication.cpp:3640
#36 0x00007ffff58b4e78 in QCoreApplication::notifyInternal2 (receiver=0x555557264f40, event=0x7fffe8008f30) at kernel/qcoreapplication.cpp:1064
#37 0x00007ffff58b5092 in QCoreApplication::sendEvent (receiver=<optimized out>, event=<optimized out>) at kernel/qcoreapplication.cpp:1462
#38 0x00007ffff58b8325 in QCoreApplicationPrivate::sendPostedEvents (receiver=0x0, event_type=0, data=0x555555d6ef30) at kernel/qcoreapplication.cpp:1821
#39 0x00007ffff58b85dd in QCoreApplication::sendPostedEvents (receiver=<optimized out>, event_type=<optimized out>) at kernel/qcoreapplication.cpp:1680
#40 0x00007ffff59078cf in postEventSourceDispatch (s=0x555556019530) at kernel/qeventdispatcher_glib.cpp:277
#41 0x00007ffff5511e5c in g_main_dispatch (context=0x7fffe8000ec0) at ../glib/gmain.c:3476
#42 g_main_context_dispatch_unlocked (context=0x7fffe8000ec0) at ../glib/gmain.c:4284
#43 0x00007ffff556cf18 in g_main_context_iterate_unlocked.isra.0 (context=context@entry=0x7fffe8000ec0, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>) at ../glib/gmain.c:4349
#44 0x00007ffff550fad3 in g_main_context_iteration (context=0x7fffe8000ec0, may_block=1) at ../glib/gmain.c:4414
#45 0x00007ffff59073b9 in QEventDispatcherGlib::processEvents (this=0x555555f57ae0, flags=...) at kernel/qeventdispatcher_glib.cpp:423
#46 0x00007ffff58b383b in QEventLoop::exec (this=this@entry=0x7fffffffd640, flags=..., flags@entry=...) at ../../include/QtCore/../../src/corelib/global/qflags.h:69
#47 0x00007ffff58bbacb in QCoreApplication::exec () at ../../include/QtCore/../../src/corelib/global/qflags.h:121
#48 0x00007ffff5d60ead in QGuiApplication::exec () at kernel/qguiapplication.cpp:1863
#49 0x00007ffff65aeb09 in QApplication::exec () at kernel/qapplication.cpp:2832
#50 0x000055555570d10d in main (argc=<optimized out>, argv=<optimized out>) at /usr/src/debug/nheko-0.11.3-5.fc39.x86_64/src/main.cpp:405
(gdb)
It looks like this was already reported to nheko, but they don't know what's causing it: https://github.com/Nheko-Reborn/nheko/issues/1303
--- Comment #1 from Dominik Mierzejewski dominik@greysector.net ---
Ah, I forgot to add: this is lmdb-0.9.32 (and nheko 0.11.3) on Fedora 39 x86_64.
--- Comment #2 from Howard Chu hyc@openldap.org ---
How big is that DB, and are you willing to share it with us to examine?
Also, can you provide the output of `mdb_stat -efa` on that DB?
--- Comment #3 from Dominik Mierzejewski dominik@greysector.net ---
(In reply to Howard Chu from comment #2)
> How big is that DB, and are you willing to share it with us to examine?

It's 114MB uncompressed or 22MB compressed with zstd -17. I could provide it to you directly.

> Also, can you provide the output of `mdb_stat -efa` on that DB?

I don't want to post it here as it has a lot of semi-private metadata. Any alternative means? GPG-encrypted e-mail perhaps? If yes, please point me to your public GPG key.
Quanah Gibson-Mount quanah@openldap.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|needs_review                |
--- Comment #4 from Quanah Gibson-Mount quanah@openldap.org ---
GPG key provided, waiting on user to provide DB to Howard
--- Comment #5 from Dominik Mierzejewski dominik@greysector.net ---
Unfortunately, the openldap.org MX is rejecting my message, saying it's over the 10MB limit. Please provide an alternate means of sending you the DB.
--- Comment #6 from Howard Chu hyc@openldap.org ---
After getting a copy of the DB file, I found that `mdb_stat -efff` shows a couple of freelist entries with duplicate page IDs. Write txns using those freelist entries would certainly cause corruption.
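To illustrate why duplicate page IDs in the freelist can end in this particular assertion, here is a minimal Python sketch. It is a simplified model, not LMDB's code: `DirtyList` stands in for the per-transaction dirty page list that mdb_page_dirty() inserts into, `allocate_from_freelist` is a hypothetical helper, and the transaction and page IDs are made up. It assumes the dirty-list insert rejects a page ID it already holds, so a page listed twice is handed out twice within one write txn and the second insert fails.

```python
from bisect import bisect_left


class DirtyList:
    """Sorted list of dirty page IDs that rejects duplicates (toy stand-in)."""

    def __init__(self):
        self.ids = []

    def insert(self, pgno):
        """Return 0 on success, -1 if pgno is already in the list."""
        i = bisect_left(self.ids, pgno)
        if i < len(self.ids) and self.ids[i] == pgno:
            return -1
        self.ids.insert(i, pgno)
        return 0


def allocate_from_freelist(freelist_entries):
    """Reuse every page named in the freelist within one write transaction.

    freelist_entries maps a (hypothetical) txn ID to the page IDs it freed.
    Returns the first page ID that gets handed out twice, or None.
    """
    dirty = DirtyList()
    for txnid in sorted(freelist_entries):
        for pgno in freelist_entries[txnid]:
            rc = dirty.insert(pgno)
            if rc != 0:
                # This is the point where an `rc == 0` assertion would fail.
                return pgno
    return None


# Made-up data: page 7 appears in two freelist entries in the corrupt case.
healthy = {100: [6, 7], 101: [8, 9]}
corrupt = {100: [6, 7], 101: [7, 8, 9]}

print(allocate_from_freelist(healthy))
print(allocate_from_freelist(corrupt))
```

In this model the healthy freelist is consumed without incident, while the corrupt one trips on the page that two entries both claim to have freed.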
mdb_dump -a seems to have no problem dumping the current data in the DB. As such, you could safely dump/load to recreate the DB without the freelist errors.
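The dump/load suggested above could be scripted roughly as follows. This is a sketch under assumptions: the helper names and all paths are hypothetical, the environment is a directory containing data.mdb and lock.mdb (as in the original report), nheko is not running while the dump is taken, and `mdb_dump` and `mdb_load` are on PATH. Only the `-a` and `-f` options mentioned in the LMDB tool documentation are used.

```python
import subprocess
from pathlib import Path


def dump_load_commands(src_env, dst_env, dump_file):
    """Build the argv lists for dumping src_env and loading into dst_env."""
    # -a dumps all named databases in the environment; -f writes to a file.
    dump_cmd = ["mdb_dump", "-a", "-f", str(dump_file), str(src_env)]
    # -f reads the dump from a file instead of standard input.
    load_cmd = ["mdb_load", "-f", str(dump_file), str(dst_env)]
    return dump_cmd, load_cmd


def recreate_env(src_env, dst_env, dump_file):
    """Dump src_env and load it into a fresh, empty directory dst_env."""
    # Refuse to reuse an existing directory so old pages are never mixed in.
    Path(dst_env).mkdir(parents=True, exist_ok=False)
    for cmd in dump_load_commands(src_env, dst_env, dump_file):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical paths; substitute the real environment directory.
    for cmd in dump_load_commands("old-env", "new-env", "nheko.dump"):
        print(" ".join(cmd))
```

Loading into a new directory and swapping it in afterwards keeps the damaged copy available for later analysis.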
The last transaction ID was 69771 and the two bad freelist entries are from txn 69709 and 69710, respectively. Probably too far into the past to know what it was doing at the time.
If you compile LMDB with -DMDB_DEBUG=3 it will do extensive auditing of the freelist after each transaction. Possibly we can use that to track down when the error occurred, on a fresh run.
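As a rough idea of what a per-transaction freelist audit can check, here is a toy Python sketch. It is not LMDB's audit code; the function names, the shape of the data, and all numbers are assumptions for illustration. It shows two checks: page accounting (free pages plus pages in the trees plus the meta pages should equal the next unallocated page number) and a direct scan for page IDs that occur in more than one freelist entry.

```python
NUM_METAS = 2  # assumption of this model: two meta pages at the start of the file


def accounting_ok(freelist_entries, tree_page_counts, next_pgno):
    """Check that every page below next_pgno is accounted for exactly once."""
    free_count = sum(len(pages) for pages in freelist_entries.values())
    used_count = sum(tree_page_counts.values())
    return free_count + used_count + NUM_METAS == next_pgno


def duplicate_free_pages(freelist_entries):
    """Map each page ID listed more than once to the txn IDs that list it."""
    first_seen = {}
    duplicates = {}
    for txnid in sorted(freelist_entries):
        for pgno in freelist_entries[txnid]:
            if pgno in first_seen:
                duplicates.setdefault(pgno, [first_seen[pgno]]).append(txnid)
            else:
                first_seen[pgno] = txnid
    return duplicates


# Made-up example: 10 pages in total, 4 of them in use by two trees.
trees = {"main": 3, "free": 1}
healthy = {100: [6, 7], 101: [8, 9]}
corrupt = {100: [6, 7], 101: [7, 8, 9]}  # page 7 was recorded as freed twice

print(accounting_ok(healthy, trees, 10), duplicate_free_pages(healthy))
print(accounting_ok(corrupt, trees, 10), duplicate_free_pages(corrupt))
```

Running a check like this after every commit would narrow the search to the first transaction whose freelist no longer adds up, which is the point of auditing on a fresh run.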
No idea how long it would take to reproduce this or what operations are needed to reproduce it.