On Mon, 2017-10-16 at 13:58 +0200, Hallvard Breien Furuseth wrote:
On 16. okt. 2017 12:51, Howard Chu wrote:
timur.kristof@gmail.com wrote:
I have an app that uses LMDB, and I've experienced an interesting issue: when trying to delete a certain item with mdb_cursor_del, it crashed with the following backtrace: https://pastebin.com/7p9wtkj9
Weird backtrace. It says mdb_page_dirty(), which is small, stretches over 300+ lines (frames #3-#4). And mdb_page_alloc() alone has no hex address prefix. Maybe miscompilation, two liblmdb libraries linked into the same executable, or something like that? Or some wild pointer write or whatever messed things up.
Not sure what was going on there, maybe -O3 messed it up. Still, the issue does appear with -O0 too and here is a backtrace with -O0: https://pastebin.com/SfeMMEPH
Most likely the dirty list is too big, which means you're trying to do too much in a single transaction.
Shouldn't happen though. The txn should have failed earlier with MDB_TXN_FULL.
Which also shouldn't happen since LMDB should have spilled enough pages to make room - unless you have hundreds of cursors at modified pages so LMDB can't spill enough.
But we should probably test LMDB with impractically tight dirty-list arrays (i.e. a very small MDB_IDL_UM_MAX), so LMDB keeps running into such cases.
I've taken a look at the value of rc (see my reply to Howard), and it seems to me that Леонид Юрьев's assessment may be correct here. rc is -1, which indicates that the page (even though newly allocated, maybe a reused page?) is already on the txn's dirty pages list.
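
To illustrate how an insert into a sorted page list can come back with rc == -1, here is a rough, simplified sketch. It is not LMDB's actual code: dirty_list, DIRTY_MAX, dirty_search and dirty_insert are made-up names, and returning -2 for a full list is just the convention this sketch assumes. The capacity is deliberately tiny, in the spirit of testing with an impractically small dirty list.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a txn's dirty-page list:
 * a sorted array of page numbers with a fixed capacity. */
#define DIRTY_MAX 4   /* deliberately tiny so the "full" case is easy to hit */

typedef struct {
    size_t count;            /* number of entries in use */
    size_t pgno[DIRTY_MAX];  /* page numbers, kept in ascending order */
} dirty_list;

/* Binary search: index of the first entry >= pgno (count if there is none). */
size_t dirty_search(const dirty_list *l, size_t pgno)
{
    size_t lo = 0, hi = l->count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (l->pgno[mid] < pgno)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Returns 0 on success, -1 if pgno is already listed, -2 if the list is full.
 * (Return codes are this sketch's assumption.) */
int dirty_insert(dirty_list *l, size_t pgno)
{
    assert(l != NULL);

    size_t x = dirty_search(l, pgno);
    if (x < l->count && l->pgno[x] == pgno)
        return -1;                      /* duplicate: page already dirty */
    if (l->count >= DIRTY_MAX)
        return -2;                      /* no room left */

    /* Shift the tail up by one slot and insert at position x. */
    for (size_t i = l->count; i > x; i--)
        l->pgno[i] = l->pgno[i - 1];
    l->pgno[x] = pgno;
    l->count++;
    return 0;
}
```

In this sketch, inserting the same page number twice yields -1 on the second call, which is the kind of result a reused page that was never removed from the list would produce.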
- Timur