[Due to a typo in your e-mail address, the ITS system did not mail out this message anywhere. So I'm CC'ing Howard directly, just in case.]
Wietse Venema writes:
I wrote a test driver that reliably causes LMDB to abort during a simulated cache cleanup. This "exploit" produces the same result on Linux and FreeBSD, 32-bit and 64-bit systems.
You're using an old read-only transaction, which cannot coexist with:
- mdb_env_set_mapsize(), which moves the map that a cursor in the reader is using.
- several write transactions + MDB_NOLOCK. The flag means the writers do not know about the reader, so they reuse pages from the snapshot the reader is using. The reader can survive only while the metapages hold on to its snapshot, i.e. for 1 or 2 write commits (I think).
I don't know if this is a thinko in your program or a miscommunication between you and Howard about MDB_NOLOCK and mapsize changes. With the current liblmdb, a map change should involve these steps: remember the reader's current position (key), resize the map, renew the txn and cursor, and reposition the cursor.
The test works if I (a) turn MDB_NOLOCK into MDB_NOTLS (I know that's not what you want), and (b) detect map changes in mdb_cursor_get() and update the cursor to match. Any old MDB_val the reader fetched is invalid after the mapsize change. Also, remember that long-lived read-only transactions that the writers do know about prevent those writers from reusing the pages the reader's snapshot is using, which results in further map growth.