[Due to a typo in your e-mail address, the ITS system did not mail out
this message anywhere. So I'm CC'ing Howard directly, just in case.]
Wietse Venema writes:
> I wrote a test driver that reliably causes LMDB to abort during a
> simulated cache cleanup. This "exploit" produces the same result
> on Linux and FreeBSD, 32-bit and 64-bit systems.
You're using an old read-only transaction which cannot coexist with:
- mdb_env_set_mapsize(), which moves the map that a cursor in the
reader is using.
- several write-transactions + MDB_NOLOCK. The flag means the writers do
not know about the reader, so they reuse pages from the snapshot the
reader is using. The reader can survive while the metapages hold
on to its snapshot, i.e. 1 or 2 write commits (I think).
I don't know if this is a thinko in your program or miscommunication
between you and Howard about MDB_NOLOCK and mapsize changes. With the
current liblmdb, a map change should involve these steps: remember the
reader's current position (key), resize the map, renew the txn and
cursor, and reposition the cursor.
The test works if I (a) turn MDB_NOLOCK into MDB_NOTLS (I know that's
not what you want), and (b) detect map changes in mdb_cursor_get() and
update the cursor to match.
Any 'MDB_val's the reader fetched before the mapsize change are invalid
after it.
Also, remember that long-lived read-only transactions that the writers
do know about prevent those writers from reusing pages the reader's
snapshot is using, resulting in further map growth.