Full_Name: Hallvard B Furuseth
Version: LMDB_0.9.8
OS:
URL:
Submission from: (NULL) (81.191.45.35)
Submitted by: hallvard
mdb_env_sync() uses the wrong sync method when syncing a commit that was written by another MDB_env with a different MDB_WRITEMAP setting.
Consider two processes using MDB_NOMETASYNC that take turns doing write txns: each process then syncs the other's meta pages. If their MDB_WRITEMAP settings differ, every meta page is synced with the wrong method. This breaks the durability guarantee of ACID.
There is a similar problem if a process crashes after writing the meta page but before the sync succeeds, and a later mdb_env_open() then resets the lockfile so that it refers to the unsynced commit. Robust mutexes will introduce a similar problem even without mdb_env_open().
I'm not volunteering to figure out how to do this right. Open questions include: how do fsync/msync/FlushFileBuffers behave on various OSes when the file descriptor or memory map is read-only? And in that case, do we need to set a "need to sync" flag in the lockfile for the first writer or write txn to obey?
Another fix: disallow this scenario. Store the MDB_WRITEMAP setting in the lockfile when resetting it, even with MDB_RDONLY, and when not resetting the lockfile, obey that stored flag rather than the MDB_WRITEMAP flag passed to mdb_env_open(). However, a small program like mdb_stat can then have a disproportionate effect on another process which opens the env at the same time. Also, nested txns would need to work with MDB_WRITEMAP.
For the crash case above and robust mutexes:
Maybe mdb_env_open() should not modify me_txns->mti_txnid if it refers to the older of the two meta pages. That way the possibly unsynced commit will never be exposed unless the lockfile is removed. But the next write txn must then reset the "hidden" meta page and sync before proceeding, similar to what mdb_env_write_meta() does on failure. Otherwise, removing the lockfile would expose a meta page referring to data which may have been overwritten, e.g. by an mdb_txn_abort()ed commit.
Another variant would be to sync in mdb_env_open() when resetting the lockfile, or perhaps an MDB_RDONLY env must instead set a "sync needed" flag.