Full_Name: Hallvard B Furuseth
Submission from: (NULL) (220.127.116.11)
Submitted by: hallvard
mdb_env_sync() uses the wrong sync method when syncing a commit
written with a different MDB_WRITEMAP setting in another MDB_env.
Two processes with MDB_NOMETASYNC, each doing every second write
txn, will sync each other's meta pages. If they have different
MDB_WRITEMAP settings, every meta page gets synced with the wrong
method. This breaks the durability guarantee of ACID.
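Here is a toy model in C that makes the failure mode concrete. It is not LMDB code, and every type and function in it is hypothetical. It assumes an OS where msync() only makes writes done through the map durable, and fdatasync() only makes write() calls durable. Under that assumption, two alternating MDB_NOMETASYNC writers with different MDB_WRITEMAP settings never make each other's meta pages durable.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not LMDB code. Assumption: only the sync method that
 * matches how a meta page was written (msync for map writes,
 * fdatasync for write() calls) makes that commit durable. */
typedef enum { VIA_MMAP, VIA_WRITE } write_path;

typedef struct {
    bool writemap;              /* this env's MDB_WRITEMAP setting */
} toy_env;

typedef struct {
    write_path path;            /* how the meta page reached the file */
    bool durable;               /* set once a matching sync has run */
} toy_commit;

/* Write a commit's meta page using this env's configuration. */
static toy_commit toy_write_meta(const toy_env *env)
{
    toy_commit c = { env->writemap ? VIA_MMAP : VIA_WRITE, false };
    return c;
}

/* Sync a (possibly foreign) commit using the method this env's own
 * flags select; a mismatched method is modeled as having no effect. */
static void toy_env_sync(const toy_env *env, toy_commit *c)
{
    write_path method = env->writemap ? VIA_MMAP : VIA_WRITE;
    if (method == c->path)
        c->durable = true;
}

/* Two MDB_NOMETASYNC writers alternate txns, so each one syncs the
 * other's previous meta page. Returns how many of the n commits end
 * up durable (the last one is never synced inside this loop). */
static int toy_alternate(const toy_env *a, const toy_env *b, int n)
{
    int durable = 0;
    toy_commit prev = { VIA_WRITE, false };
    bool have_prev = false;
    for (int i = 0; i < n; i++) {
        const toy_env *env = (i % 2 == 0) ? a : b;
        if (have_prev) {
            toy_env_sync(env, &prev);   /* sync the other writer's commit */
            durable += prev.durable;
        }
        prev = toy_write_meta(env);
        have_prev = true;
    }
    assert(durable <= n);
    return durable;
}
```

When both envs use the same setting, each commit except the last is made durable by the next writer's sync. When they differ, none are.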
Sounds like a doc issue. This can only arise if two separate processes are
using different configurations to access the same MDB environment. Most
applications will always use identical configurations to access their
databases, so this won't occur.
There is a similar problem if a process crashes after writing
the meta page but before the sync succeeds, and mdb_env_open() then
resets the lockfile to refer to the unsynced commit. Robust
mutexes will introduce a similar problem even without going
through mdb_env_open().
I'm not volunteering to figure out how to do this right. E.g. how
do fsync/msync/FlushFileBuffers behave on various OSes if the file
descriptor or memory map is read-only? Do we need to set a "need
to sync" flag in the lockfile in that case, for the first writer
or write txn to obey?
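Here is a minimal sketch of the "need to sync" flag idea. It assumes a hypothetical lockfile field, since LMDB's lockfile has no such field, and it ignores locking. Real code would have to update the flag while holding the writer mutex. An env that cannot sync properly sets the flag, and the next write txn does a full sync before anything else.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lockfile header; LMDB's real lockfile has no such field. */
typedef struct {
    bool need_sync;             /* set by an env that could not sync a commit */
} toy_lockinfo;

typedef struct {
    int syncs;                  /* full syncs this writer has performed */
} toy_writer;

/* Called by an env (e.g. an MDB_RDONLY one) that sees an unsynced
 * commit but cannot sync it reliably itself. */
static void toy_request_sync(toy_lockinfo *li)
{
    li->need_sync = true;
}

/* Start of a write txn. Real code would do this under the writer
 * mutex; this toy model ignores locking entirely. */
static void toy_begin_write(toy_lockinfo *li, toy_writer *w)
{
    if (li->need_sync) {
        w->syncs++;             /* stands in for a full fsync/msync */
        li->need_sync = false;  /* clear only after the sync succeeded */
    }
    assert(!li->need_sync);
}
```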
Another fix: disallow this scenario. Store the MDB_WRITEMAP
setting in the lockfile when resetting it, even with MDB_RDONLY.
Obey that flag rather than the MDB_WRITEMAP flag passed to
mdb_env_open() when not resetting the lockfile. However, a small
program like mdb_stat can then have a disproportionate effect on
another process which opens the env at the same time. Also,
nested txns would need to work with MDB_WRITEMAP.
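Here is a sketch of storing MDB_WRITEMAP in the lockfile. A hypothetical toy_lockhdr stands in for the real lockfile header, and nothing here is LMDB's actual layout. Whoever resets the lockfile records its own setting. Later openers obey the recorded value and ignore the flag they passed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lockfile fields; not part of LMDB's real lockfile. */
typedef struct {
    bool initialized;           /* lockfile has been reset at least once */
    bool writemap;              /* MDB_WRITEMAP setting recorded at reset */
} toy_lockhdr;

/* Decide the effective MDB_WRITEMAP setting at open time. The opener
 * that resets the lockfile (even an MDB_RDONLY one) records its own
 * request; every later opener gets the recorded value instead. */
static bool toy_effective_writemap(toy_lockhdr *hdr, bool resetting,
                                   bool requested)
{
    if (resetting) {
        hdr->writemap = requested;
        hdr->initialized = true;
    }
    /* A non-resetting opener always finds an initialized lockfile. */
    assert(hdr->initialized);
    return hdr->writemap;
}
```

This also shows the drawback. If mdb_stat happens to be the process that resets the lockfile, its MDB_WRITEMAP setting is what every later opener gets.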
For the crash case above and robust mutexes:
Maybe mdb_env_open() should not modify me_txns->mti_txnid if it
refers to the oldest meta page. That way the possibly unsynced
commit will never be exposed unless the lockfile is removed.
But the next write txn must then reset the "hidden" meta page and
sync before proceeding, similar to what mdb_env_write_meta() does on
failure. Otherwise removing the lockfile would expose a meta
page referring to data which may have been overwritten, e.g. by
an mdb_txn_abort()ed commit.
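The crash-case proposal can be modeled as below. This is a toy model, not LMDB code, and it rests on three assumptions. Commit t writes meta page t & 1. The reset lockfile still carries the previous mti_txnid. The hidden meta page is invalidated by zeroing its txnid, which is only a hypothetical stand-in for what mdb_env_write_meta() writes back on failure.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t txnid_t;

/* Toy model of the proposal, not LMDB code. */
typedef struct {
    txnid_t meta[2];            /* txnid stored in each meta page */
    txnid_t li_txnid;           /* stand-in for me_txns->mti_txnid */
    int syncs;                  /* full syncs performed */
} toy_db;

static txnid_t toy_newest(const toy_db *db)
{
    return db->meta[0] > db->meta[1] ? db->meta[0] : db->meta[1];
}

static txnid_t toy_oldest(const toy_db *db)
{
    return db->meta[0] < db->meta[1] ? db->meta[0] : db->meta[1];
}

/* Lockfile reset in mdb_env_open(): if the lockfile already refers to
 * the older meta page, leave it alone so a possibly unsynced newer
 * commit stays hidden; otherwise point it at the newest meta page. */
static void toy_open_reset(toy_db *db)
{
    if (db->li_txnid != toy_oldest(db))
        db->li_txnid = toy_newest(db);
}

/* Start of a write txn: invalidate any meta page newer than the
 * lockfile's txnid and sync, before any data pages can be reused. */
static void toy_begin_write(toy_db *db)
{
    for (int i = 0; i < 2; i++) {
        if (db->meta[i] > db->li_txnid) {
            db->meta[i] = 0;    /* hypothetical invalidation marker */
            db->syncs++;        /* stands in for a full sync */
        }
    }
    /* Removing the lockfile now exposes only the visible commit. */
    assert(toy_newest(db) == db->li_txnid);
}
```

In the crash case (metas 4 and 5, lockfile still at 4), the open keeps txnid 4 visible. The next write txn then wipes meta page 1 and syncs before it reuses any pages.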
Another variant would be to sync in mdb_env_open() when resetting
the lockfile, or maybe an MDB_RDONLY env must set a "sync needed"
Nothing to fix in code. Doc this as "don't do this." Nobody currently
does this anyway, so it will have no impact.
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/