The documentation of MDB_NOSYNC says:
If the filesystem preserves write order and the MDB_WRITEMAP flag is not used, transactions exhibit ACI (atomicity, consistency, isolation) properties and only lose D (durability).
In practice, which filesystems and mount options preserve write order?
I asked Howard this question elsewhere. His answer was that ZFS should do it, and that ext4 with data=ordered _may_ do it. Wouldn't ext4 with data=journal be a very safe bet, too? Are there any other recommendations?
I ran a few microbenchmarks comparing ext4 with data=ordered and with data=journal. With the default synchronous commits, they manage about 600 and 400 write txn/s, respectively. With MDB_NOSYNC plus an mdb_env_sync() call every second, both reach about 200k txn/s. For reference, the system can do about 5 million read txn/s. That makes me hopeful that ext4 with data=journal could be a good option.
Cheers, Gábor Melis