On 18/12/14 05:40, openldap-commit2devel@OpenLDAP.org wrote:
commit 0018eeb2c3b2239c30def9d47c9d194a4ebf35fe
Author: Howard Chu <hyc@openldap.org>
Date:   Thu Dec 18 04:38:53 2014 +0000

    Hack for potential ext3/ext4 corruption issue

    Use regular fsync() if we think this commit grew the DB file.
This does not catch all cases:
If the new pages below mt_next_pgno were freed instead of written, me_size becomes too big. Later when the file does grow, me_size may be >= actual filesize so it fdatasync()s. Similar to b09e46904c1c059bd5086243e3915b6be510e57d "ITS#7886 fix mdb_copy write size". We can fix me_size, grow the file anyway (ftruncate), or give the pages back to mt_next_pgno in mdb_freelist_save().
Another issue: After an MDB_NOSYNC commit, mdb_env_sync() only fdatasync()s. It does not know when the file grew. The planned "group commits" may get the same problem if the user checkpoints with mdb_env_sync().
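For readers following the thread, here is a minimal sketch of the kind of decision the commit message describes: use fsync() when the commit appears to have grown the file, otherwise fdatasync(). It is not the code from the commit. The struct `sketch_env`, its field `known_size` (standing in for me_size), and the functions `choose_sync` and `sync_commit` are hypothetical names for illustration only.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for the relevant environment state; not LMDB's MDB_env. */
typedef struct {
    int    fd;          /* data file descriptor */
    size_t known_size;  /* file size we believe is on disk (plays the role of me_size) */
} sketch_env;

typedef enum { SYNC_DATA_ONLY, SYNC_FULL } sync_kind;

/* Pick the sync call for a commit whose first unused page number is next_pgno. */
sync_kind choose_sync(const sketch_env *env, size_t next_pgno, size_t page_size)
{
    assert(page_size > 0);
    size_t end = next_pgno * page_size;
    /* The file is assumed to have grown only if the commit reaches past known_size. */
    return end > env->known_size ? SYNC_FULL : SYNC_DATA_ONLY;
}

/* Sync the data file after a commit, remembering the new assumed size. */
int sync_commit(sketch_env *env, size_t next_pgno, size_t page_size)
{
    if (choose_sync(env, next_pgno, page_size) == SYNC_FULL) {
        /* Note: this records the size implied by next_pgno, not the real file
         * size. If the new pages were freed instead of written, known_size is
         * now too big, which is the first case described above. */
        env->known_size = next_pgno * page_size;
        return fsync(env->fd);
    }
    return fdatasync(env->fd);
}
```

The sketch deliberately keeps the weakness under discussion: the recorded size comes from the page counter rather than from what was actually written.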
Hallvard Breien Furuseth wrote:
On 18/12/14 05:40, openldap-commit2devel@OpenLDAP.org wrote:
commit 0018eeb2c3b2239c30def9d47c9d194a4ebf35fe
Author: Howard Chu <hyc@openldap.org>
Date:   Thu Dec 18 04:38:53 2014 +0000

    Hack for potential ext3/ext4 corruption issue

    Use regular fsync() if we think this commit grew the DB file.
This does not catch all cases:
If the new pages below mt_next_pgno were freed instead of written, me_size becomes too big.
Huh? mt_next_pgno definitively tells how many pages have ever been used in the DB file.
Later when the file does grow, me_size may be >= actual filesize so it fdatasync()s. Similar to b09e46904c1c059bd5086243e3915b6be510e57d "ITS#7886 fix mdb_copy write size". We can fix me_size, grow the file anyway (ftruncate), or give the pages back to mt_next_pgno in mdb_freelist_save().
Another issue: After an MDB_NOSYNC commit, mdb_env_sync() only fdatasync()s. It does not know when the file grew.
I suppose we can change the FORCE flag to also cause fsync() to be used.
The planned "group commits" may get the same problem if the user checkpoints with mdb_env_sync().
On 01/06/2015 03:18 PM, Howard Chu wrote:
Hallvard Breien Furuseth wrote:
(....) If the new pages below mt_next_pgno were freed instead of written, me_size becomes too big.
Huh? mt_next_pgno definitively tells how many pages have ever been used in the DB file.
No, see ITS#7886:
"Allocate an ovpage from mt_next_pgno, mdb_ovpage_free() it and commit: The datafile may end before MDB_meta.mm_last_pg since the ovpage was never written. mdb_env_copyfd() & co break when they read the file to mm_last_pg."
Later when the file does grow, me_size may be >= actual filesize so it fdatasync()s. Similar to b09e46904c1c059bd5086243e3915b6be510e57d "ITS#7886 fix mdb_copy write size". We can fix me_size, grow the file anyway (ftruncate), or give the pages back to mt_next_pgno in mdb_freelist_save().
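To make the ITS#7886 scenario concrete, the following sketch checks whether a file of a given size actually contains every page up to the last page number recorded in the meta page, and clamps a copy length to what exists on disk. The function names and parameters are hypothetical and chosen for illustration; this is not the code from the referenced fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if a file of file_size bytes holds every page up to and including last_pg. */
bool file_covers_last_page(size_t file_size, size_t last_pg, size_t page_size)
{
    assert(page_size > 0);
    return file_size >= (last_pg + 1) * page_size;
}

/* Number of bytes a copy may read: what the meta page implies, capped at
 * what the file really contains. */
size_t safe_copy_size(size_t file_size, size_t last_pg, size_t page_size)
{
    assert(page_size > 0);
    size_t wanted = (last_pg + 1) * page_size;
    return wanted < file_size ? wanted : file_size;
}
```

For example, with 4096-byte pages, a last page number of 5, and a file that ends after page 4 because the overflow page was allocated and freed without being written, `file_covers_last_page` is false and `safe_copy_size` returns the real file size rather than six pages.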
Sorry, forgot this one.
On 01/06/2015 03:18 PM, Howard Chu wrote:
Hallvard Breien Furuseth wrote:
Another issue: After an MDB_NOSYNC commit, mdb_env_sync() only fdatasync()s. It does not know when the file grew.
I suppose we can change the FORCE flag to also cause fsync() to be used.
Insufficient if the user commits with MDB_NOSYNC (maybe when creating the DB), then turns off MDB_NOSYNC and does mdb_env_sync(env, 0). Or another process without MDB_NOSYNC doing mdb_env_sync(env, 0).
The lockfile could track what has been synced how, though. Except it won't know at init, so if someone does a lot of <open env, commit, close env> they'll end up fsync'ing each time.
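A rough sketch of the lockfile idea, under the assumption that the shared state records the file size covered by the last full fsync(). The struct `sync_state`, its fields, and both functions are hypothetical and do not correspond to anything in the LMDB lock file layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical fields that could be kept in the shared lock file. */
typedef struct {
    size_t synced_size;  /* file size covered by the last full fsync() */
    bool   size_known;   /* false until some process has recorded a size */
} sync_state;

typedef enum { USE_FDATASYNC, USE_FSYNC } sync_call;

/* Choose the sync call for the current file size, whoever grew the file. */
sync_call pick_sync_call(const sync_state *st, size_t current_size)
{
    assert(st != NULL);
    /* With no recorded size (fresh lock file), be conservative. */
    if (!st->size_known || current_size > st->synced_size)
        return USE_FSYNC;
    return USE_FDATASYNC;
}

/* Record that a full fsync() covered the file up to current_size. */
void record_full_sync(sync_state *st, size_t current_size)
{
    assert(st != NULL);
    st->synced_size = current_size;
    st->size_known = true;
}
```

The `size_known` flag models the caveat above: a freshly initialized lock file knows nothing, so a workload of repeated <open env, commit, close env> would take the fsync() path every time.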