On 01/05/2015 12:58 PM, Howard Chu wrote:
> The LKML thread indicates that this bug was already fixed. The Mai Zheng paper says they used RHEL6, which shipped with kernel 2.6.32, apparently too old to have the fix.
> All in all, a bunch of bogus reporting, claiming that all DBs are broken when in fact LMDB is perfectly correct.
True, but often uninteresting from the user's perspective. So I do think LMDB on Linux should default to fsync for some years, at least when the file may have grown. The Makefile can explain the problem and provide a variable to always use fdatasync if the admin knows the kernel is OK.
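A minimal sketch of that build-time default follows. It assumes a hypothetical FDATASYNC_IS_SAFE flag that the Makefile variable would add to CFLAGS when the admin knows the kernel is OK. Only the MDB_FDATASYNC macro name comes from this thread; the flag and the sync_file_data() helper are illustrative, and the sketch simply assumes non-Linux platforms keep fdatasync.

```c
#define _POSIX_C_SOURCE 200809L
#include <unistd.h>

/* Hypothetical build switch: the Makefile variable would add
 * -DFDATASYNC_IS_SAFE to CFLAGS when the kernel is known to be OK. */
#if defined(__linux__) && !defined(FDATASYNC_IS_SAFE)
/* Linux default: full fsync(), which also flushes the file size. */
# define MDB_FDATASYNC fsync
#else
/* Admin vouched for the kernel (or not Linux): cheaper fdatasync(). */
# define MDB_FDATASYNC fdatasync
#endif

/* Illustrative helper: flush fd with whichever call was selected above.
 * Returns 0 on success, -1 with errno set on failure. */
static int sync_file_data(int fd)
{
    return MDB_FDATASYNC(fd);
}
```

Building with, say, cc -DFDATASYNC_IS_SAFE switches Linux back to fdatasync; without the flag every sync pays for a full fsync.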
As for how to know the synced size, if you want to do more than always use fsync on an OS where fdatasync is unreliable:
I drafted some code to work around it, but it got messy. If we use more code for this than just '#define MDB_FDATASYNC fsync', I suggest handling it all in mdb_env_sync(), which can fstat():
    struct MDB_env:
        off_t   me_size;    /**< file size known to be synced, or 0 */

    mdb_env_sync() {
        ...;
    #if MDB_BUGGY_FDATASYNC
        size_t sz = 0;
        if (mdb_fsize(env->me_fd, &sz) != MDB_SUCCESS || sz != env->me_size) {
            if (fsync(env->me_fd))
                rc = ErrCode();
            else if (sz)
                env->me_size = sz;
        } else
    #endif
        ...normal sync...;
    }
mdb_env_open() does not know whether the current file size has been synced, so drop setting me_size there and leave it 0 (unknown).
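A standalone sketch of the proposed mdb_env_sync() logic, under a few assumptions: plain fstat() stands in for mdb_fsize(), and a hypothetical sync_state struct stands in for MDB_env. Both sizes are kept as off_t so the comparison does not mix off_t and size_t. As with me_size, a synced size of 0 means "unknown", so a freshly opened non-empty file gets a full fsync on its first sync.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the relevant fields of MDB_env. */
struct sync_state {
    int fd;             /* like me_fd */
    off_t synced_size;  /* like me_size: size known to be synced, or 0 */
};

/* Sketch of the proposed logic for a buggy fdatasync: use fsync() when
 * the current size is unknown or differs from the last size known to be
 * synced, otherwise the cheaper fdatasync().
 * Returns 0 on success or an errno value, like rc = ErrCode(). */
static int sync_tracking_size(struct sync_state *st)
{
    struct stat sb;
    int have_size = (fstat(st->fd, &sb) == 0);
    off_t sz = have_size ? sb.st_size : 0;

    if (!have_size || sz != st->synced_size) {
        /* Size may have changed: fsync() also flushes the size metadata. */
        if (fsync(st->fd))
            return errno;
        if (sz)
            st->synced_size = sz;  /* remember the size now on disk */
        return 0;
    }
    /* Size matches the last size known to be synced: fdatasync() is enough. */
    if (fdatasync(st->fd))
        return errno;
    return 0;
}
```

Calling it twice without writing in between takes the fdatasync path the second time; appending data changes the size and forces fsync again.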