Re: (ITS#8475) Feature request: MDB low durability transactions - openldap-bugs

7 Aug 2016


On 08/07/2016 05:44 PM, Howard Chu wrote:
...
> The only way to guarantee integrity is with ordered writes. All SCSI
> devices support this feature, but e.g. the Linux kernel does not (and
> neither does SATA, and no idea about PCIe SSDs...).
>
> Lacking a portable mechanism for ordered writes, you have two choices
> for preserving integrity - append-only operation (which forces ordered
> writes anyway) or at least one synchronous write somewhere.
>
> Whenever you decide to reuse existing pages rather than operating as
> append-only, you create the possibility of overwriting some required
> data before it was safe to do so. Your 3-root checksum scheme *might*
> let you detect that the DB is corrupted, but it *won't* let you recover
> to a clean state. Given that writes occur in unpredictable order,
> without fsyncs there is no way you can guarantee that anything sane is
> on the disk.
Consider three roots without any checksums. Each root has a simple flag 
indicating whether it was written durably (fsync write barrier). During 
recovery, non-durable roots are simply ignored/discarded. This is 
equivalent to Hallvard's suggestion for volatile meta-pages. I think 
it's pretty clear this is workable.
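
To make the flag-only scheme concrete, here is a minimal C sketch of the
recovery step. The struct layout and the names root, durable and
pick_recovery_root are hypothetical, made up for illustration; this is
not LMDB's meta-page format or API. It also assumes the flag read back
from disk can be trusted, i.e. it is only ever set by a write that sits
behind an fsync.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_ROOTS 3

/* Hypothetical root record; NOT LMDB's actual MDB_meta layout. */
struct root {
    uint64_t txnid;     /* transaction that wrote this root */
    uint64_t root_page; /* page number of the tree root for that txn */
    int durable;        /* nonzero only if written behind an fsync barrier */
};

/*
 * Recovery: return the newest root that was written durably.
 * Non-durable roots are skipped, however recent they are.
 * Returns NULL if no durable root exists.
 */
const struct root *pick_recovery_root(const struct root roots[NUM_ROOTS])
{
    const struct root *best = NULL;

    for (size_t i = 0; i < NUM_ROOTS; i++) {
        if (!roots[i].durable)
            continue;
        if (best == NULL || roots[i].txnid > best->txnid)
            best = &roots[i];
    }
    return best;
}
```

With roots at txnids 10 (durable), 11 and 12 (both non-durable), this
falls back to txnid 10. The two later transactions are lost, which is
the expected cost of low durability.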
From there, checksums just give you slightly stronger guarantees, 
although they might not be worth the overhead (CPU/storage) and recovery 
complexity.
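
For comparison, a rough C sketch of what a per-root checksum could look
like. Everything here is an assumption for illustration: the ck_root
struct, the function names, and the choice of 64-bit FNV-1a (picked only
because it is short; a real design might prefer a CRC). Note that it
only covers the root record itself, so it can catch a torn or garbled
root but says nothing about the pages that root points to.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical checksummed root record; NOT LMDB's on-disk format. */
struct ck_root {
    uint64_t txnid;
    uint64_t root_page;
    uint64_t checksum;  /* FNV-1a over txnid and root_page */
};

/* Fold len bytes of data into the running 64-bit FNV-1a hash h. */
static uint64_t fnv1a64(uint64_t h, const void *data, size_t len)
{
    const unsigned char *p = data;

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;  /* FNV-1a 64-bit prime */
    }
    return h;
}

/* Checksum of the fields that describe the root (not the checksum itself). */
uint64_t ck_root_sum(const struct ck_root *r)
{
    uint64_t h = 0xcbf29ce484222325ULL;  /* FNV-1a 64-bit offset basis */

    h = fnv1a64(h, &r->txnid, sizeof r->txnid);
    h = fnv1a64(h, &r->root_page, sizeof r->root_page);
    return h;
}

/* Call before writing the root out. */
void ck_root_seal(struct ck_root *r)
{
    r->checksum = ck_root_sum(r);
}

/* Call during recovery; returns nonzero if the record is self-consistent. */
int ck_root_valid(const struct ck_root *r)
{
    return r->checksum == ck_root_sum(r);
}
```

Recovery would then skip any root for which ck_root_valid() returns 0,
in addition to skipping the non-durable ones.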