bentrask@comcast.net wrote:
And then it's turtles all the way down.
What you're suggesting won't work. Trust me when I say we have spent far more time thinking about this question than you have.
The only way to guarantee integrity is with ordered writes. All SCSI devices support this feature, but the Linux kernel, for example, does not (neither does SATA, and I have no idea about PCIe SSDs...).
Lacking a portable mechanism for ordered writes, you have two choices for preserving integrity - append-only operation (which forces ordered writes anyway) or at least one synchronous write somewhere.
Whenever you decide to reuse existing pages rather than operating as append-only, you create the possibility of overwriting some required data before it is safe to do so. Your 3-root checksum scheme *might* let you detect that the DB is corrupted, but it *won't* let you recover to a clean state. Given that writes occur in unpredictable order, without fsyncs there is no way you can guarantee that anything sane is on the disk.
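
To make the "at least one synchronous write somewhere" option concrete, here is a minimal sketch of a commit that uses fsync as the ordering barrier. It is an illustration only, not any particular database's on-disk format: the page size, the two alternating meta slots, the CRC field, and the names commit() and read_root() are all assumptions made up for this example. It relies on os.pwrite/os.pread, so it assumes a POSIX system.

```python
import os
import struct
import zlib

PAGE_SIZE = 4096          # hypothetical page size
META_BODY = "<QQ"         # txn id, root page number
META_FMT = "<QQI"         # body followed by a CRC32 of the body
META_LEN = struct.calcsize(META_FMT)
BODY_LEN = struct.calcsize(META_BODY)


def commit(fd, txn_id, root_pgno, pages):
    """Write data pages, force them to disk, and only then publish the root.

    Pages 0 and 1 are reserved as meta slots in this sketch, so every
    key in `pages` must be >= 2.
    """
    for pgno, data in pages.items():
        os.pwrite(fd, data.ljust(PAGE_SIZE, b"\0"), pgno * PAGE_SIZE)

    # The one synchronous write that matters: without this barrier the
    # meta page below could reach the disk before the pages it points to.
    os.fsync(fd)

    body = struct.pack(META_BODY, txn_id, root_pgno)
    meta = body + struct.pack("<I", zlib.crc32(body))
    # Alternate between the two slots so the previous root stays intact.
    os.pwrite(fd, meta.ljust(PAGE_SIZE, b"\0"), (txn_id % 2) * PAGE_SIZE)
    os.fsync(fd)


def read_root(fd):
    """Return (txn_id, root_pgno) from the newest meta slot with a valid CRC."""
    best = None
    for slot in (0, 1):
        raw = os.pread(fd, META_LEN, slot * PAGE_SIZE)
        if len(raw) < META_LEN:
            continue
        txn_id, root_pgno, crc = struct.unpack(META_FMT, raw)
        if zlib.crc32(raw[:BODY_LEN]) != crc:
            continue  # torn or never-written slot
        if best is None or txn_id > best[0]:
            best = (txn_id, root_pgno)
    return best
```

Note what the checksum does and does not buy you here. It lets read_root() reject a torn meta page and fall back to the older slot, but that fallback is only a clean state because the first fsync guaranteed the older root's pages were on disk, and because this sketch never overwrites a page the older root still references. Take away the fsync, or start reusing pages freely, and the checksum can still tell you something is wrong without giving you anything sane to go back to.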