Hallvard Breien Furuseth wrote:
On 02/08/14 18:57, Howard Chu wrote:
Hallvard Breien Furuseth wrote:
(...) If so, MDB_NOLOCK may be in trouble since it uses pick_meta() instead of mti_txnid. Should there be a separate CACHEFLUSH after writing the datapages if MDB_NOLOCK, and the current CACHEFLUSH should just flush the metapages?
I don't see any reason for that. As always, the only thing that matters is when the metapages get written.
What matters is that nothing *sees* the metapage before its data pages, nor sees the mti_txnid change before the metapage. I thought that's what cache coherency and memory ordering were about.
No, memory ordering is about seeing partial changes to a single page, outside of the order in which the program wrote them, which can happen on superscalar chips with out-of-order execution.
So to explain my previous message a bit: A cacheflush() which flushes a metapage and its datapages all in one chunk makes me nervous. If that's necessary (rather than just flushing the meta at that point), I imagine that just before the flush, it's possible for something to see the metapage before its datapages.
Not possible. The cacheflush is atomic. Note that this is invalidating an on-chip data cache which is typically only 32KB or so. It has nothing to do with flushing the buffer cache. ("flush" is a misnomer, but that's what the syscall is called.)
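For anyone following along who hasn't met the call: a minimal sketch of wrapping the Linux/MIPS cacheflush(2) syscall so that it compiles to a no-op on other platforms. The CACHEFLUSH name is the one used in this thread; the sync_dcache() helper and the exact preprocessor guards are assumptions for illustration, not necessarily what the library does.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__mips__) && defined(__linux__)
#include <sys/cachectl.h>   /* cacheflush(), DCACHE */
#define CACHEFLUSH(addr, bytes, cache) cacheflush(addr, bytes, cache)
#else
/* Elsewhere the macro discards its arguments and reports success. */
#define CACHEFLUSH(addr, bytes, cache) 0
#endif

/* Hypothetical helper: flush/invalidate the on-chip data cache for
 * [addr, addr+len). Returns 0 on success, -1 on error (MIPS only). */
static int sync_dcache(void *addr, size_t len)
{
    assert(addr != NULL);
    (void)len;  /* unused when CACHEFLUSH is a no-op */
    return CACHEFLUSH(addr, (int)len, DCACHE);
}
```

This only touches the CPU's data cache for the given range; it does not write anything from the buffer cache to disk.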
Delaying the mti_txnid change protects from that, except when something does not use mti_txnid - hence the concern for MDB_NOLOCK using mdb_env_pick_meta().
Hmm. There are some other places that use pick_meta even when the lockfile is in use. Maybe they too should try (mti_txnid & 1).
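To make the difference concrete, here is a rough sketch of the two ways of choosing a metapage discussed above. The types and function names are hypothetical stand-ins, not the real structures; shared_txnid plays the role of mti_txnid, and the sketch assumes the two metapages alternate so that transaction N lands in slot (N & 1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t txnid; } meta_t;   /* hypothetical metapage */

typedef struct {
    meta_t   metas[2];       /* the two alternating metapages */
    uint64_t shared_txnid;   /* stands in for mti_txnid in the lockfile */
} env_t;

/* pick_meta()-style: take whichever metapage has the newer txnid. */
static const meta_t *meta_by_compare(const env_t *e)
{
    assert(e != NULL);
    return &e->metas[e->metas[0].txnid < e->metas[1].txnid];
}

/* (mti_txnid & 1)-style: take the slot of the last *published* txnid. */
static const meta_t *meta_by_txnid(const env_t *e)
{
    assert(e != NULL);
    return &e->metas[e->shared_txnid & 1];
}
```

If a writer has stored the metapage for txn 5 but not yet updated shared_txnid from 4, meta_by_compare() already returns slot 1 while meta_by_txnid() still returns slot 0. That window is the case the MDB_NOLOCK concern is about, since without the lockfile there is no published txnid to consult.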
Does the code contradict this comment above, or is it about something else? /* Memory ordering issues are irrelevant ... */
Quite simply, on MIPS, write()s into the buffer cache aren't coherent with the on-chip data (or instruction, but irrelevant) cache.