On 27/07/14 03:27, openldap-commit2devel@OpenLDAP.org wrote:
commit 3630066843b7ca6b2cd12911d3e2fe3314cd4549
Author: Howard Chu hyc@symas.com
Date: Sat Jul 26 18:16:02 2014 -0700
Fix MIPS cache coherency on Linux
MIPS chips require manual control of on-chip caches. The cacheflush syscall being used here only exists on MIPS Linux, other OSs will require revisiting.
I may be guessing wrong what's going on here, but anyway:
CACHEFLUSH looks like it belongs before setting me_txns->mti_txnid which tells other threads/processes to use the new metapage.
If so, MDB_NOLOCK may be in trouble since it uses pick_meta() instead of mti_txnid. Should there be a separate CACHEFLUSH after writing the datapages if MDB_NOLOCK, and the current CACHEFLUSH should just flush the metapages?
Does the code contradict this comment above, or is it about something else? /* Memory ordering issues are irrelevant ... */
Hallvard Breien Furuseth wrote:
On 27/07/14 03:27, openldap-commit2devel@OpenLDAP.org wrote:
commit 3630066843b7ca6b2cd12911d3e2fe3314cd4549
Author: Howard Chu hyc@symas.com
Date: Sat Jul 26 18:16:02 2014 -0700
Fix MIPS cache coherency on Linux
MIPS chips require manual control of on-chip caches. The cacheflush syscall being used here only exists on MIPS Linux, other OSs will require revisiting.
I may be guessing wrong what's going on here, but anyway:
CACHEFLUSH looks like it belongs before setting me_txns->mti_txnid which tells other threads/processes to use the new metapage.
Hm, maybe.
If so, MDB_NOLOCK may be in trouble since it uses pick_meta() instead of mti_txnid. Should there be a separate CACHEFLUSH after writing the datapages if MDB_NOLOCK, and the current CACHEFLUSH should just flush the metapages?
I don't see any reason for that. As always, the only thing that matters is when the metapages get written.
Does the code contradict this comment above, or is it about something else? /* Memory ordering issues are irrelevant ... */
Quite simply, on MIPS, write()s into the buffer cache aren't coherent with the on-chip data (or instruction, but irrelevant) cache.
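For readers who have not met the call under discussion, here is a minimal sketch of how such an invalidation might be wrapped so that it compiles away on other platforms. It is not the committed code. The macro name CACHEFLUSH comes from this thread, and cacheflush() and DCACHE are the MIPS Linux syscall and constant; the helper invalidate_after_write() and the fallback DCACHE value are hypothetical and exist only so the sketch builds elsewhere.

```c
#include <stddef.h>

#if defined(__mips__) && defined(__linux__)
/* MIPS Linux: declares cacheflush() and the ICACHE/DCACHE/BCACHE constants. */
#include <sys/cachectl.h>
#define CACHEFLUSH(addr, bytes, cache) cacheflush((addr), (bytes), (cache))
#else
/* Assumption: elsewhere no manual cache control is needed, so this is a no-op. */
#ifndef DCACHE
#define DCACHE 2 /* placeholder value so the sketch compiles off MIPS */
#endif
#define CACHEFLUSH(addr, bytes, cache) \
    ((void)(addr), (void)(bytes), (void)(cache), 0)
#endif

/* Hypothetical helper: after write()ing nbytes that are also visible through
 * a memory map at map_addr, drop any stale on-chip data-cache copies of that
 * range so later reads through the map see the new contents.
 * Returns 0 on success (always 0 where CACHEFLUSH is a no-op). */
static int invalidate_after_write(char *map_addr, size_t nbytes)
{
    return CACHEFLUSH(map_addr, (int)nbytes, DCACHE);
}
```

Whether one call covering both data pages and metapage is sufficient is exactly what the rest of this thread argues about; the sketch takes no position on that.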
On 02/08/14 18:57, Howard Chu wrote:
Hallvard Breien Furuseth wrote:
(...) If so, MDB_NOLOCK may be in trouble since it uses pick_meta() instead of mti_txnid. Should there be a separate CACHEFLUSH after writing the datapages if MDB_NOLOCK, and the current CACHEFLUSH should just flush the metapages?
I don't see any reason for that. As always, the only thing that matters is when the metapages get written.
What matters is that nothing *sees* the metapage before its data pages, nor sees the mti_txnid change before the metapage. I thought that's what cache coherency and memory ordering was about.
So to explain my previous message a bit: A cacheflush() which flushes a metapage and its datapages all in one chunk makes me nervous. If that's necessary (rather than just flushing the meta at that point), I imagine that just before the flush, it's possible for something to see the metapage before its datapages. Delaying the mti_txnid change protects from that, except when something does not use mti_txnid - hence the concern for MDB_NOLOCK using mdb_env_pick_meta().
Hmm. There are some other places that use pick_meta even when the lockfile is in use. Maybe they too should try (mti_txnid & 1).
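To make the distinction concrete, here is a small sketch of the two ways of choosing a metapage mentioned above. It assumes, as the (mti_txnid & 1) remark suggests, that the metapage for txn N lives in slot N & 1. The names toy_meta, toy_pick_meta() and toy_meta_from_txnid() are hypothetical stand-ins, not the library's own definitions.

```c
#include <stddef.h>

typedef size_t txnid_t;

/* Hypothetical, stripped-down stand-in for a metapage. */
typedef struct {
    txnid_t txnid; /* txn that last wrote this metapage */
} toy_meta;

/* Choose by comparing the txnids stored in the two metapages themselves:
 * returns the index of the one with the larger txnid. */
static int toy_pick_meta(const toy_meta metas[2])
{
    return metas[0].txnid < metas[1].txnid;
}

/* Choose by the low bit of the txnid published to readers,
 * ignoring whatever the metapages currently appear to contain. */
static int toy_meta_from_txnid(txnid_t published)
{
    return (int)(published & 1);
}
```

With metapages holding txnids 4 and 5 but the published txnid still at 4, toy_pick_meta() selects slot 1 while toy_meta_from_txnid(4) selects slot 0. That gap between "metapage written" and "txnid published" is the window the MDB_NOLOCK concern is about.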
Does the code contradict this comment above, or is it about something else? /* Memory ordering issues are irrelevant ... */
Quite simply, on MIPS, write()s into the buffer cache aren't coherent with the on-chip data (or instruction, but irrelevant) cache.
Hallvard Breien Furuseth wrote:
On 02/08/14 18:57, Howard Chu wrote:
Hallvard Breien Furuseth wrote:
(...) If so, MDB_NOLOCK may be in trouble since it uses pick_meta() instead of mti_txnid. Should there be a separate CACHEFLUSH after writing the datapages if MDB_NOLOCK, and the current CACHEFLUSH should just flush the metapages?
I don't see any reason for that. As always, the only thing that matters is when the metapages get written.
What matters is that nothing *sees* the metapage before its data pages, nor sees the mti_txnid change before the metapage. I thought that's what cache coherency and memory ordering was about.
No, memory ordering is about seeing partial changes to a single page, outside of the order that the program wrote it. Which can happen on superscalar chips with out-of-order execution.
So to explain my previous message a bit: A cacheflush() which flushes a metapage and its datapages all in one chunk makes me nervous. If that's necessary (rather than just flushing the meta at that point), I imagine that just before the flush, it's possible for something to see the metapage before its datapages.
Not possible. The cacheflush is atomic. Note that this is invalidating an on-chip data cache which is typically only 32KB or so. It has nothing to do with flushing the buffer cache. ("flush" is a misnomer, but that's what the syscall is called.)
Delaying the mti_txnid change protects from that, except when something does not use mti_txnid - hence the concern for MDB_NOLOCK using mdb_env_pick_meta().
Hmm. There are some other places that use pick_meta even when the lockfile is in use. Maybe they too should try (mti_txnid & 1).
Does the code contradict this comment above, or is it about something else? /* Memory ordering issues are irrelevant ... */
Quite simply, on MIPS, write()s into the buffer cache aren't coherent with the on-chip data (or instruction, but irrelevant) cache.
On 02/08/14 22:31, Howard Chu wrote:
Hallvard Breien Furuseth wrote:
What matters is that nothing *sees* the metapage before its data pages, nor sees the mti_txnid change before the metapage. I thought that's what cache coherency and memory ordering was about.
No, memory ordering is about seeing partial changes to a single page, outside of the order that the program wrote it. Which can happen on superscalar chips with out-of-order execution.
OK, that explains half of that code, thanks:-)
So to explain my previous message a bit: A cacheflush() which flushes a metapage and its datapages all in one chunk makes me nervous. If that's necessary (rather than just flushing the meta at that point), I imagine that just before the flush, it's possible for something to see the metapage before its datapages.
Not possible. The cacheflush is atomic.
Fine, but it's the situation before the flush which worries me, when the cache can be incoherent.
Note that this is invalidating an on-chip data cache which is typically only 32KB or so. It has nothing to do with flushing the buffer cache. ("flush" is a misnomer, but that's what the syscall is called.)
OK, but apparently it's still a cache which can include data from both metapages and datapages.
Or is this only relevant for the same thread, and not other threads/processes (read-only txns starting before the flush)? In that case it would be safe.
Hallvard Breien Furuseth wrote:
On 02/08/14 22:31, Howard Chu wrote:
Hallvard Breien Furuseth wrote:
So to explain my previous message a bit: A cacheflush() which flushes a metapage and its datapages all in one chunk makes me nervous. If that's necessary (rather than just flushing the meta at that point), I imagine that just before the flush, it's possible for something to see the metapage before its datapages.
Not possible. The cacheflush is atomic.
Fine, but it's the situation before the flush which worries me, when the cache can be incoherent.
Note that this is invalidating an on-chip data cache which is typically only 32KB or so. It has nothing to do with flushing the buffer cache. ("flush" is a misnomer, but that's what the syscall is called.)
OK, but apparently it's still a cache which can include data from both metapages and datapages.
The thing is, it will only include *old* data from the meta pages or data pages. Because nothing that was updated by write() will be visible to the chip (thru the mmap) until the on-chip data cache is invalidated. And all of that old data will be self-consistent because until the metapage update is visible, nobody will go looking for any of the new datapages.
Or is this only relevant for the same thread, and not other threads/processes (read-only txns starting before the flush)? In that case it would be safe.
It has nothing to do with threads or processes.
Should have changed the Subject: before. Anyway:
On 02/08/14 23:28, Howard Chu wrote:
Hallvard Breien Furuseth wrote:
On 02/08/14 22:31, Howard Chu wrote:
Note that this is invalidating an on-chip data cache which is typically only 32KB or so. It has nothing to do with flushing the buffer cache. ("flush" is a misnomer, but that's what the syscall is called.)
OK, but apparently it's still a cache which can include data from both metapages and datapages.
The thing is, it will only include *old* data from the meta pages or data pages. Because nothing that was updated by write() will be visible to the chip (thru the mmap) until the on-chip data cache is invalidated. And all of that old data will be self-consistent because until the metapage update is visible, nobody will go looking for any of the new datapages.
Why so? A busy machine throws data out of caches to make room for other data. Some other program could presumably cause it to throw out the old metapage version, exposing the new uncached version, while an old version of a datapage is still cached.
Unless this is a fully associative LRU cache, so it always throws out the oldest pages first, and no mdb_copy process just read the cached datapage and thus refreshed it in the cache.
I remain suspicious of cacheflush(metapage and its datapages) in one call, instead of cacheflush(datapages) before write(metapage).
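The scenario can be spelled out with a deliberately simplified toy model. This is a sketch of the argument only, not a claim about how real MIPS caches or the kernel behave: it assumes one cache line per page, that write() updates only the backing store, and that an eviction and an explicit invalidate have the same effect. All names (toy_cache, toy_write, toy_read, toy_invalidate and the two scenario functions) are hypothetical.

```c
#include <stdbool.h>

enum { META = 0, DATA = 1, NLINES = 2 };

typedef struct {
    int  backing[NLINES]; /* contents as updated by write() */
    int  cached[NLINES];  /* contents held in the on-chip cache */
    bool valid[NLINES];   /* is the cached copy present? */
} toy_cache;

/* write(): updates the backing store, leaves the on-chip copy untouched. */
static void toy_write(toy_cache *c, int line, int value)
{
    c->backing[line] = value;
}

/* Read through the map: served from the cache if present, else refilled. */
static int toy_read(toy_cache *c, int line)
{
    if (!c->valid[line]) {
        c->cached[line] = c->backing[line];
        c->valid[line] = true;
    }
    return c->cached[line];
}

/* Eviction or explicit invalidate: the cached copy is dropped. */
static void toy_invalidate(toy_cache *c, int line)
{
    c->valid[line] = false;
}

/* Both pages written, then unrelated activity evicts only the metapage
 * before any explicit invalidate. True if a reader can see the new
 * metapage together with the old data page. */
static bool toy_single_flush_hazard(void)
{
    toy_cache c = { {1, 1}, {1, 1}, {true, true} };
    toy_write(&c, DATA, 2);
    toy_write(&c, META, 2);
    toy_invalidate(&c, META); /* eviction, not the explicit flush */
    return toy_read(&c, META) == 2 && toy_read(&c, DATA) == 1;
}

/* Data pages invalidated before the metapage is written. */
static bool toy_split_flush_hazard(void)
{
    toy_cache c = { {1, 1}, {1, 1}, {true, true} };
    toy_write(&c, DATA, 2);
    toy_invalidate(&c, DATA); /* cacheflush(datapages) */
    toy_write(&c, META, 2);
    toy_invalidate(&c, META); /* eviction or explicit flush */
    return toy_read(&c, META) == 2 && toy_read(&c, DATA) == 1;
}
```

In this model toy_single_flush_hazard() returns true and toy_split_flush_hazard() returns false. Whether the first case can arise in practice depends on whether a stale copy of the overwritten data page can still be cached at that point, which the model simply assumes.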
Hallvard Breien Furuseth wrote:
Should have changed the Subject: before. Anyway:
On 02/08/14 23:28, Howard Chu wrote:
Hallvard Breien Furuseth wrote:
On 02/08/14 22:31, Howard Chu wrote:
Note that this is invalidating an on-chip data cache which is typically only 32KB or so. It has nothing to do with flushing the buffer cache. ("flush" is a misnomer, but that's what the syscall is called.)
OK, but apparently it's still a cache which can include data from both metapages and datapages.
The thing is, it will only include *old* data from the meta pages or data pages. Because nothing that was updated by write() will be visible to the chip (thru the mmap) until the on-chip data cache is invalidated. And all of that old data will be self-consistent because until the metapage update is visible, nobody will go looking for any of the new datapages.
Why so? A busy machine throws data out of caches to make room for other data. Some other program could presumably cause it to throw out the old metapage version, exposing the new uncached version, while an old version of a datapage is still cached.
On a busy machine the old data page will be long gone from the cache. Keep in mind that an old data page being written by a current txn cannot have been referenced by any of the previous 2 txns.
Unless this is a fully associative LRU cache, so it always throws out the oldest pages first, and no mdb_copy process just read the cached datapage and thus refreshed it in the cache.
I remain suspicious of cacheflush(metapage and its datapages) in one call, instead of cacheflush(datapages) before write(metapage).