This discussion should be moved to the openldap-devel list as there's more being discussed now than the patch itself.
Леонид Юрьев wrote:
2015-01-14 8:49 GMT+03:00 Hallvard Breien Furuseth h.b.furuseth@usit.uio.no:
On 13/01/15 19:23, Hallvard Breien Furuseth wrote:
Yes, that didn't come out right. I don't mind inserting the volatile, but I don't know that it helps either. As far as I understand, if it was broken without volatile, then it's still broken with it - just hopefully less likely to break. And LMDB couldn't be released with your original unportable sync code along with the volatile.
Sorry, nevermind. Of course when the writer does sync well enough, volatile + the txn_renew loop will have to do for a sync primitive in the reader.
I suppose this requires that a sync in the writer thread will shake other threads as well; it won't be private to the writer.
In general this is wrong; more precisely, it depends on 'volatile' on the shared variables and on the use of barriers/fences by the readers.
Actually 'volatile' is meaningless at the hardware level. It only serves to prevent the compiler from reordering or eliminating accesses. In the case of LMDB the compiler cannot eliminate accesses because they are to global memory, and the compiler cannot determine anything about the liveness or side-effects of global memory accesses.
Sync ops that lock/release a mutex in the writer issue a memory barrier for the writer's own thread. With this, the compiler must write back all modified variables that were shadowed in CPU registers. Next, a hardware write barrier (aka release) in the mutex-release code forces all changes to become visible to other threads (e.g. by flushing the cache). But 'visible' here means 'published': other threads can access these changes, but only if they want to.
In general, to see changes made by the writer, all other threads should issue a read barrier (aka acquire). On most arches such a barrier just informs the compiler that memory was changed and that variables cached in registers must be reloaded. But in some cases (like Itanium) this barrier is also taken into account for instruction scheduling. For 'volatile', the compiler should generate barriers on each read or write of such variables.
No. volatile doesn't generate hardware barriers.
More generally, memory barriers are very important for HPC, distributed computing, and supercomputers. For example, read barriers may pull changes from an inter-node bus or from other nodes, and write barriers publish the local changes.
So, the one way to avoid race bugs is to think in terms of publishing/pulling changes.
A "race bug" by definition occurs when multiple writers may modify a particular memory object, causing its value to be indeterminate to an observer. By definition, no such bugs can occur in LMDB because only a single writer can ever modify any memory object.
In the case of arbitrary readers viewing writes, these are the possible cases:
1) reader is on the same CPU as the writer, writes cached: there is no issue, the reader sees what the writer wrote.
2) reader is on the same CPU as the writer, writes not cached: there is no issue, the reader must fetch the data from global memory.
3) reader is on a different CPU, writes not cached: there is no issue, the reader's CPU must fetch the data from global memory - same as (2).
4) reader is on a different CPU, writes are cached: the reader may see the cached/stale data, or the CPU may fetch the new data from global memory.
Only case (4) has any ambiguity, and LMDB's reader table is specifically designed not to care about the ambiguity. I.e., whether fresh or stale data is seen is irrelevant; LMDB will operate correctly because it does not need fresh data. Correct processing of the reader table only depends on the oldest data in the table, so staleness is an asset here.
Correct processing of changes to the meta page requires exclusion from other writers, so the write mutex is used. This also guarantees that all changes to the meta page are flushed to global memory before the next writer begins.
In ITS#7970 you discuss a "heisenbug" which can theoretically occur if multiple write txns complete in the span of 2 memory accesses by a reader. In practice such a bug cannot occur because the act of completing a write txn involves multiple blocking calls (I/O system calls, mutex acquisition/release, etc.) which will force writers to pause relative to any readers in question. In contrast the reader performs no blocking calls at all, and in the window of vulnerability it is performing only 2 single-word memory accesses.
Demonstrating the bug by manually inserting yield()s only proves the point - without those manually inserted yields you cannot actually trigger any situation where the reader thread will be descheduled between those two instructions.
You discuss the ramifications of such a bug as the writer potentially overwriting pages that the reader needs. None of this can actually occur in current LMDB code, again by design. The fact that you encountered these issues while debugging your LIFO patch only reflects on the problems I already pointed out with the LIFO approach.
Hallvard and I had this exact same discussion a few years ago; his example used the debugger to pause the reader at that point. Certainly, if you go out of your way to manually halt the reader at a specific instruction you can break the reader. Without outside intervention it won't happen.