This discussion should be moved to the openldap-devel list as there's more being discussed now than the patch itself.
Леонид Юрьев wrote:
2015-01-14 8:49 GMT+03:00 Hallvard Breien Furuseth h.b.furuseth@usit.uio.no:
On 13/01/15 19:23, Hallvard Breien Furuseth wrote:
Yes, that didn't come out right. I don't mind inserting the volatile, but I don't know that it helps either. As far as I understand, if it was broken without volatile, then it's still broken with it - just hopefully less likely to break. And LMDB couldn't be released with your original unportable sync code along with the volatile.
Sorry, nevermind. Of course when the writer does sync well enough, volatile + the txn_renew loop will have to do for a sync primitive in the reader.
I suppose this requires that a sync in the writer thread will shake other threads as well; it won't be private to the writer.
In general this is wrong; more precisely, it depends on 'volatile' on the shared variables and on the use of barriers/fences by the readers.
Actually 'volatile' is meaningless at the hardware level. It only serves to prevent the compiler from reordering or eliminating accesses. In the case of LMDB the compiler cannot eliminate accesses because they are to global memory, and the compiler cannot determine anything about the liveness or side-effects of global memory accesses.
Sync ops that lock/release a mutex in the writer issue a memory barrier for the writer's own thread. With this, the compiler must write back all modified variables that were shadowed in CPU registers. Next, a hardware write barrier (aka release) in the mutex-release code forces all changes to become visible to other threads (e.g. by flushing the cache). But 'visible' here means 'published': other threads can access these changes, but only if they want to.
In general, to see changes made by the writer, all other threads should issue a read barrier (aka acquire). On most arches such a barrier just informs the compiler that memory was changed and that variables cached in registers must be reloaded. But in some cases (like Itanium) this barrier is also taken into account for instruction scheduling. For 'volatile', the compiler should generate barriers on each read or write of such variables.
No. volatile doesn't generate hardware barriers.
More generally, memory barriers are very important for HPC, distributed computing, and supercomputers. For example, read barriers may pull changes from an inter-node bus or from other nodes, and write barriers publish the local changes.
So, the one way to avoid race bugs is to think in terms of publishing/pulling changes.
A "race bug" by definition occurs when multiple writers may modify a particular memory object, causing its value to be indeterminate to an observer. By definition, no such bugs can occur in LMDB because only a single writer can ever modify any memory object.
In the case of arbitrary readers viewing writes, these are the possible cases:
1) reader is on the same CPU as the writer, writes cached: there is no issue, the reader sees what the writer wrote.
2) reader is on the same CPU as the writer, writes not cached: there is no issue, the reader must fetch the data from global memory.
3) reader is on a different CPU, writes not cached: there is no issue, the reader's CPU must fetch the data from global memory - same as (2).
4) reader is on a different CPU, writes are cached: the reader may see the cached/stale data, or the CPU may fetch the new data from global memory.
Only case (4) has any ambiguity, and LMDB's reader table is specifically designed not to care about the ambiguity. I.e., whether fresh or stale data is seen is irrelevant; LMDB will operate correctly because it does not need fresh data. Correct processing of the reader table only depends on the oldest data in the table, so staleness is an asset here.
Correct processing of changes to the meta page requires exclusion from other writers, so the write mutex is used. This also guarantees that all changes to the meta page are flushed to global memory before the next writer begins.
In ITS#7970 you discuss a "heisenbug" which can theoretically occur if multiple write txns complete in the span of 2 memory accesses by a reader. In practice such a bug cannot occur because the act of completing a write txn involves multiple blocking calls (I/O system calls, mutex acquisition/release, etc.) which will force writers to pause relative to any readers in question. In contrast the reader performs no blocking calls at all, and in the window of vulnerability it is performing only 2 single-word memory accesses.
Demonstrating the bug by manually inserting yield()s only proves the point - without those manually inserted yields you cannot actually trigger any situation where the reader thread will be descheduled between those two instructions.
You discuss the ramifications of such a bug as the writer potentially overwriting pages that the reader needs. None of this can actually occur in current LMDB code, again by design. The fact that you encountered these issues while debugging your LIFO patch only reflects on the problems I already pointed out with the LIFO approach.
Hallvard and I had this exact same discussion a few years ago; his example used the debugger to pause the reader at that point. Certainly, if you go out of your way to manually halt the reader at a specific instruction you can break the reader. Without outside intervention it won't happen.