I was not sure whether a txn could be copied. Write txns track all their iterators so that they can close them at the end of the txn, but I was not sure whether read txns keep anything similar. Now I know I can just copy the txn. Thanks. :-)

Does the "seeing wrong data" scenario also apply to different threads in the same process? As I understand it, the writer overwrites the old meta page regardless of whether some reader is trying to read it. So there is a chance that a reader thread gets suspended in the middle of mdb_txn_renew0(), after it has picked the newest txn_id as Tn but before the memcpy. The writer thread then comes in and executes two write transactions, the second of which updates the same meta page as Tn. If the reader wakes up while the writer is updating that meta page, it can see partially updated data. What problems could that cause? Should we be worried about it (could it cause a crash?), or will the reader just read the contents of Tn+2?

On Fri, Jun 15, 2018 at 1:35 AM, Howard Chu <hyc@symas.com> wrote:

Chuntao HONG wrote:

Background:

I am trying to modify the LMDB code so that multiple threads can read from the same snapshot (same txn_id) at the same time. I am trying to do this in a "fork txn" way. Basically, I have a master thread with a read txn, and I try to create txns in the slave threads with the same txn_id. So I modified the mdb_txn_renew0() function to take the txn_id that the master thread is holding. With that, I hoped the slave transactions would read the same meta page, because we pick the meta page with

meta = env->me_metas[txn->mt_txnid & 1];

But then I realized I might be doing it wrong. LMDB uses only two meta pages. So what if two write transactions have been committed since the master thread began its transaction, i.e. the master thread has txn_id==N and the current txn_id==N+2? That means the meta page has been overwritten, and the slave thread may read different data from the meta page than the master did.

Why would you bother doing this? Just copy the master's txn structure.

Then the question in the header popped into my mind. When a reader creates its txn, it copies the meta-db info with a memcpy like this:

memcpy(txn->mt_dbs, meta->mm_dbs, CORE_DBS * sizeof(MDB_db));

But if the meta page is written in the middle of the memcpy, we can get corrupted data. I am sure there is some code that prevents this data race, since we have been using LMDB with multiple threads for quite a while. Could someone point me to that code?

There is no data race. Readers always read the newer meta page; writers only overwrite the older meta page. As noted in the Caveats, if you suspend a process while it's opening a read transaction, it can see the wrong data.

--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/