Full_Name: Leonid Yuriev
Version: 2.4.40
OS: RHEL7
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (31.130.36.33)

This is a continuation of ITS#7969.
Because of a race condition between reader and writer activity when reclaiming from the FreeDB while it is nearly empty, some pages that are still in use by a reader may be reused for a new txn. The reader may then see unpredictable garbage, hit an assertion failure, or get a SIGSEGV.
I am sure the race condition is present, specifically between the reader's mr_txnid and the last transaction id mti_txnid, which is updated by the writer.
Look at line 2525 of mdb_txn_renew0() in stable 2.4.40: http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=libraries/l...
The code is:
    txn->mt_txnid = r->mr_txnid = ti->mti_txnid;
From a CPU's point of view it is:
step #1, reg = ti->mti_txnid /* load the last txn-id from the environment into a register */
step #2, r->mr_txnid = reg /* write the txn-id into the reader table */
step #3, txn->mt_txnid = reg
r->mr_txnid is used by mdb_find_oldest() to find the txn-id up to which mdb_page_alloc() may reclaim records from the FreeDB.
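To make the role of r->mr_txnid concrete, here is a minimal sketch of a reader-table scan. It is a simplified model, not the mdb.c source: reader_slot, TXNID_NONE and find_oldest are hypothetical names, and the model assumes that a slot with no snapshot in use holds (txnid_t)-1 so that it never lowers the result.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t txnid_t;

/* Assumption of this model: an idle slot holds the largest possible id. */
#define TXNID_NONE ((txnid_t)-1)

/* Hypothetical, simplified reader-table slot (the real one has more fields). */
typedef struct {
    int     in_use;    /* nonzero if some reader owns this slot */
    txnid_t mr_txnid;  /* snapshot the reader has published, or TXNID_NONE */
} reader_slot;

/* Return the oldest txn-id that must be kept.
 * Start from the writer's bound `newest` and lower it for every
 * owned slot that has published an older snapshot. */
txnid_t find_oldest(const reader_slot *readers, size_t n, txnid_t newest)
{
    txnid_t oldest = newest;
    for (size_t i = 0; i < n; i++) {
        if (readers[i].in_use && readers[i].mr_txnid < oldest)
            oldest = readers[i].mr_txnid;
    }
    return oldest;
}
```

The point of the sketch is that the scan only sees what a reader has already published in its slot. A reader that has loaded a txn-id but not yet stored it is invisible to the scan.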
While the reader's thread is performing these steps, the writer can commit several transactions, so ti->mti_txnid may change between steps 1 and 2. Before step 2, the writer is free to reclaim any records from the FreeDB (except the last). The writer can therefore commit several new transactions and reclaim several records from the FreeDB, including the txn that the reader began using at step 1. In that case the reader may read pages that no longer contain that txn but have been reclaimed, and these may contain _anything_ from a new transaction.
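The interleaving can be replayed deterministically in a single-threaded model. Everything below is a hypothetical sketch: the model struct, writer_commit, reader_begin_racy, reader_begin_recheck and snapshot_is_safe are invented for illustration, and the writer is simply called from inside the reader's race window. The second reader function shows one possible mitigation (publish, then re-check and retry); it is offered as an assumption, and a real multi-threaded version would also need appropriate memory barriers, which this model ignores.

```c
#include <stdint.h>

typedef uint64_t txnid_t;
#define TXNID_NONE ((txnid_t)-1)  /* model: slot value for "no snapshot" */

/* Hypothetical model of the shared state with a single reader slot. */
typedef struct {
    txnid_t mti_txnid;        /* last committed txn-id */
    txnid_t mr_txnid;         /* what the reader has published */
    txnid_t reclaimed_below;  /* snapshots older than this were recycled */
} model;

/* Writer: commit n transactions. Before each commit it may reclaim
 * everything older than the oldest published reader, but never the
 * last committed snapshot itself. */
void writer_commit(model *m, unsigned n)
{
    while (n--) {
        txnid_t oldest = m->mti_txnid;
        if (m->mr_txnid < oldest)
            oldest = m->mr_txnid;
        if (oldest > m->reclaimed_below)
            m->reclaimed_below = oldest;
        m->mti_txnid++;
    }
}

/* Steps 1-3 from the report, with writer commits injected in the window. */
txnid_t reader_begin_racy(model *m, unsigned commits_between)
{
    txnid_t reg = m->mti_txnid;         /* step 1: load */
    writer_commit(m, commits_between);  /* race window */
    m->mr_txnid = reg;                  /* step 2: publish, too late */
    return reg;                         /* step 3 */
}

/* Possible mitigation: after publishing, re-read mti_txnid and retry
 * until the published value is still the latest one. */
txnid_t reader_begin_recheck(model *m, unsigned commits_between)
{
    txnid_t reg;
    do {
        reg = m->mti_txnid;
        writer_commit(m, commits_between);
        commits_between = 0;            /* inject the writer only once */
        m->mr_txnid = reg;
    } while (reg != m->mti_txnid);
    return reg;
}

/* A snapshot is safe if the writer has not recycled its pages. */
int snapshot_is_safe(const model *m, txnid_t snap)
{
    return snap >= m->reclaimed_below;
}
```

Starting from mti_txnid = 5 with three commits injected, the racy reader in this model ends up holding snapshot 5 although everything below 7 has been recycled. With a single injected commit the snapshot survives, which matches the "except the last" remark above.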
The problem is hard to reproduce, but we can change the code to catch it without altering its meaning, for instance:

diff --git a/libraries/liblmdb/mdb.c b/libraries/liblmdb/mdb.c
index 6cc3433..c501a2e 100644
--- a/libraries/liblmdb/mdb.c
+++ b/libraries/liblmdb/mdb.c
@@ -2018,6 +2018,8 @@ mdb_page_alloc(MDB_cursor *mc, int num, MDB_page **mp)
 			if (oldest <= last) {
 				if (!found_old) {
 					oldest = mdb_find_oldest(txn);
+					/* LY: catch heisenbug. */
+					mdb_tassert(txn, oldest >= env->me_pgoldest);
 					env->me_pgoldest = oldest;
 					found_old = 1;
 				}
@@ -2034,6 +2036,8 @@ mdb_page_alloc(MDB_cursor *mc, int num, MDB_page **mp)
 			if (oldest <= last) {
 				if (!found_old) {
 					oldest = mdb_find_oldest(txn);
+					/* LY: catch heisenbug. */
+					mdb_tassert(txn, oldest >= env->me_pgoldest);