Re: (ITS#7969) LMDB: Globally shared fields of meta-data are not 'volatile'.
by leo@yuriev.ru
The attached files is derived from OpenLDAP Software. All of the modifications
to OpenLDAP Software represented in the following patch(es) were developed by
Peter-Service LLC, Moscow, Russia. Peter-Service LLC has not assigned rights
and/or interest in this work to any party. I, Leonid Yuriev am authorized by
Peter-Service LLC, my employer, to release this work under the following terms.
Peter-Service LLC hereby places the following modifications to OpenLDAP Software
(and only these modifications) into the public domain. Hence, these
modifications may be freely used and/or redistributed for any purpose with or
without attribution and/or other notice.
https://gist.github.com/leo-yuriev/dcedd5e5b2a09c605108
6 years, 6 months
Re: (ITS#7971) LMDB: Uncarefully appointment when beginning a readonly txn.
by leo@yuriev.ru
The attached files is derived from OpenLDAP Software. All of the modifications
to OpenLDAP Software represented in the following patch(es) were developed by
Peter-Service LLC, Moscow, Russia. Peter-Service LLC has not assigned rights
and/or interest in this work to any party. I, Leonid Yuriev am authorized by
Peter-Service LLC, my employer, to release this work under the following terms.
Peter-Service LLC hereby places the following modifications to OpenLDAP Software
(and only these modifications) into the public domain. Hence, these
modifications may be freely used and/or redistributed for any purpose with or
without attribution and/or other notice.
commit ff0bde6a95bb8ab2355324b65f451216d5ef2a0e
Author: Leo Yuriev <leo(a)yuriev.ru>
Date: 2014-10-18 02:00:37 +0400
ITS#7971 for LMDB: clarification in mdb_txn_renew0().
diff --git a/libraries/liblmdb/mdb.c b/libraries/liblmdb/mdb.c
index 83a15c1..1e766af 100644
--- a/libraries/liblmdb/mdb.c
+++ b/libraries/liblmdb/mdb.c
@@ -2507,15 +2507,16 @@ mdb_txn_renew0(MDB_txn *txn)
UNLOCK_MUTEX_R(env);
return MDB_READERS_FULL;
}
- ti->mti_readers[i].mr_pid = pid;
- ti->mti_readers[i].mr_tid = tid;
+ r = &ti->mti_readers[i];
+ r->mr_txnid = (txnid_t)-1;
+ r->mr_tid = tid;
+ r->mr_pid = pid; /* should be written last, see ITS#. */
if (i == nr)
ti->mti_numreaders = ++nr;
/* Save numreaders for un-mutexed mdb_env_close() */
env->me_numreaders = nr;
UNLOCK_MUTEX_R(env);
- r = &ti->mti_readers[i];
new_notls = (env->me_flags & MDB_NOTLS);
if (!new_notls && (rc=pthread_setspecific(env->me_txkey, r))) {
r->mr_pid = 0;
6 years, 6 months
(ITS#7971) LMDB: Uncarefully appointment when beginning a readonly txn.
by leo@yuriev.ru
Full_Name: Leonid Yuriev
Version: 2.4.40
OS: RHEL7
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (31.130.36.33)
In continue of ITS#7969 and ITS#7970.
An additional pages may be allocated unreasonable from map. Also the
MDB_MAP_FULL error may returned while the space is really available.
As we can see on lines 2500-2550 of mdb.c the field mr_txnid updated after the
mr_pid, which is also is a flag of that reader's slot is occupied and other
fields are valid, e.g. it is:
ti->mti_readers[i].mr_pid = pid; /* setup reader's pid, and indicate that other
fields is valid. /
ti->mti_readers[i].mr_tid = tid; /* setup reader's thread id. */
... /* do a little bit of something. /
r->mr_txnid = ti->mti_txnid; / * update a "detent" for reclaiming. */
The problem is that not all cases, the value mr_txnid correctly before being
updated, but will be counted immediately after setup pid in reader's slot. Valid
value for mr_txnid can be any number not less than the last transaction to now.
We can easily verify that the MRC may be incorrect adding mdb_eassert(env,
r->mr_txnid >= ti->mti_txnid) after updating mti_numreaders.
diff --git a/libraries/liblmdb/mdb.c b/libraries/liblmdb/mdb.c
index 262373e..f2ed653 100644
--- a/libraries/liblmdb/mdb.c
+++ b/libraries/liblmdb/mdb.c
@@ -2522,6 +2522,7 @@ mdb_txn_renew0(MDB_txn *txn)
return rc;
}
%%7}
+ mdb_eassert(env, r->mr_txnid >= ti->mti_txnid);
do /* ITS# */
r->mr_txnid = ti->mti_txnid;
And then by 'make test' got:
#2 0x0000000000403722 in mdb_assert_fail (env=env@entry=0x134e010,
expr_txt=expr_txt@entry=0x40fd14 "r->mr_txnid >= ti->mti_txnid",
func=func@entry=0x41067f <__FUNCTION__.7150> "mdb_txn_renew0",
line=line@entry=2525, file=0x40fd08 "mdb.c") at mdb.c:1347
buf = "mdb.c:2525: Assertion 'r->mr_txnid >= ti->mti_txnid' failed in
mdb_txn_renew0()\000\000\000\000\000\000\000\000\000
\025\063\337\273\177\000\000\360\360\316\367\377\177\000\000\340\360\316\367\377\177\000\000\236,]\242\000\000\000\000x\n@\000\000\000\000\000\377\377\377\377\000\000\000\000\002\000\000\000\000\000\000\000\320\016C3C357\336\273\177\000\000\310\344\062\337\273\177\000\000\001\000\000\000\000\000\000\000\003",
'\000' <repeats 15 times>,
"M\000\000\000\000\000\000\000\003\000\000\000\000\000\000\000\016\000\000\000\000\000\000\000"...
#3 0x0000000000403de8 in mdb_txn_renew0 (txn=txn@entry=0x134e210) at
mdb.c:2525
r = 0x7fbbdf32b080
env = 0x134e010
ti = 0x7fbbdf32b000
meta = <optimised out>
i = <optimised out>
nr = <optimised out>
x = <optimised out>
rc = <optimised out>
new_notls = 0
__FUNCTION__ = "mdb_txn_renew0"
#4 0x00000000004045b8 in mdb_txn_begin (env=0x134e010, parent=parent@entry=0x0,
flags=flags@entry=131072, ret=ret@entry=0x7ffff7cef2f0) at mdb.c:2706
txn = 0x134e210
ntxn = <optimised out>
rc = <optimised out>
size = <optimised out>
tsize = 136
There is a possibility that the writer will call mdb_page_alloc() while the
value of mr is wrong. In such case it is possible then reclaiming couldn't be
done and a new page will be allocated from the map. But also possible the
MDB_MAP_FULL error.
Thus, because of this bug additional pages may be allocated unreasonable, and
the MDB_MAP_FULL error may happen while the space is available.
6 years, 6 months
Re: (ITS#7970) LMDB: Critical Heisenbug - Inconsistent reading & SIGSEGV due to the race condition.
by leo@yuriev.ru
The fix is really simple. The patch is attached below, but previously
volatile-fix of ITS is required.
Also, the simple testcase (make test), with an extra yields and the
couple of paranoid asserts, is:
diff --git a/libraries/liblmdb/mdb.c b/libraries/liblmdb/mdb.c
index 3286ffb..2558d6f 100644
--- a/libraries/liblmdb/mdb.c
+++ b/libraries/liblmdb/mdb.c
@@ -2018,6 +2018,8 @@ mdb_page_alloc(MDB_cursor *mc, int num, MDB_page **mp)
if (oldest <= last) {
if (!found_old) {
oldest = mdb_find_oldest(txn);
+ /* LY: catch heisenbug. */
+ mdb_tassert(txn, oldest >= env->me_pgoldest);
env->me_pgoldest = oldest;
found_old = 1;
}
@@ -2034,6 +2036,8 @@ mdb_page_alloc(MDB_cursor *mc, int num, MDB_page **mp)
if (oldest <= last) {
if (!found_old) {
oldest = mdb_find_oldest(txn);
+ /* LY: catch heisenbug. */
+ mdb_tassert(txn, oldest >= env->me_pgoldest);
env->me_pgoldest = oldest;
found_old = 1;
}
@@ -2522,7 +2526,16 @@ mdb_txn_renew0(MDB_txn *txn)
return rc;
}
}
- txn->mt_txnid = r->mr_txnid = ti->mti_txnid;
+ do {
+ r->mr_txnid = ti->mti_txnid;
+ sched_yield();
+ sched_yield();
+ sched_yield();
+ } while(r->mr_txnid != ti->mti_txnid);
+ sched_yield();
+ sched_yield();
+ sched_yield();
+ txn->mt_txnid = r->mr_txnid;
txn->mt_u.reader = r;
meta = env->me_metas[txn->mt_txnid & 1];
}
---
The attached files is derived from OpenLDAP Software. All of the modifications
to OpenLDAP Software represented in the following patch(es) were developed by
Peter-Service LLC, Moscow, Russia. Peter-Service LLC has not assigned rights
and/or interest in this work to any party. I, Leonid Yuriev am authorized by
Peter-Service LLC, my employer, to release this work under the following terms.
Peter-Service LLC hereby places the following modifications to OpenLDAP Software
(and only these modifications) into the public domain. Hence, these
modifications may be freely used and/or redistributed for any purpose with or
without attribution and/or other notice.
commit 6028662665c09f4d34b751bfd9b09223f6cb2433
Author: Leo Yuriev <leo(a)yuriev.ru>
Date: 2014-10-17 23:29:56 +0400
ITS#7970 for LMDB: the Heisenbug.
diff --git a/libraries/liblmdb/mdb.c b/libraries/liblmdb/mdb.c
index 3286ffb..83a15c1 100644
--- a/libraries/liblmdb/mdb.c
+++ b/libraries/liblmdb/mdb.c
@@ -2522,7 +2522,12 @@ mdb_txn_renew0(MDB_txn *txn)
return rc;
}
}
- txn->mt_txnid = r->mr_txnid = ti->mti_txnid;
+
+ do /* LY: Retry on a race, ITS#7970. */
+ r->mr_txnid = ti->mti_txnid;
+ while(r->mr_txnid != ti->mti_txnid);
+
+ txn->mt_txnid = r->mr_txnid;
txn->mt_u.reader = r;
meta = env->me_metas[txn->mt_txnid & 1];
}
6 years, 6 months
(ITS#7970) LMDB: Critical Heisenbug - Inconsistent reading & SIGSEGV due to the race condition.
by leo@yuriev.ru
Full_Name: Leonid Yuriev
Version: 2.4.40
OS: RHEL7
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (31.130.36.33)
In continue of ITS#7969.
Because of a race condition between readers and writer's activity for FreeDB
reclaiming in case it is nearly empty, some pages that used by reader, may be
reused for a new txn. Thereby reader may get an unpredictable rubbish, throw an
assertion failure or SIGSEGV.
I am sure, the race condition is present, exactly between reader's mr_txnid and
last transaction mti_txnid, which is updated by the writer.
Let see to line 2525 of mdb_txn_renew0() in stable 2.4.40
http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=librarie...
The code is:
txn->mt_txnid = r-EmEmr_txnid = ti->mti_txnid;
>From a CPU's point of view it is:
step #1, reg = ti->mti_txnid /* load the last txn-id from environment to
register */
step #2, r->mr_txnid = reg /* write txn-id into the reader table */
step #3, txn->mt_txnid = reg
The r->mr_txnid is used by mdb_find_oldest() for lookup a txn-id that is
available for reclaim from FreeDB by mdb_page_alloc().
When the reader's thread doing these steps, the writer can commit a several
transactions and the ti->mti_txnid may be changed between steps 1 and 2. Before
the step 2, a writer is free to reclaim any records from FreeDB (except the
last).
Thereby, the writer can commit several new transactions and reclaim several
records from FreeDB, include the txn which the reader has begun using at the
step1.
In this case the reader may be read a pages that is not contain the such txn any
longer, but the reclaimed pages, which may contain the _anything_ from a new
transaction.
It is hard to reproduce the problem, but we can change the code without altering
its significance, for instance:
diff --git a/libraries/liblmdb/mdb.c b/libraries/liblmdb/mdb.c
index 6cc3433..c501a2e 100644
--- a/libraries/liblmdb/mdb.c
+++ b/libraries/liblmdb/mdb.c
@@ -2018,6 +2018,8 @@ mdb_page_alloc(MDB_cursor *mc, int num, MDB_page **mp)
if (oldest <= last) {
if (!found_old) {
oldest = mdb_find_oldest(txn);
+ /* LY: catch heisenbug. */
+ mdb_tassert(txn, oldest >= env->me_pgoldest);
env->me_pgoldest = oldest;
found_old = 1;
}
@@ -2034,6 +2036,8 @@ mdb_page_alloc(MDB_cursor *mc, int num, MDB_page **mp)
if (oldest <= last) {
if (!found_old) {
oldest = mdb_find_oldest(txn);
+ /* LY: catch heisenbug. */
+ mdb_tassert(txn, oldest >= env->me_pgoldest);
6 years, 6 months
Re: (ITS#7969) LMDB: Globally shared fields of meta-data are not 'volatile'.
by leo@yuriev.ru
The attached files is derived from OpenLDAP Software. All of the modifications
to OpenLDAP Software represented in the following patch(es) were developed by
Peter-Service LLC, Moscow, Russia. Peter-Service LLC has not assigned rights
and/or interest in this work to any party. I, Leonid Yuriev am authorized by
Peter-Service LLC, my employer, to release this work under the following terms.
Peter-Service LLC hereby places the following modifications to OpenLDAP Software
(and only these modifications) into the public domain. Hence, these
modifications may be freely used and/or redistributed for any purpose with or
without attribution and/or other notice.
commit 5e83ea8a76f81dac8d5a9dcd74d5b10a01330102
Author: Leo Yuriev <leo(a)yuriev.ru>
Date: 2014-09-12 01:32:13 +0400
ITS#7969 for LMDB: volatile & __synchronize().
diff --git a/libraries/liblmdb/mdb.c b/libraries/liblmdb/mdb.c
index 6cc3433..3286ffb 100644
--- a/libraries/liblmdb/mdb.c
+++ b/libraries/liblmdb/mdb.c
@@ -580,11 +580,11 @@ typedef struct MDB_rxbody {
* started from so we can avoid overwriting any data used in that
* particular version.
*/
- txnid_t mrb_txnid;
+ volatile txnid_t mrb_txnid;
/** The process ID of the process owning this reader txn. */
- MDB_PID_T mrb_pid;
+ volatile MDB_PID_T mrb_pid;
/** The thread ID of the thread owning this txn. */
- MDB_THR_T mrb_tid;
+ volatile MDB_THR_T mrb_tid;
} MDB_rxbody;
/** The actual reader record, with cacheline padding. */
@@ -632,12 +632,12 @@ typedef struct MDB_txbody {
* This is recorded here only for convenience; the value can always
* be determined by reading the main database meta pages.
*/
- txnid_t mtb_txnid;
+ volatile txnid_t mtb_txnid;
/** The number of slots that have been used in the reader table.
* This always records the maximum count, it is not decremented
* when readers release their slots.
*/
- unsigned mtb_numreaders;
+ volatile unsigned mtb_numreaders;
} MDB_txbody;
/** The actual reader table definition. */
@@ -908,7 +908,7 @@ typedef struct MDB_meta {
/** Any persistent environment flags. @ref mdb_env */
#define mm_flags mm_dbs[0].md_flags
pgno_t mm_last_pg; /**< last used page in file */
- txnid_t mm_txnid; /**< txnid that committed this page */
+ volatile txnid_t mm_txnid; /**< txnid that committed this page */
} MDB_meta;
/** Buffer for a stack-allocated meta page.
@@ -3559,6 +3559,10 @@ mdb_env_write_meta(MDB_txn *txn)
mp->mm_dbs[0] = txn->mt_dbs[0];
mp->mm_dbs[1] = txn->mt_dbs[1];
mp->mm_last_pg = txn->mt_next_pgno - 1;
+#if !(defined(_MSC_VER) || defined(__i386__) || defined(__x86_64__))
+ /* LY: issue a memory barrier, if not x86. */
+ __sync_synchronize();
+#endif
mp->mm_txnid = txn->mt_txnid;
if (!(env->me_flags & (MDB_NOMETASYNC|MDB_NOSYNC))) {
unsigned meta_size = env->me_psize;
6 years, 6 months
(ITS#7969) LMDB: Globally shared fields of meta-data are not 'volatile'.
by leo@yuriev.ru
Full_Name: Leonid Yuriev
Version: 2.4.40
OS: RHEL7
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (31.130.36.33)
Several fields from the meta-data that shared globally across active writer and
all readers are not 'volatile'. This allow compiler to cache its values in
registers and reorder writes.
More over, in some cases compiler is free to eliminate the calls of function
which seems to be 'const' or 'pure', pls see http://lwn.net/Articles/285332/.
Thereby a series of Heisenbugs may be induced, for instance:
1) The mdb_find_oldest() calls may be potentially eliminated up to public (not
static) functions due to interprocedural optimization 'unit at once'.
2) Reordering of CPU instructions which updates the txn in a meta-page. Without
"volatile" or memory-barrier compiler may reorder instructions for update the
"mm_txnid" field in meta-page in "writemap" mode.
Thereby, from the reader's point of view this cause a short time interval when
the transaction is corrupted.
On some architectures few actions are required for cache coherence, for instance
- GCC's __synchronize() must be called.
3) A really critical bugs will be described later in separate ITS.
6 years, 6 months
Re: (ITS#7968) SIGSEGV shortly after reconnection performed by syncrepl due to synchronization conflicts
by leo@yuriev.ru
Once again SIGSEGV.
I think the problem is not here, but in a connection cancel/abandon code.
It seems like race conditions with asynchronous connection dropping.
But currently I not review enough of code.
/servers/slapd/overlays/syncprov.c
@@ -1307,21 +1307,21 @@ syncprov_matchops( Operation *op, opcookie
*opc, int saveit )
op2.o_hdr = &oh;
op2.o_extra = op->o_extra;
op2.o_callback = NULL;
if (ss->s_flags & PS_FIX_FILTER) {
/* Skip the AND/GE clause that we
stuck on in front. We
would lose deletes/mods that happen
during the refresh
phase otherwise (ITS#6555) */
op2.ors_filter =
ss->s_op->ors_filter->f_and->f_next;
}
ldap_pvt_thread_mutex_unlock( &ss->s_mutex );
rc = test_filter( &op2, e, op2.ors_filter );
}
Debug( LDAP_DEBUG_NONE, "syncprov_matchops: sid %03x
fscope %d rc %d\n",
ss->s_sid, fc.fscope, rc );
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7f9762ffe700 (LWP 29507)]
test_filter (op=0x7f9762ffc210, e=0x7f96f19c37d8, f=0x20) at filterentry.c:69
69 if ( f->f_choice & SLAPD_FILTER_UNDEFINED ) {
(0) /opt/openldap.devel/libexec/slapd() [0x4430b7]: test_filter
/home/ly/Projects/openldap.git/servers/slapd/filterentry.c:69
(1) /opt/openldap.devel/libexec/slapd() [0x515081]: syncprov_matchops
/home/ly/Projects/openldap.git/servers/slapd/overlays/syncprov.c:1317
(2) /opt/openldap.devel/libexec/slapd() [0x515f43]: syncprov_op_response
/home/ly/Projects/openldap.git/servers/slapd/overlays/syncprov.c:1941
(3) /opt/openldap.devel/libexec/slapd() [0x434163]: slap_response_play
/home/ly/Projects/openldap.git/servers/slapd/result.c:509
(4) /opt/openldap.devel/libexec/slapd() [0x4346ca]: send_ldap_response
/home/ly/Projects/openldap.git/servers/slapd/result.c:584
(5) /opt/openldap.devel/libexec/slapd() [0x435062]: slap_send_ldap_result
/home/ly/Projects/openldap.git/servers/slapd/result.c:861
(6) /opt/openldap.devel/libexec/slapd() [0x4cb2e9]: mdb_add
/home/ly/Projects/openldap.git/servers/slapd/back-mdb/add.c:434
(7) /opt/openldap.devel/libexec/slapd() [0x48b506]: overlay_op_walk
/home/ly/Projects/openldap.git/servers/slapd/backover.c:674
(8) /opt/openldap.devel/libexec/slapd() [0x48b671]: over_op_func
/home/ly/Projects/openldap.git/servers/slapd/backover.c:724
6 years, 6 months