openldap-bugs October 2014

openldap-bugs@openldap.org

14 participants
76 discussions

Re: (ITS#7972) zgrim
by hyc＠symas.com 19 Oct '14

19 Oct '14

zzgrim(a)gmail.com wrote: > Full_Name: zgrim > Version: 2.4.40 > OS: Linux > URL: ftp://ftp.openldap.org/incoming/ > Submission from: (NULL) (79.115.186.237) > > > Hello, thank you for lmdb. > I'm using lmdb via a Go wrapper and under some not-yet-clear circumstances, > under high concurrency stress testing, > I sometimes get "double free or corruption" SIGABRT related to > mdb_dbis_update(). > The setup is using the NOTLS and RDONLY flags on a 64bit 3.17.0 Arch Linux > system. > If I protect the free() inside mdb_dbis_update() (in mdb.c) with a mutex (code > inline below), the symptom disappears. > I'm sorry I can't provide a simple test case, but in my code, without the > mutex, launching some tens of test clients simultaneously trigger the > corruption fairly quick, i.e. a few minutes or even seconds. Sounds like you have not read the mdb_dbi_open() doc closely. The code in mdb_dbis_update() only triggers if you've opened new DBs in a transaction. The doc for mdb_dbi_open() specifically says you cannot call it from multiple concurrent transactions. Closing this ITS. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/

1 0

Re: (ITS#7958) LMDB: LIFO-reclaiming, write-performance improvement & bugfixes
by leo＠yuriev.ru 19 Oct '14

19 Oct '14

>>> We are using the combination: >>> envflags writemap nosync lifo >>> checkpoint 0 1 >>> >>> If the checkpoint is set in seconds, it gives us the assurance >>> consistent state database on disk. >>> However, without this patch meta-pages can be written by the kernel >>> before the data. >>> >>> In fact, for a full guarantee in case of death slapd process, >>> meta-page should be written explicitly. > > > No, the DB can never go inconsistent due to a process crashing - the pages > in OS cache are always correct. It can only go inconsistent if the OS > crashes and a proper sync has not occurred. Yes, Howard, you are right. But apparently I need to be more precise. Talking about "death" of slapd, I meant all the reasons, including power off. For example, a power-off case: - The main power is turned off and the system switches to the UPS. - Given the notice, OS starts an emergency stop processes. - For some reason (does not have enough time to stop) slapd receives SIGKILL. - OS tries to write mmap-region of the DB-file and begins with the lower address. - let the meta-pages has written completely, but for the rest of the data is not enough battery power. - Now DB is completely destroyed on the disk. To avoid this, the meta-pages should not be included in the rw-mapped region, and should be written explicity after a data pages. >>>>> commit 8ddd63161aeb2689822d1a8d27385d62e4e341ae >>>>> Author: Leo Yuriev <leo(a)yuriev.ru> >>>>> Date: 2014-09-19 22:47:19 +0400 >>>>> >>>>> BUGFIX - lmdb: properly sync meta-pages in mdb_sync_env(). >>>>> >>>>> Meta-pages may be updated during data-syncing in mdb_sync_env(), >>>>> in this case database would be inconsistent. >>>>> >>>>> Check-and-retry if lead txn-id changed during flushing data in >>>>> mdb_sync_env(). > > Fundamentally, you are trying to make an inherently unsafe configuration > "safer", but it's impossible. Assume you have mlock'd the meta pages into > memory, so the OS never flushes them itself any more, and you're running > with NOSYNC. That means, within 3 transactions, the data pages on disk will > be out of sync with the meta pages on disk. If the OS crashes at that point, > the entire DB will be lost. Not a problem. I had explained above - we should write meta-pages explicitly after the data sync. But also we should not perform reclaiming ahead of the last checkpoint. > The only way to make this mode of operation somewhat safe is to defer > reclaiming pages for even longer. E.g., instead of halting at current_txnid > - 3, halt at current_txnid - 22, in which case the data pointed to by the > on-disk meta pages cannot get obsolete until 20 transactions have occurred. > > Note that in combination with your LIFO patch, it's pretty much guaranteed > that the on-disk meta pages will be useless after only 2 un-sync'd > transactions. Yes, Howard, you are right. But I think there is confusion in the discussion because of mixing of LIFO-feature and changes for checkpoints consistency in a NOSYNC and WRITEMAP+NOSYNC modes. For a "NOSYNC + checkpoints" topic I will submit a separate ITS (like a 'volaile' related 7969,7970,7971). My opinion - it is a flaw, and no reason to don't fix it. Continuing the conversation about checkpoints in a LIFO context. I saw the problem, that you specified, and thinking over its solution, but have not yet found "golden ratio". And since we are having a serious problem with syncrepl, then I put off this task with an excuse "LIFO-patch not does worse than it was." In general, we should do not reclaim anything ahead of the txn, that is synced to the disk (let this be named a R-rule). To do so we need a second field like mti_txnid, but which will be update only at the end of mdb_env_write_meta(). Finally we should start search in mdb_find_oldest() from value of this new field instead of the current txn number. This seems to will be work fine. However, I stopped on a reasoning - about the purpose of the checkpoints, about design LMDB as a product, about the expectations of the user and the necessary configuration parameters: - checkpoints are needed ONLY in nosync modes; - if the user does NOT activate the checkpoints, he do not care about consistency; - but if it is turned on, we MUST provide consistency on the checkpoints; - otherwise a checkpoints feature is thoughtless and should be REMOVED. Therefore implementation of checkpoints & reclaiming should be updated to conform to the "R-rule", that noted above. >From this point of view a LIFO-feature also should be refined, but nevertheless can be very useful. - SYNC mode = takes a benefit from storage with write-back cache (assume powered by battery). - ASYNC/NOSYNC without checkpoint = significant reduction of committed/dirty pages and thereby much less write-iops. - ASYNC/NOSYNC with checkpoint = seems to same as a SYNC case. Total of all the above - I think first we need to fix a reclaiming or delete the checkpoints, and then I will complete LIFO. > -- Howard Chu > CTO, Symas Corp. http://www.symas.com > Director, Highland Sun http://highlandsun.com/hyc/ > Chief Architect, OpenLDAP http://www.openldap.org/project/ Thank for conversation. Leonid.

1 0

(ITS#7972) zgrim
by zzgrim＠gmail.com 19 Oct '14

19 Oct '14

Full_Name: zgrim Version: 2.4.40 OS: Linux URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (79.115.186.237) Hello, thank you for lmdb. I'm using lmdb via a Go wrapper and under some not-yet-clear circumstances, under high concurrency stress testing, I sometimes get "double free or corruption" SIGABRT related to mdb_dbis_update(). The setup is using the NOTLS and RDONLY flags on a 64bit 3.17.0 Arch Linux system. If I protect the free() inside mdb_dbis_update() (in mdb.c) with a mutex (code inline below), the symptom disappears. I'm sorry I can't provide a simple test case, but in my code, without the mutex, launching some tens of test clients simultaneously trigger the corruption fairly quick, i.e. a few minutes or even seconds. I tried older releases of lmdb as well as the latest git tree, 9a8eb95674c7b500cfe5f44d03493ff76c9fc0c1 of mdb.master, the bahaviour is the same. I also tried several approaches in the higher level code, like either txn reset and renew and a pool of txns or simplified the code to bare bones to just abort after each transaction and remake new txns, it does not matter, the corruption happens the same. If the problem should lie in the higher level code, I'd welcome your insights on possible culprits, I'm very new to lmdb. Thanks. against latest git mentioned above i applied something like, @@ -2769,19 +2769,24 @@ MDB_dbi n = txn->mt_numdbs; MDB_env *env = txn->mt_env; unsigned char *tdbflags = txn->mt_dbflags; + int rc = 0, rx = 0; for (i = n; --i >= 2;) { if (tdbflags[i] & DB_NEW) { if (keep) { env->me_dbflags[i] = txn->mt_dbs[i].md_flags | MDB_VALID; } else { - char *ptr = env->me_dbxs[i].md_name.mv_data; - if (ptr) { + if (env->me_dbxs[i].md_name.mv_data != NULL) { + mdb_mutex_t *wmutex = MDB_MUTEX(env, w); + rc = LOCK_MUTEX(rx, env, wmutex); + + free(env->me_dbxs[i].md_name.mv_data); env->me_dbxs[i].md_name.mv_data = NULL; env->me_dbxs[i].md_name.mv_size = 0; env->me_dbflags[i] = 0; env->me_dbiseqs[i]++; - free(ptr); + + if (!rc) UNLOCK_MUTEX(wmutex); } } }

1 0

Re: (ITS#7958) LMDB: LIFO-reclaiming, write-performance improvement & bugfixes
by hyc＠symas.com 18 Oct '14

18 Oct '14

Леонид Юрьев wrote: > 2014-10-03 3:13 GMT+04:00 Howard Chu <hyc(a)symas.com>: > 2014-10-04 0:04 GMT+04:00 Леонид Юрьев <leo(a)yuriev.ru>: >> 2014-10-03 3:13 GMT+04:00 Howard Chu <hyc(a)symas.com>: >>>> commit 841059330fd44769e93eb4b937c3ce42654fad6f >>>> Author: Leo Yuriev <leo(a)yuriev.ru> >>>> Date: 2014-09-20 07:16:15 +0400 >>>> >>>> BUGFIX - lmdb: lock meta-pages in writemap-mode to avoid unexpected >>>> write, >>>> before the data pages would be synchronized. >>>> >>>> Without locking the meta-pages may be writen by OS before other >>>> data, >>>> in this case database would be inconsistent. >>> >>> >>> Seems unnecessary. Won't happen by default; could happen with MDB_NOSYNC but >>> that risk is already documented. >> >> We are using the combination: >> envflags writemap nosync lifo >> checkpoint 0 1 >> >> If the checkpoint is set in seconds, it gives us the assurance >> consistent state database on disk. >> However, without this patch meta-pages can be written by the kernel >> before the data. >> >> In fact, for a full guarantee in case of death slapd process, >> meta-page should be written explicitly. No, the DB can never go inconsistent due to a process crashing - the pages in OS cache are always correct. It can only go inconsistent if the OS crashes and a proper sync has not occurred. >> But it requires a lot of changes and I do not do that. >>>> commit 8ddd63161aeb2689822d1a8d27385d62e4e341ae >>>> Author: Leo Yuriev <leo(a)yuriev.ru> >>>> Date: 2014-09-19 22:47:19 +0400 >>>> >>>> BUGFIX - lmdb: properly sync meta-pages in mdb_sync_env(). >>>> >>>> Meta-pages may be updated during data-syncing in mdb_sync_env(), >>>> in this case database would be inconsistent. >>>> >>>> Check-and-retry if lead txn-id changed during flushing data in >>>> mdb_sync_env(). Fundamentally, you are trying to make an inherently unsafe configuration "safer", but it's impossible. Assume you have mlock'd the meta pages into memory, so the OS never flushes them itself any more, and you're running with NOSYNC. That means, within 3 transactions, the data pages on disk will be out of sync with the meta pages on disk. If the OS crashes at that point, the entire DB will be lost. The only way to make this mode of operation somewhat safe is to defer reclaiming pages for even longer. E.g., instead of halting at current_txnid - 3, halt at current_txnid - 22, in which case the data pointed to by the on-disk meta pages cannot get obsolete until 20 transactions have occurred. Note that in combination with your LIFO patch, it's pretty much guaranteed that the on-disk meta pages will be useless after only 2 un-sync'd transactions. >>> Probably could simplify this, just obtain the write mutex unconditionally, >>> then there's no need to loop or retry. But also, this depends on MDB_NOLOCK >>> - if that's set, then do no locking at all. >> >> I did so for reasons of performance and less a lock retention time. >> >> Retries will be if there an intensive flow of changes. >> In this case it will be a lot of updated pages, the record which will >> take some time. >> >> However, in subsequent iterations (if a transactions had committed >> while there was a record), >> the modified pages will be much fewer, and the sync will be quick. >> >> Thus (and it was seen in tests) even when a substantial amount of the >> transactions, >> usually only two iterations of the cycle, >> without locking and flow of changes are not suspended. -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/

1 0

Re: (ITS#7969) LMDB: Globally shared fields of meta-data are not 'volatile'.
by leo＠yuriev.ru 18 Oct '14

18 Oct '14

Hi, Howard. I also noticed a problem with the whilespace, so that is also posted patches on github's gist and gave references in each of three ITS. https://gist.github.com/leo-yuriev/dcedd5e5b2a09c605108 https://gist.github.com/leo-yuriev/fc00313b8035cddba6a7 https://gist.github.com/leo-yuriev/6f1e1444c9e29bd94885 > > Committed to mdb.master. The whitespace is wrong in all of the patches you > submitted, had to apply them manually. > > -- > -- Howard Chu > CTO, Symas Corp. http://www.symas.com > Director, Highland Sun http://highlandsun.com/hyc/ > Chief Architect, OpenLDAP http://www.openldap.org/project/

1 0

Re: (ITS#7971) LMDB: Uncarefully appointment when beginning a readonly txn.
by hyc＠symas.com 17 Oct '14

17 Oct '14

Committed to mdb.master > commit ff0bde6a95bb8ab2355324b65f451216d5ef2a0e > Author: Leo Yuriev <leo(a)yuriev.ru> > Date: 2014-10-18 02:00:37 +0400 > > ITS#7971 for LMDB: clarification in mdb_txn_renew0(). > > diff --git a/libraries/liblmdb/mdb.c b/libraries/liblmdb/mdb.c > index 83a15c1..1e766af 100644 > --- a/libraries/liblmdb/mdb.c > +++ b/libraries/liblmdb/mdb.c > @@ -2507,15 +2507,16 @@ mdb_txn_renew0(MDB_txn *txn) > UNLOCK_MUTEX_R(env); > return MDB_READERS_FULL; > } > - ti->mti_readers[i].mr_pid = pid; > - ti->mti_readers[i].mr_tid = tid; > + r = &ti->mti_readers[i]; > + r->mr_txnid = (txnid_t)-1; > + r->mr_tid = tid; > + r->mr_pid = pid; /* should be written last, see ITS#. */ > if (i == nr) > ti->mti_numreaders = ++nr; > /* Save numreaders for un-mutexed mdb_env_close() */ > env->me_numreaders = nr; > UNLOCK_MUTEX_R(env); > > - r = &ti->mti_readers[i]; > new_notls = (env->me_flags & MDB_NOTLS); > if (!new_notls && (rc=pthread_setspecific(env->me_txkey, r))) { > r->mr_pid = 0; > > > > -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/

1 0

Re: (ITS#7970) LMDB: Critical Heisenbug - Inconsistent reading & SIGSEGV due to the race condition.
by hyc＠symas.com 17 Oct '14

17 Oct '14

Committed to mdb.master > commit 6028662665c09f4d34b751bfd9b09223f6cb2433 > Author: Leo Yuriev <leo(a)yuriev.ru> > Date: 2014-10-17 23:29:56 +0400 > > ITS#7970 for LMDB: the Heisenbug. > > diff --git a/libraries/liblmdb/mdb.c b/libraries/liblmdb/mdb.c > index 3286ffb..83a15c1 100644 > --- a/libraries/liblmdb/mdb.c > +++ b/libraries/liblmdb/mdb.c > @@ -2522,7 +2522,12 @@ mdb_txn_renew0(MDB_txn *txn) > return rc; > } > } > - txn->mt_txnid = r->mr_txnid = ti->mti_txnid; > + > + do /* LY: Retry on a race, ITS#7970. */ > + r->mr_txnid = ti->mti_txnid; > + while(r->mr_txnid != ti->mti_txnid); > + > + txn->mt_txnid = r->mr_txnid; > txn->mt_u.reader = r; > meta = env->me_metas[txn->mt_txnid & 1]; > } > > > > -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/

1 0

Re: (ITS#7969) LMDB: Globally shared fields of meta-data are not 'volatile'.
by hyc＠symas.com 17 Oct '14

17 Oct '14

leo(a)yuriev.ru wrote: > The attached files is derived from OpenLDAP Software. All of the modifications > to OpenLDAP Software represented in the following patch(es) were developed by > Peter-Service LLC, Moscow, Russia. Peter-Service LLC has not assigned rights > and/or interest in this work to any party. I, Leonid Yuriev am authorized by > Peter-Service LLC, my employer, to release this work under the following terms. > > Peter-Service LLC hereby places the following modifications to OpenLDAP Software > (and only these modifications) into the public domain. Hence, these > modifications may be freely used and/or redistributed for any purpose with or > without attribution and/or other notice. Committed to mdb.master. The whitespace is wrong in all of the patches you submitted, had to apply them manually. > > commit 5e83ea8a76f81dac8d5a9dcd74d5b10a01330102 > Author: Leo Yuriev <leo(a)yuriev.ru> > Date: 2014-09-12 01:32:13 +0400 > > ITS#7969 for LMDB: volatile & __synchronize(). > > diff --git a/libraries/liblmdb/mdb.c b/libraries/liblmdb/mdb.c > index 6cc3433..3286ffb 100644 > --- a/libraries/liblmdb/mdb.c > +++ b/libraries/liblmdb/mdb.c > @@ -580,11 +580,11 @@ typedef struct MDB_rxbody { > * started from so we can avoid overwriting any data used in that > * particular version. > */ > - txnid_t mrb_txnid; > + volatile txnid_t mrb_txnid; > /** The process ID of the process owning this reader txn. */ > - MDB_PID_T mrb_pid; > + volatile MDB_PID_T mrb_pid; > /** The thread ID of the thread owning this txn. */ > - MDB_THR_T mrb_tid; > + volatile MDB_THR_T mrb_tid; > } MDB_rxbody; > > /** The actual reader record, with cacheline padding. */ > @@ -632,12 +632,12 @@ typedef struct MDB_txbody { > * This is recorded here only for convenience; the value can always > * be determined by reading the main database meta pages. > */ > - txnid_t mtb_txnid; > + volatile txnid_t mtb_txnid; > /** The number of slots that have been used in the reader table. > * This always records the maximum count, it is not decremented > * when readers release their slots. > */ > - unsigned mtb_numreaders; > + volatile unsigned mtb_numreaders; > } MDB_txbody; > > /** The actual reader table definition. */ > @@ -908,7 +908,7 @@ typedef struct MDB_meta { > /** Any persistent environment flags. @ref mdb_env */ > #define mm_flags mm_dbs[0].md_flags > pgno_t mm_last_pg; /**< last used page in file */ > - txnid_t mm_txnid; /**< txnid that committed this page */ > + volatile txnid_t mm_txnid; /**< txnid that committed this page */ > } MDB_meta; > > /** Buffer for a stack-allocated meta page. > @@ -3559,6 +3559,10 @@ mdb_env_write_meta(MDB_txn *txn) > mp->mm_dbs[0] = txn->mt_dbs[0]; > mp->mm_dbs[1] = txn->mt_dbs[1]; > mp->mm_last_pg = txn->mt_next_pgno - 1; > +#if !(defined(_MSC_VER) || defined(__i386__) || defined(__x86_64__)) > + /* LY: issue a memory barrier, if not x86. */ > + __sync_synchronize(); > +#endif > mp->mm_txnid = txn->mt_txnid; > if (!(env->me_flags & (MDB_NOMETASYNC|MDB_NOSYNC))) { > unsigned meta_size = env->me_psize; > > > > -- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/

1 0

Re: (ITS#7971) LMDB: Uncarefully appointment when beginning a readonly txn.
by leo＠yuriev.ru 17 Oct '14

17 Oct '14

The attached files is derived from OpenLDAP Software. All of the modifications to OpenLDAP Software represented in the following patch(es) were developed by Peter-Service LLC, Moscow, Russia. Peter-Service LLC has not assigned rights and/or interest in this work to any party. I, Leonid Yuriev am authorized by Peter-Service LLC, my employer, to release this work under the following terms. Peter-Service LLC hereby places the following modifications to OpenLDAP Software (and only these modifications) into the public domain. Hence, these modifications may be freely used and/or redistributed for any purpose with or without attribution and/or other notice. https://gist.github.com/leo-yuriev/6f1e1444c9e29bd94885

1 0

Re: (ITS#7970) LMDB: Critical Heisenbug - Inconsistent reading & SIGSEGV due to the race condition.
by leo＠yuriev.ru 17 Oct '14

17 Oct '14

1 0

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

openldap-bugs October 2014