Re: (ITS#8431) LMDB: Access newly opened database from another transaction
by h.b.furuseth@usit.uio.no
On 30. mai 2016 18:08, Juerg Bircher wrote:
> If I got it right it is all about protecting against invalid DBIs
> which can happen for example if someone closes database "A" which was
> assigned DBI "5" and later the database "B" is opened and the free
> slot 5 is then assigned to "B". So we could be misled using DBI "5"
> having in mind database "A".
>
> So why don't we extend the MDB_DBI by adding the sequence in the lower
> or higher bits. (...)
Ugh. I didn't expect all my words to miss that badly. Yes, you caught
one issue. There are more. The central point is what I wrote here:
>> Multi-threaded use of structures, like the ones behind DBIs, must be
>> synchronized. But DBIs and mdb_dbi_open() are not mutex-protected,
>> nor are read-only txns (after the per-thread setup). So it's up to
>> the user. And LMDB mostly can't catch users at less than txn-sized
>> threading bugs - like the DBI workarounds people keep trying.
I shouldn't really be the one to answer this, Howard might answer
differently if he didn't just fetch a flamethrower. But anyway:
It's about keeping a clean model of what is going on, which helps
understanding and correctly coding both LMDB itself and user programs.
When you don't work against the model, anyway.
It's about keeping LMDB's internal data structures consistent. LMDB
must know what the heck is going on when it manages and uses DBIs.
Things like ensuring that two threads do not run mdb_dbi_open() at
the same time, overwriting each others' work and creating a mess.
It's about LMDB being able to know it delivers correct results, at
least if the user doesn't work too hard at thwarting it.
This usually means serializing various parts of the code using locks
or something similar. But for the sake of speed and simplicity, LMDB
has no relevant locks. It relies on the user to lock and has some
code to catch some locking errors. One key word is "tries" - without
locks or other expensive code, it can't always catch them even when it
does try. So DBI errors are not "this DBI is broken, so drop/refresh
it" but "your program is buggy, so fix it".
And it's about living with the design choices and tradeoffs which
LMDB has made, e.g. not slowing down read-only txns for anything
like locking or extensive error checking.
So anyway, this ITS is about changes to LMDB which will not happen.
Instead it reminded me to add some checks for user errors with DBIs.
--
Hallvard
7 years, 6 months
Re: (ITS#8431) LMDB: Access newly opened database from another transaction
by juerg.bircher@gmail.com
On Mon, May 30, 2016 at 2:44 PM, Hallvard Breien Furuseth
<h.b.furuseth(a)usit.uio.no> wrote:
> In ITS#8430, Juerg Bircher wrote:
>>
>> So if you drop ITS#8431 it would at least be very helpful to get an
>> separate error like MDB_INVALID_DPI instead of EINVAL So the
>> application sharing and caching handles (dbi) could handle the
>> situation appropriate like use mdb_env_open() so that the dbi becomes
>> valid in that txn as well.
>
>
> This is ITS#8431 in a new guise, so I'm replying here. We know DBI
> handling is inconvenient. We'd have fixed it if it were easy without
> adding a bunch of locks. I'll walk through some of it below, I expect
> it'll be useful to have for future reference anyway:
>
>
> You mean like this?
>
> if (mdb_something(txn, dbi, ...) == MDB_INVALID_DBI) {
> /* Get <dbi>, which some other txn committed after <txn> began */
> mdb_dbi_open(txn, "database name of <dbi>", 0, &dbi2);
> assert(dbi2 == dbi);
> }
>
> That breaks this requirement from the mdb_dbi_open() doc:
>
> * A transaction that uses
> * this function must finish (either commit or abort) before
> * any other transaction in the process may use this function.
>
Just to illustrate the simplified sequence of calls (using MDB_NOTLS):
mdb_txn_begin(env, NULL, MDB_RDONLY, &txn2);
mdb_txn_begin(env, NULL, MDB_RDONLY, &txn);
MDB_dbi dbi;
mdb_dbi_open(txn, "db1", 0, &dbi);
// dbi becomes now valid and is propagated to the env
mdb_txn_commit(txn);
// At this point dbi for txn2 is invalid as txn2 started before
opening the database
if (mdb_something(txn2, dbi, ...) == MDB_INVALID_DBI) {
MDB_dbi dbi2;
mdb_dbi_open(txn2, "db1", 0, &dbi2);
assert(dbi2 == dbi);
// now dbi is valid for txn2 as with the mdb_dbi_open we
propagated it to txn2!
mdb_something(txn2, dbi, ...);
}
mdb_txn_commit(txn2);
This code is really not preferred and is only a workaround.
> To fulfill that, you must serialize txns which use mdb_dbi_open().
> (Unless they're all write txns, which LMDB serializes.) One reason
> you must, is that there is no sync anywhere else in your suggestion.
>
> Multi-threaded use of structures, like the ones behind DBIs, must be
> synchronized. But DBIs and mdb_dbi_open() are not mutex-protected,
> nor are read-only txns (after the per-thread setup). So it's up to
> the user. And LMDB mostly can't catch users at less than txn-sized
> threading bugs - like the DBI workarounds people keep trying.
>
> I suppose we should add some of the above more clearly in lmdb.h doc,
> since people keep trying to hack around it. And "EINVAL/MDB_BAD_DBI
> often indicate that LMDB detected a bug in your program, not that you
> can safely work around the problem when LMDB returns them."
> (Not always, unless we walk carefully through which error codes are
> returned when first.)
>
>
> For a txn to pick up a DBI committed after the txn started, LMDB
> would need to mutex-protect DBI operations. Something like this:
>
> - Add an env+txn flag MDB_ADMIN_LOCKS. If it is set in the env,
> then mdb_dbi_open() fails if it is not also set in the txn.
> The txn flag wraps DBI handling in a new mutex in txn_begin(),
> dbi_open(), commit/abort() of a txn which opened a DBI.
> The env flag wraps dbi_close() in the mutex.
>
> - Add mdb_dbi_pickup(MDB_txn*, MDB_dbi of a named DB): Pick up a DBI
> which was committed after the txn began. It will lock the mutex,
> and maybe it'll require MDB_ADMIN_LOCKS in the txn.
> (I don't know if pickup(main DBI) would be safe.)
>
> Maybe that's more ACID-friendly - a separate function to step
> outside and look at the current env. If I've gotten it right...
>
> Of course, if the txn which calls dbi_pickup() began before that
> named DB was created, then the txn picks up a DBI which it cannot
> use. That's OK as far as LMDB is concerned, it does safely catch
> use of a non-existent named DB. But makes it less useful to users.
>
> So I wonder, will people optimize away the mutexing in the way you
> suggested above? Try to use a DBI in a txn without MDB_ADMIN_LOCKS,
> and if that fails call mdb_dbi_pickup(). If so, they reintroduce
> the heisenbug: The test for whether the DBI was OK, was not mutexed
> vs. mdb_txn_commit(txn which created the DBI).
>
> Or simpler, the new flag could just tell MDB to wrap a mutex around
> every single txn_begin and txn_commit. That won't tempt anyone to
> optimize away locks, since there are so much locking already. But
> then, why are they using LMDB? LMDB has made a lot of tradeoffs to
> avoid locking in read-only txns. See Caveats in lmdb.h.
>
>
> And it'd still be unsafe to allow overlapping txns to mdb_dbi_open()
> the same DB. txn1 creates the DBI, txn2 opens it too, txn1 aborts but
> must not close the DBI because txn2 is using it... So MDB_dbx would
> need a refcount of how many existing txns opened the DB during DBI
> creation. Also I suppose ADMIN_LOCKS would need to disable
> mdb_set_compare() & co. A new function mdb_dbi_open2() could init the
> entire MDB_dbx while holding the mutex, so other txns would not use
> the half-initialized DBX.
>
> Alternatively, dbi_open2(named DB) above can take a flag which says
> "make the DBI immediately available in the MDB_env". That'll be
> unsafe for txns where the user does not synchronize txn_begin against
> the dbi_open2(). And it has its own sets of issues with multiple
> txns opening a DB, nested txns, etc.
>
>
> Then there's another approach: "Transactions are isolated from each
> other, provided you use LMDB correctly. That's all. Sorry about
> the DBI inconveniences."
>
If I got it right it is all about protecting against invalid DBIs
which can happen for example if someone closes database "A" which was
assigned DBI "5" and later the database "B" is opened and the free
slot 5 is then assigned to "B". So we could be misled using DBI "5"
having in mind database "A".
So why don't we extend the MDB_DBI by adding the sequence in the lower
or higher bits. When addressing the slot we simply strip the sequence
bits and verify the slot's sequence against the sequence in the
MDB_DBI to ensure the DBI has not changed. So picking up a DPI (as
suggested in the patch) can be verified.
The restriction with mdb_dbi_open() will stay. But the following would work:
MDB_dbi dbi;
mdb_txn_begin(env, NULL, MDB_RDONLY, &txn2);
mdb_txn_begin(env, NULL, MDB_RDONLY, &txn1);
mdb_dbi_open(txn1, "db1", 0, &dbi);
// dbi becomes now valid and is propagated to the env
// ==> and therefore is also valid for any other transaction!
mdb_txn_commit(txn1);
// use dbi opened in txn1 (which is committed)
mdb_something(txn2, dbi, ...);
....
mdb_txn_commit(txn2);
So the patch could look something like that.
static int mdb_update_db_info(MDB_txn *txn, MDB_dbi dbi) {
unsigned int dbi_index = dbi & MDB_DBI_INDXE_MASK;
unsigned int dbi_seq = dbi & MDB_DBI_SEQ_MASK << 16;
if (txn->mt_env->me_dbiseqs[dbi_index] != dbi_seq) {
return 0;
}
/* propagate newly opened db from env to current txn. Mark them as unused */
if (txn->mt_numdbs < txn->mt_env->me_numdbs) {
memset(txn->mt_dbflags + txn->mt_numdbs, DB_UNUSED,
txn->mt_env->me_numdbs - txn->mt_numdbs);
}
txn->mt_numdbs = txn->mt_env->me_numdbs;
return dbi_index < txn->mt_numdbs;
}
7 years, 6 months
Re: (ITS#8430) Improved handling for large number of databases
by h.b.furuseth@usit.uio.no
On 30. mai 2016 13:51, Hallvard Breien Furuseth wrote:
> Copied from a private mail:
>> Maybe a good way in between could be to memcpy txn->mt_env->me_dbflags.
>> (...) memcpy uses SSE instructions for x86_64.
>
> Yes. If we make MDB_txn.mt_dbflags an uint16_t*, we can memcpy to it.
> That costs <maxdbs> bytes/txn + one more reserved env flag bit in
> PERSISTENT_FLAGS: Reserve DB_UNUSED, and keep it always set in
> me_dbflags so the memcpy will set it in mt_dbflags.
Oops, that was too clever. Then a bunch of code which assumes a DBI
is valid if various txn flags are set, must also check DB_UNUSED.
Or invert that to DB_USED which might be easier - (flags & DB_DIRTY)
becomes F_ISSET(flags, DB_DIRTY|DB_USED) which checks if both are set.
Maybe the simple solution using 2*<maxdbs> bytes/txn would be better.
--
Hallvard
7 years, 6 months
Re: (ITS#8431) LMDB: Access newly opened database from another transaction
by h.b.furuseth@usit.uio.no
In ITS#8430, Juerg Bircher wrote:
> So if you drop ITS#8431 it would at least be very helpful to get an
> separate error like MDB_INVALID_DPI instead of EINVAL So the
> application sharing and caching handles (dbi) could handle the
> situation appropriate like use mdb_env_open() so that the dbi becomes
> valid in that txn as well.
This is ITS#8431 in a new guise, so I'm replying here. We know DBI
handling is inconvenient. We'd have fixed it if it were easy without
adding a bunch of locks. I'll walk through some of it below, I expect
it'll be useful to have for future reference anyway:
You mean like this?
if (mdb_something(txn, dbi, ...) == MDB_INVALID_DBI) {
/* Get <dbi>, which some other txn committed after <txn> began */
mdb_dbi_open(txn, "database name of <dbi>", 0, &dbi2);
assert(dbi2 == dbi);
}
That breaks this requirement from the mdb_dbi_open() doc:
* A transaction that uses
* this function must finish (either commit or abort) before
* any other transaction in the process may use this function.
To fulfill that, you must serialize txns which use mdb_dbi_open().
(Unless they're all write txns, which LMDB serializes.) One reason
you must, is that there is no sync anywhere else in your suggestion.
Multi-threaded use of structures, like the ones behind DBIs, must be
synchronized. But DBIs and mdb_dbi_open() are not mutex-protected,
nor are read-only txns (after the per-thread setup). So it's up to
the user. And LMDB mostly can't catch users at less than txn-sized
threading bugs - like the DBI workarounds people keep trying.
I suppose we should add some of the above more clearly in lmdb.h doc,
since people keep trying to hack around it. And "EINVAL/MDB_BAD_DBI
often indicate that LMDB detected a bug in your program, not that you
can safely work around the problem when LMDB returns them."
(Not always, unless we walk carefully through which error codes are
returned when first.)
For a txn to pick up a DBI committed after the txn started, LMDB
would need to mutex-protect DBI operations. Something like this:
- Add an env+txn flag MDB_ADMIN_LOCKS. If it is set in the env,
then mdb_dbi_open() fails if it is not also set in the txn.
The txn flag wraps DBI handling in a new mutex in txn_begin(),
dbi_open(), commit/abort() of a txn which opened a DBI.
The env flag wraps dbi_close() in the mutex.
- Add mdb_dbi_pickup(MDB_txn*, MDB_dbi of a named DB): Pick up a DBI
which was committed after the txn began. It will lock the mutex,
and maybe it'll require MDB_ADMIN_LOCKS in the txn.
(I don't know if pickup(main DBI) would be safe.)
Maybe that's more ACID-friendly - a separate function to step
outside and look at the current env. If I've gotten it right...
Of course, if the txn which calls dbi_pickup() began before that
named DB was created, then the txn picks up a DBI which it cannot
use. That's OK as far as LMDB is concerned, it does safely catch
use of a non-existent named DB. But makes it less useful to users.
So I wonder, will people optimize away the mutexing in the way you
suggested above? Try to use a DBI in a txn without MDB_ADMIN_LOCKS,
and if that fails call mdb_dbi_pickup(). If so, they reintroduce
the heisenbug: The test for whether the DBI was OK, was not mutexed
vs. mdb_txn_commit(txn which created the DBI).
Or simpler, the new flag could just tell MDB to wrap a mutex around
every single txn_begin and txn_commit. That won't tempt anyone to
optimize away locks, since there are so much locking already. But
then, why are they using LMDB? LMDB has made a lot of tradeoffs to
avoid locking in read-only txns. See Caveats in lmdb.h.
And it'd still be unsafe to allow overlapping txns to mdb_dbi_open()
the same DB. txn1 creates the DBI, txn2 opens it too, txn1 aborts but
must not close the DBI because txn2 is using it... So MDB_dbx would
need a refcount of how many existing txns opened the DB during DBI
creation. Also I suppose ADMIN_LOCKS would need to disable
mdb_set_compare() & co. A new function mdb_dbi_open2() could init the
entire MDB_dbx while holding the mutex, so other txns would not use
the half-initialized DBX.
Alternatively, dbi_open2(named DB) above can take a flag which says
"make the DBI immediately available in the MDB_env". That'll be
unsafe for txns where the user does not synchronize txn_begin against
the dbi_open2(). And it has its own sets of issues with multiple
txns opening a DB, nested txns, etc.
Then there's another approach: "Transactions are isolated from each
other, provided you use LMDB correctly. That's all. Sorry about
the DBI inconveniences."
7 years, 6 months
Re: (ITS#8430) Improved handling for large number of databases
by h.b.furuseth@usit.uio.no
On 28/05/16 22:08, Juerg Bircher wrote:
> I understand that it is a tricky thing. But maybe the lazy init is
> still feasible? (...)
Right, lazy init is feasible. Just keep cross-txn DBI stuff out of
it, I'll take your latest variant to ITS#8431.
Copied from a private mail:
> Maybe a good way in between could be to memcpy txn->mt_env->me_dbflags.
> (...) memcpy uses SSE instructions for x86_64.
Yes. If we make MDB_txn.mt_dbflags an uint16_t*, we can memcpy to it.
That costs <maxdbs> bytes/txn + one more reserved env flag bit in
PERSISTENT_FLAGS: Reserve DB_UNUSED, and keep it always set in
me_dbflags so the memcpy will set it in mt_dbflags.
Or maybe your code can be used without initial memcpy and I'm just
paranoid. The one problem I can think of is user errors - people
coming up with workarounds like your cross-txn DBI stuff. LMDB tries
to catch some of that, but maybe late pickup defeats some such checks.
I'll wait for Howard to say something at this point.
BTW, all this reminds me - I should add a test which tries to catch
mdb_dbi_open() in overlapping txns. Got the code, but forgot about
it while pondering a good error code name.
> However md_flags would need to be moved out of the MDB_dbi struct
Nope. Sub-DBs with duplicate data have md_flags which do not
correspond to anything saved in env->me_flags. Your lazy-init code
needs to use the memcpyed flags instead of me_dbflags, that's all.
> We should consider to align on 16 byte boundary for best performance.
It's already aligned.
7 years, 6 months
Re: (ITS#8431) LMDB: Access newly opened database from another transaction
by juerg.bircher@gmail.com
On Sat, May 28, 2016 at 11:18 AM, Howard Chu <hyc(a)symas.com> wrote:
> juerg.bircher(a)gmail.com wrote:
>>
>> On 27/05/16 15:11, "Hallvard Breien Furuseth" <h.b.furuseth(a)usit.uio.no>
>> wrote:
>>
>>> This works as intended. Each transaction is atomic, and shall see
>>> neither DBIs nor data which were committed after the transaction began.
>>> Cross-txn mdb_dbi_open() is in any case hairy to get right, we screwed
>>> it up several times before arriving at the now-obvious semantics.
>>>
>>
>> It is fine that newly opened database handles shall not be seen by
>> other transactions as long the transaction runs.. But once the
>> transaction which opened the handle is committed it should be valid
>> for any other transaction.
>
>
> It should be valid for any other transaction *that started after the opening
> transaction committed.* This is the nature of transactions, they are fully
> isolated from each other.
>
>> This would greatly simplify lazy database
>> opening of a multithreaded application.
>
>
> Allowing existing transactions to use the DBIs would break transaction
> isolation. That is a non-starter.
>
>> The suggested patch does exactly implement that behaviour. I believe
>> it does not break anything. Isolation must be guaranteed for the data
>> but not necessarily for the database handles.
>
>
> Wrong. A database handle essentially points to data. Exposing that data to a
> transaction that was opened before that handle's transaction was committed
> *does* break isolation. Violating the ACID guarantees of LMDB's transaction
> system is Not allowed.
>>
>>
>>> An application can open and cache its DBIs in a separate startup
>>> transaction, then commit it and start new transactions.
>>
>>
>> Yes this would be a simple approach to get around that problem. But I
>> do not like it as with each opened database the overhead increases
>> especially if not using my suggested improvements in ITS#8430.
>
>
> You are saying you don't like ACID transactions. If that's the case, LMDB is
> not for you.
Not at all ACID is a must and something I very much appreciate about
LMDB! LMDB is definitely for me and I'm saying that after having spent
nearly one intensive year with LMDB!
So I like to thank you for the outstanding database.
>
> --
> -- Howard Chu
> CTO, Symas Corp. http://www.symas.com
> Director, Highland Sun http://highlandsun.com/hyc/
> Chief Architect, OpenLDAP http://www.openldap.org/project/
7 years, 6 months
Re: (ITS#8430) Improved handling for large number of databases
by juerg.bircher@gmail.com
---------- Forwarded message ----------
From: Juerg Bircher <juerg.bircher(a)gmail.com>
Date: Sat, May 28, 2016 at 10:30 PM
Subject: Re: (ITS#8430) Improved handling for large number of databases
To: Hallvard Breien Furuseth <h.b.furuseth(a)usit.uio.no>
On Sat, May 28, 2016 at 3:41 AM, Hallvard Breien Furuseth
<h.b.furuseth(a)usit.uio.no> wrote:
> On 28/05/16 03:28, Hallvard Breien Furuseth wrote:
>>
>> On 27/05/16 23:39, juerg.bircher(a)gmail.com wrote:
>>>
>>> Maybe I am overseeing something as I do not know all the thoughts
>>> between the lines.
>>
>>
>> With ITS#8431 I can think of one thing:
>> (...blah...)
>
>
> Actually the main reason I worry about lazy init is simply that
> we've screwed up DBIs many times, so I prefer to go slowly.
> That's Howard's call though, not mine.
Maybe a good way in between could be to memcpy
txn->mt_env->me_dbflags. However md_flags would need to be moved out
of the MDB_dbi struct like mt_dbflags or easier just add another array
serving as snapshot. This would not break anything with flag
initialization as we get a snapshot of them in the mdb_txn_renew0.
memcpy uses SSE instructions for x86_64. We should consider to align
on 16 byte boundary for best performance.
The dbflags could then still be initialized with MDB_UNSUSED as suggested.
7 years, 6 months
Re: (ITS#8430) Improved handling for large number of databases
by juerg.bircher@gmail.com
On Sat, May 28, 2016 at 3:28 AM, Hallvard Breien Furuseth
<h.b.furuseth(a)usit.uio.no> wrote:
> On 27/05/16 23:39, juerg.bircher(a)gmail.com wrote:
>>
>> On 27/05/16 14:42, "Hallvard Breien Furuseth" <h.b.furuseth(a)usit.uio.no>
>> wrote:
>>>
>>> I think MDB_WORD intends to be mdb_size_t, but should be 'unsigned
>>> long': It's probably no optimization to use 'long long' when that
>>> is not a native integer type. But I haven't profiled that.
>>
>>
>> The MDB_WORD is used to get the appropriate word size of the CPU for
>> the optimised dbflag check. So on a 32 bit CPUs it should be an 32 bit
>> integer and on a 64 bit CPUs a 64 bit integer.
>
>
> That's why I suggested unsigned long. It seems the best guess for a
> native wide type. But it's hard to guess what kind of CPU is in use.
>
>> But the mdb_size_t seems always to be 64 bits?
>
>
> On a 32-bit host it's normally 32-bit, but you get 64-bit mdb_size_t anyway
> if you build and use LMDB with MDB_VL32 defined. I misread
> your patch, it should not use MDB_VL32.
>
>>> The delay in picking up env->me_dbflags is a bit troubling, since it
>>> means creating a new txn is no longer quite atomic. I can't think of
>>> how it'll make a difference when DBIs are used correctly, but that's one
>>> thing people screw up. Maybe it'd be OK if renew0() just copies it into
>>> mt_dbs[].md_flags without the '&', and mdb_setup_db_info() fixes it up.
>>
>>
>> Is it really atomic in original lmdb? The flags are copied from the
>> environment and calculated. But while copying the dbflags from the
>> environment they may change. So read level consistency and therefore
>> atomicity is not guaranteed.
>
>
> I should have said, it's atomic when DBIs are used correctly. Such as
> obeying this limitation from the mdb_dbi_open() doc and a similar one
> for mdb_dbi_close():
>
> * A transaction that uses
> * this function must finish (either commit or abort) before
> * any other transaction in the process may use this function.
>
> To guarante atomicity we'd need locks around some DBI operations.
> To avoid that, LMDB simply calls unsafe use "user errors" - which can
> cause breakage. Hence those annoying DBI usage restrictions.
>
> LMDB does attempt to catch user errors when it's cheap to try, see e.g.
> TXN_DBI_CHANGED(). Doesn't work at all for read-only txns, they have
> txn->mt_dbiseqs == env->me_dbiseqs.
>
> So you can see why DBI cleverness makes me nervous.
> I think the MDB_WORD looping can go in now, but we'll need to think
> carefully about the lazy init.
I understand that it is a tricky thing. But maybe the lazy init is
still feasible? In my understand without having all the background it
seems ok to get the current snapshot of the database flags from the
env once it is firstly accessed whether it is at the beginning or
later in the transaction.
Read level consistency should be preserved anyway as mt_txnId is set
in mdb_txn_renew0().
>
>> But I guess it just does not matter.
>> So picking up the flags lazily just extends the time of copying the
>> flags from the environment but is basically the same.
>> Atomicity and also isolation is a must in respect to the data but I
>> think not in respect to the database handles and their flags.
>> Maybe I am overseeing something as I do not know all the thoughts
>> between the lines.
>
>
> With ITS#8431 I can think of one thing:
>
> There's no sync between mdb_dbi_open() in one txn and another txn
> using the new DBI. It'll be up to the user to synchronize DBIs. For
> that to work in write-txs, TXN_DBI_CHANGED() must go away, and writers
> will have no way to try to catch user errors with sync. Or if it
> should only work in read-only txns, then read-only txns and write-txns
> must have different definitions of what is a "user error" with DBIs.
>
> Anyway, we'll need to think about the definition of user errors.
>
So if you drop ITS#8431 it would at least be very helpful to get an
separate error like MDB_INVALID_DPI instead of EINVAL So the
application sharing and caching handles (dbi) could handle the
situation appropriate like use mdb_env_open() so that the dbi becomes
valid in that txn as well.
>
> Other notes:
>
> /** ... */ above a global declaration or /**< ... */ to the right
> of it tells Doxygen that the comment is the doc of the declared item.
> "#foo" inside a Doxygen comment is a link to the declaration of foo.
>
>
> Maybe mt_dbflags should have an extra byte or word at the end which
> does not have DB_UNUSED, and with size rounded up to sizeof(MDB_WORD).
> Then stepping to next DBI can do something like this, unrelated to
> the check against mt_numdbs. Not sure if that's an improvement, I'm
> getting tired:
>
> /** Return an #MDB_WORD where each byte is \b byte */
> #define WORD_BYTES(byte) ((MDB_WORD)-1 / 0xff * (byte))
>
> /** Step up to next used DBI */
> #define DBI_STEP(txn, dbi) do { \
> if (! (++(dbi) & (sizeof(mdb_word_t)-1))) \
> while (*(mdb_word_t *)((txn)->mt_dbflags + (dbi)) ==
> WORD_BYTES(DB_UNUSED))
> (dbi) += sizeof(MDB_WORD); \
> } while (0)
>
Yes round it up to sizeof(MDB_WORD) would be obvious in connection
with the optimized loop. If you are willing to add some more #ifdef we
could even get it faster at least for x86_64 using SSE2 instructions.
Then MDB_WORD would be 128 bit and we check 16 flags in one step using
_mm_cmpeq_epi8 and _mm_movemask_epi8 to get the mask of matching
flags. If the mask != 0xffffffff we found an used database. Then we
negate the mask and with __builtin_ffs(mask) we could then find the
index of the first used database.
7 years, 6 months
(ITS#8102) syncrepl_entry: be_modify failed (16)
by tpretz@gmail.com
Hi,
Following up on this ITS i opened a while back.
With multi-master, normal syncrepl, i would sometimes receive:
slapd: null_callback : error code 0x10
slapd: syncrepl_entry: rid=106 be_modify failed (16)
Triggering a syncrepl connection drop/retry, whilst playing the
sessionlog when a server with multiple providers was started.
I am now testing with 2.4.44 and have had a chance to look at this
annoying, but seemingly not destructive issue in some more detail.
As i partially referenced previously, this occurs within
syncrepl_entry, for modifications, a diff of old_entry to new_entry is
performed. Then if changes are needed a be_modify is performed.
There is however, no locking which prevents two, or more, threads from
performing these diffs, and then mods, in an interleaved fashion
within this function itself.
Looking in do_syncrep2, if the cookie tag is present the cs_pmutex is
acquired and held for the duration of modifications. This mutex
protects from syncrepl_entry race conditions and serializes
modifications.
I have also noticed this issue during normal operations (ie all
syncrepl in persist) when out of order writes are occurring on a
master which are relatively easy to reproduce on an hdb backend
server.
When a cookie is not sent with an entry the cs_pmutex is not acquired.
Without having some protection, non-cookie modifications will race
each other between syncrepl threads.
So, i am testing surrounding the syncrepl_entry "if" block (line 1036)
with a cs_pmutex lock/release (when punlock < 0) to serialize
non_cookie mods just like the cookie ones.
So far this is running tests and i haven't seen the null_callback
issue, either when catching up from the session log, or running with
ongoing out of order writes being replicated (running alongside
unmodified 2.4.44 to compare differences).
When acquiring the cs_pmutex i have used the same logic as at line 958
(using trylock, with a shutdown check). I wonder if it is safe to
acquire the mutex with a standard ldap_pvt_thread_mutex_lock at this
point without spinning.
line numbers from RELENG_2_4 (721a038b7bc9732f52eeef5324c180c4f137cd75)
Thanks
Tom
7 years, 6 months
Re: (ITS#8431) LMDB: Access newly opened database from another transaction
by hyc@symas.com
juerg.bircher(a)gmail.com wrote:
> On 27/05/16 15:11, "Hallvard Breien Furuseth" <h.b.furuseth(a)usit.uio.no> wrote:
>
>> This works as intended. Each transaction is atomic, and shall see
>> neither DBIs nor data which were committed after the transaction began.
>> Cross-txn mdb_dbi_open() is in any case hairy to get right, we screwed
>> it up several times before arriving at the now-obvious semantics.
>>
>
> It is fine that newly opened database handles shall not be seen by
> other transactions as long the transaction runs.. But once the
> transaction which opened the handle is committed it should be valid
> for any other transaction.
It should be valid for any other transaction *that started after the opening
transaction committed.* This is the nature of transactions, they are fully
isolated from each other.
> This would greatly simplify lazy database
> opening of a multithreaded application.
Allowing existing transactions to use the DBIs would break transaction
isolation. That is a non-starter.
> The suggested patch does exactly implement that behaviour. I believe
> it does not break anything. Isolation must be guaranteed for the data
> but not necessarily for the database handles.
Wrong. A database handle essentially points to data. Exposing that data to a
transaction that was opened before that handle's transaction was committed
*does* break isolation. Violating the ACID guarantees of LMDB's transaction
system is Not allowed.
>
>> An application can open and cache its DBIs in a separate startup
>> transaction, then commit it and start new transactions.
>
> Yes this would be a simple approach to get around that problem. But I
> do not like it as with each opened database the overhead increases
> especially if not using my suggested improvements in ITS#8430.
You are saying you don't like ACID transactions. If that's the case, LMDB is
not for you.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
7 years, 6 months