Re: (ITS#8431) LMDB: Access newly opened database from another transaction - openldap-bugs

30 May 2016


      In ITS#8430, Juerg Bircher wrote:
...
So if you drop ITS#8431 it would at least be very helpful to get an
separate error like MDB_INVALID_DPI instead of EINVAL So the
application sharing and caching handles (dbi) could handle the
situation appropriate like use mdb_env_open() so that the dbi becomes
valid in that txn as well.
This is ITS#8431 in a new guise, so I'm replying here.  We know DBI
handling is inconvenient.  We'd have fixed it if it were easy without
adding a bunch of locks.  I'll walk through some of it below, I expect
it'll be useful to have for future reference anyway:
You mean like this?
if (mdb_something(txn, dbi, ...) == MDB_INVALID_DBI) {
   /* Get <dbi>, which some other txn committed after <txn> began */
   mdb_dbi_open(txn, "database name of <dbi>", 0, &dbi2);
   assert(dbi2 == dbi);
}
That breaks this requirement from the mdb_dbi_open() doc:
* A transaction that uses
* this function must finish (either commit or abort) before
* any other transaction in the process may use this function.
To fulfill that, you must serialize txns which use mdb_dbi_open().
(Unless they're all write txns, which LMDB serializes.)  One reason
you must, is that there is no sync anywhere else in your suggestion.
Multi-threaded use of structures, like the ones behind DBIs, must be
synchronized.  But DBIs and mdb_dbi_open() are not mutex-protected,
nor are read-only txns (after the per-thread setup).  So it's up to
the user.  And LMDB mostly can't catch users at less than txn-sized
threading bugs - like the DBI workarounds people keep trying.
I suppose we should add some of the above more clearly in lmdb.h doc,
since people keep trying to hack around it.  And "EINVAL/MDB_BAD_DBI
often indicate that LMDB detected a bug in your program, not that you
can safely work around the problem when LMDB returns them."
(Not always, unless we walk carefully through which error codes are
returned when first.)
For a txn to pick up a DBI committed after the txn started, LMDB
would need to mutex-protect DBI operations.  Something like this:
- Add an env+txn flag MDB_ADMIN_LOCKS.  If it is set in the env,
   then mdb_dbi_open() fails if it is not also set in the txn.
   The txn flag wraps DBI handling in a new mutex in txn_begin(),
   dbi_open(), commit/abort() of a txn which opened a DBI.
   The env flag wraps dbi_close() in the mutex.
- Add mdb_dbi_pickup(MDB_txn*, MDB_dbi of a named DB): Pick up a DBI
   which was committed after the txn began.  It will lock the mutex,
   and maybe it'll require MDB_ADMIN_LOCKS in the txn.
   (I don't know if pickup(main DBI) would be safe.)
Maybe that's more ACID-friendly - a separate function to step
   outside and look at the current env.  If I've gotten it right...
Of course, if the txn which calls dbi_pickup() began before that
   named DB was created, then the txn picks up a DBI which it cannot
   use.  That's OK as far as LMDB is concerned, it does safely catch
   use of a non-existent named DB.  But makes it less useful to users.
So I wonder, will people optimize away the mutexing in the way you
suggested above?  Try to use a DBI in a txn without MDB_ADMIN_LOCKS,
and if that fails call mdb_dbi_pickup().  If so, they reintroduce
the heisenbug: The test for whether the DBI was OK, was not mutexed
vs. mdb_txn_commit(txn which created the DBI).
Or simpler, the new flag could just tell MDB to wrap a mutex around
every single txn_begin and txn_commit.  That won't tempt anyone to
optimize away locks, since there are so much locking already.  But
then, why are they using LMDB?  LMDB has made a lot of tradeoffs to
avoid locking in read-only txns.  See Caveats in lmdb.h.
And it'd still be unsafe to allow overlapping txns to mdb_dbi_open()
the same DB.  txn1 creates the DBI, txn2 opens it too, txn1 aborts but
must not close the DBI because txn2 is using it... So MDB_dbx would
need a refcount of how many existing txns opened the DB during DBI
creation.  Also I suppose ADMIN_LOCKS would need to disable
mdb_set_compare() & co.  A new function mdb_dbi_open2() could init the
entire MDB_dbx while holding the mutex, so other txns would not use
the half-initialized DBX.
Alternatively, dbi_open2(named DB) above can take a flag which says
"make the DBI immediately available in the MDB_env".  That'll be
unsafe for txns where the user does not synchronize txn_begin against
the dbi_open2().  And it has its own sets of issues with multiple
txns opening a DB, nested txns, etc.
Then there's another approach: "Transactions are isolated from each
other, provided you use LMDB correctly.  That's all.  Sorry about
the DBI inconveniences."