On 27/05/16 23:39, juerg.bircher@gmail.com wrote:
On 27/05/16 14:42, "Hallvard Breien Furuseth" h.b.furuseth@usit.uio.no wrote:
I think MDB_WORD is intended to be mdb_size_t, but it should be 'unsigned long': it's probably no optimization to use 'long long' when that is not a native integer type. But I haven't profiled that.
MDB_WORD is used to get the appropriate word size of the CPU for the optimised dbflag check. So on a 32-bit CPU it should be a 32-bit integer and on a 64-bit CPU a 64-bit integer.
That's why I suggested unsigned long. It seems the best guess for a native wide type. But it's hard to guess what kind of CPU is in use.
But mdb_size_t seems to always be 64 bits?
On a 32-bit host it's normally 32-bit, but you get 64-bit mdb_size_t anyway if you build and use LMDB with MDB_VL32 defined. I misread your patch, it should not use MDB_VL32.
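To make the two types being compared concrete, here is a minimal sketch. The mdb_size_t part reflects, roughly and from memory, how lmdb.h behaves when built with MDB_VL32; the MDB_WORD typedef is only the proposal from this thread, not existing LMDB code:

#include <stddef.h>
#include <stdint.h>

/* mdb_size_t: normally the native size_t, but forced to 64 bits when
 * LMDB is built with MDB_VL32 (large databases on 32-bit hosts). */
#ifdef MDB_VL32
typedef uint64_t mdb_size_t;
#else
typedef size_t   mdb_size_t;
#endif

/* MDB_WORD: meant to be a native machine word for the batched dbflag
 * scan, hence the suggestion of 'unsigned long' - usually 32 bits on
 * 32-bit hosts and 64 bits on 64-bit Unix hosts, though it stays
 * 32 bits on 64-bit Windows (LLP64), so it is still only a guess. */
typedef unsigned long MDB_WORD;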
The delay in picking up env->me_dbflags is a bit troubling, since it means creating a new txn is no longer quite atomic. I can't think of how it'll make a difference when DBIs are used correctly, but that's one thing people screw up. Maybe it'd be OK if renew0() just copies it into mt_dbs[].md_flags without the '&', and mdb_setup_db_info() fixes it up.
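To make that concrete: from memory, the flag-copying loop in mdb_txn_renew0() currently looks roughly like the sketch below, so the suggestion would be to drop the '& PERSISTENT_FLAGS' there and let the patch's mdb_setup_db_info() apply it later. Treat the details as approximate:

/* Rough sketch of the existing loop in mdb_txn_renew0() (from memory). */
for (i = CORE_DBS; i < txn->mt_numdbs; i++) {
	unsigned x = env->me_dbflags[i];
	txn->mt_dbs[i].md_flags = x & PERSISTENT_FLAGS;	/* the '&' in question */
	txn->mt_dbflags[i] = (x & MDB_VALID) ? DB_VALID|DB_USRVALID|DB_STALE : 0;
}
/* Proposed: txn->mt_dbs[i].md_flags = x;  and mask in mdb_setup_db_info(). */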
Is it really atomic in the original LMDB? The flags are copied from the environment and calculated. But while the dbflags are being copied from the environment they may change. So read-level consistency, and therefore atomicity, is not guaranteed.
I should have said, it's atomic when DBIs are used correctly. Such as obeying this limitation from the mdb_dbi_open() doc and a similar one for mdb_dbi_close():
 * A transaction that uses
 * this function must finish (either commit or abort) before
 * any other transaction in the process may use this function.
To guarantee atomicity we'd need locks around some DBI operations. To avoid that, LMDB simply calls unsafe use "user errors" - which can cause breakage. Hence those annoying DBI usage restrictions.
LMDB does attempt to catch user errors when it's cheap to try, see e.g. TXN_DBI_CHANGED(). That doesn't work at all for read-only txns, since they have txn->mt_dbiseqs == env->me_dbiseqs.
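For reference, TXN_DBI_CHANGED() is just a cheap sequence-number comparison; roughly, from memory:

/* Approximate definition from mdb.c: a DBI slot counts as "changed" if
 * the sequence number the txn recorded for it no longer matches the
 * environment's.  Read-only txns share the environment's array, so the
 * two sides are always equal and the check never fires. */
#define TXN_DBI_CHANGED(txn, dbi) \
	((txn)->mt_dbiseqs[dbi] != (txn)->mt_env->me_dbiseqs[dbi])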
So you can see why DBI cleverness makes me nervous. I think the MDB_WORD looping can go in now, but we'll need to think carefully about the lazy init.
But I guess it just does not matter. Picking up the flags lazily just extends the window in which the flags are copied from the environment, but it is basically the same. Atomicity and also isolation are a must with respect to the data, but I think not with respect to the database handles and their flags. Maybe I am overlooking something, as I do not know all the thoughts between the lines.
With ITS#8431 I can think of one thing:
There's no sync between mdb_dbi_open() in one txn and another txn using the new DBI. It'll be up to the user to synchronize DBIs. For that to work in write-txns, TXN_DBI_CHANGED() must go away, and writers will have no way to try to catch user errors with sync. Or if it should only work in read-only txns, then read-only txns and write-txns must have different definitions of what is a "user error" with DBIs.
Anyway, we'll need to think about the definition of user errors.
Other notes:
/** ... */ above a global declaration or /**< ... */ to the right of it tells Doxygen that the comment is the doc of the declared item. "#foo" inside a Doxygen comment is a link to the declaration of foo.
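A small illustration of the two placements and of the "#foo" link (the declarations here are placeholders, not LMDB code):

/** Doc placed above the item: a native word type, used together with #num_slots. */
typedef unsigned long dbflag_word;

unsigned num_slots;	/**< Doc placed to the right of the item. */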
Maybe mt_dbflags should have an extra byte or word at the end which does not have DB_UNUSED, and with the size rounded up to a multiple of sizeof(MDB_WORD). Then stepping to the next DBI can do something like this, unrelated to the check against mt_numdbs. Not sure if that's an improvement, I'm getting tired:
/** Return an #MDB_WORD where each byte is \b byte */
#define WORD_BYTES(byte)	((MDB_WORD)-1 / 0xff * (byte))
/** Step up to next used DBI */
#define DBI_STEP(txn, dbi) do { \
	if (! (++(dbi) & (sizeof(MDB_WORD)-1))) \
		while (*(MDB_WORD *)((txn)->mt_dbflags + (dbi)) == WORD_BYTES(DB_UNUSED)) \
			(dbi) += sizeof(MDB_WORD); \
} while (0)
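A sketch of how that might be used when walking a txn's DBI table, assuming the padding described above and assuming an unused slot's flag byte is exactly DB_UNUSED (which the word-wide compare requires anyway):

/* Sketch only: visit the used DBIs of a txn, letting DBI_STEP skip whole
 * words of unused slots.  DB_UNUSED and the padding of mt_dbflags up to a
 * multiple of sizeof(MDB_WORD) are assumptions from the proposal above;
 * the trailing non-DB_UNUSED padding stops the inner word scan. */
MDB_dbi dbi = CORE_DBS;
while (dbi < txn->mt_numdbs) {
	if (!(txn->mt_dbflags[dbi] & DB_UNUSED)) {
		/* ... work on this used DBI ... */
	}
	DBI_STEP(txn, dbi);	/* ++dbi, plus a word-wide skip at word boundaries */
}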