Hallvard Breien Furuseth wrote:
I think MDB v2 should move the variable parts of MDB_meta into the data pages. The datafile header would retain a word with the position of the last *synced* MDB_meta, or of the last meta when MDB_NOSYNC. The lockfile header would hold the position of the *last* MDB_meta.
Sounds to me like you want to make MDB fully multi-version. I don't see any benefit for OpenLDAP/slapd in doing that.
If that's not what you're trying to do, then you need to specify the algorithm for allocating meta pages such that versions don't accumulate endlessly.
All transactions start from the lockfile->metapos commit. Write txns do not reuse free pages younger than the datafile->metapos commit.
mdb_env_sync() called by the user does roughly:

    size_t lastpos = lockfile->metapos;
    sync;
    # define pos2id(env, pos) ((MDB_meta*)((env)->me_map+(pos)))->mt_txnid
    if (pos2id(env, lastpos) > pos2id(env, datafile->metapos))
        write lastpos to &datafile->metapos;

Called from mdb_txn_commit(), this may need lastpos as an argument.
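For readers following the thread, the proposed sync step can be sketched as compilable C. This is only an illustration under assumptions: sketch_meta, sketch_env, data_metapos, lock_metapos and sketch_env_sync are hypothetical stand-ins, not the real MDB structures or API, and the actual flush to disk is left as a comment.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; NOT the real MDB structures. */
typedef struct { uint64_t mt_txnid; } sketch_meta;

typedef struct {
    char   *me_map;        /* base address of the mapped data file */
    size_t  data_metapos;  /* stands in for datafile->metapos (last synced meta) */
    size_t  lock_metapos;  /* stands in for lockfile->metapos (last committed meta) */
} sketch_env;

/* Read the txn id of the meta stored at byte offset pos in the map. */
static uint64_t pos2id(const sketch_env *env, size_t pos)
{
    sketch_meta m;
    memcpy(&m, env->me_map + pos, sizeof m);  /* avoids unaligned access */
    return m.mt_txnid;
}

/* Returns 1 if the synced position was advanced, 0 if it was left alone. */
static int sketch_env_sync(sketch_env *env)
{
    /* Snapshot the latest commit BEFORE syncing, so that only commits
     * covered by the flush can be published as synced. */
    size_t lastpos = env->lock_metapos;

    /* A real implementation would flush the data pages here. */

    /* Only move forward: a stale lastpos must never replace a newer
     * synced position. */
    if (pos2id(env, lastpos) > pos2id(env, env->data_metapos)) {
        env->data_metapos = lastpos;
        return 1;
    }
    return 0;
}
```

The txn id comparison is what keeps a slow syncer from moving the synced position backwards when it races with a faster one.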
Results, if I'm keeping this straight:
Setting the latest commit becomes atomic: Just change metapos. (Field MDB_txninfo.mti_txnid goes away.)
The latest commit is already atomic. mti_txnid is updated atomically in the current code.
No sync issues with copying 'MDB_db's from the meta, since the meta will not be overwritten during the txn.
There are no sync issues in the current code.
Users can sync infrequently yet preserve consistency, a generalization of MDB_NOMETASYNC. An application crash will then lose unsynced commits, since resetting the lockfile must reset lockfile->metapos. MDB cannot know if a system crash left those commits unsynced.
OK, preserving consistency is potentially a win vs what we have now. But it's also more of a crapshoot - you're only providing ACI, not D, and the application won't hear about the loss of D until long after the fact.
Some applications can probably tolerate this. But is it something we want to deal with?
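To make the durability gap concrete, here is a toy model of the proposal that tracks only txn ids. Everything in it is an assumption for illustration: the sketch_* names are hypothetical, reader constraints on page reuse are ignored, and the reuse rule reflects one reading of "do not reuse free pages younger than the datafile->metapos commit".

```c
#include <stdint.h>

/* Hypothetical model: tracks txn ids only, not page positions. */
typedef struct {
    uint64_t synced_txnid;  /* commit named by datafile->metapos */
    uint64_t last_txnid;    /* commit named by lockfile->metapos */
} sketch_state;

/* A commit only advances the lockfile side. */
static void sketch_commit(sketch_state *s) { s->last_txnid++; }

/* A sync publishes the latest commit as the synced one. */
static void sketch_sync(sketch_state *s) { s->synced_txnid = s->last_txnid; }

/* Resetting the lockfile after an application crash falls back to the
 * synced commit. Returns how many commits were silently dropped. */
static uint64_t sketch_reset_lockfile(sketch_state *s)
{
    uint64_t lost = s->last_txnid - s->synced_txnid;
    s->last_txnid = s->synced_txnid;
    return lost;
}

/* Assumed reading of the reuse rule: a page freed by txn freed_by is
 * still referenced by the synced commit if freed_by is younger than it,
 * so it may only be reused when freed_by <= synced_txnid. */
static int sketch_can_reuse(const sketch_state *s, uint64_t freed_by)
{
    return freed_by <= s->synced_txnid;
}
```

With three commits since the last sync, sketch_reset_lockfile() reports three lost commits, and the application that made them has already been told they succeeded.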