Improved handling for large number of databases

26 May 2016


      Hello
Improved handling for large number of databases
===============================================
There is a increased performance penalty the more databases are created within the same environment. I was looking for a way the improve that by keeping the simplicity of tracking databases within a list with direct access by index (MDB_dbi).
mdb_dbi_open() is however not improved with the assumption that the database handle (dbi) is cached in the application. So mdb_dbi_open() should happen only once for each database during the life time of an application.
One issue is that mdb_txn_begin() (for read-only transactions) calloc the sizeof(MDB_txn) + me_maxdbs * sizeof(MDB_db + 1). The plus 1 for the dbflags. However it is sufficient only to malloc that size and clear the sizeof(MDB_txn)
memset(txn, 0, sizeof(MDB_txn)
After that the data beyond the MDB_txn is not initialized which is ok for the moment.
The next improvement happens in mdb_txn_renew0() where the dbflags are only set to DB_UNUSED (a new flag) for each database currently opened in the environment.
memset(txn->mt_dbflags, DB_UNUSED, txn->mt_numdbs);
The former code used to to loop through each database to calculate the dbflags. This is still done but lazily for each accessed database with the assumption that a read only transaction rarely uses all databases of the environment.
The lazy initialization of the dbflag happens in the macro TXN_DBI_EXIST which is always used when a database handle (dbi) is passed to an function.
The flags are updated in mdb_setup_db_info() once a database is access which is marked as unused (DB_UNUSED).
static int mdb_setup_db_info(MDB_txn *txn, MDB_dbi dbi) {
    /* Setup db info */
    uint16_t x = txn->mt_env->me_dbflags[dbi];
    txn->mt_dbs[dbi].md_flags = x & PERSISTENT_FLAGS;
    txn->mt_dbflags[dbi] = (x & MDB_VALID) ? DB_VALID|DB_USRVALID|DB_STALE : 0;
    return (txn->mt_dbflags[dbi] & validity);
}
/** Check \b txn and \b dbi arguments to a function and initialize db info if needed */
#define TXN_DBI_EXIST(txn, dbi, validity) \
    ((txn) && (dbi)<(txn)->mt_numdbs && (((txn)->mt_dbflags[dbi] & (validity)) || (((txn)->mt_dbflags[dbi] & DB_UNUSED) && mdb_setup_db_info((txn), (dbi), (validity)))))
The next improvement is done in any function which needs to loop through the databases for example in mdb_cursors_close(). Again the more databases in the environment the longer the execution time.
It should be best if looping only through dbflags and searching for those databases which are used (!DB_UNUSED). This could be done byte wise or more efficient in 8/4 byte steps comparing with an extended mask DB_UNUSED_LONG instead of DB_UNUSED. So we can skip 8 or 4 (32 bit) unused databases in one step (still with the assumption that a transaction rarely uses all databases of the environment).
So the loop looks as follows always starting at the lower index to avoid alignment issues with ARM prior v6.
#define DB_UNUSED       0x20                /**< DB not used in this txn */
#ifdef MDB_VL32
#define DB_UNUSED_LONG  0x20202020          /* DB_UNUSED long mask for fast tracking */
#else
#define DB_UNUSED_LONG  0x2020202020202020  /* DB_UNUSED long mask for fast tracking */
#endif
#ifdef MDB_VL32
#define MDB_WORD    unsigned int
#else
#define MDB_WORD    unsigned long long
#endif
MDB_dbi n = src->mt_numdbs;
    MDB_dbi i = 0;
while (1) {
        unsigned int upper = i + sizeof(MDB_WORD);
        if (upper < n) {
            // skip unused
            if ((*(MDB_WORD *)(tdbflags + i)) == DB_UNUSED_LONG) {
                i = upper;
                continue;
            }
        }
        else {
            upper = n;
        }
for (; i < upper; i++) {
            // any other filter criteria appropriate to the function
            ....
        }
        if (i >= n) {
            break;
        }
    }
Access newly opened database from another transaction
=======================================================
A transaction tracks newly opened databases and if the transaction is committed the newly opened databases are propagated to the list of open databases of the environment. However if the read only transaction is aborted the databases are not propagated.
If database handles (MDB_dbi) are cached in the application to avoid calling mdb_dbi_open() there might be the situation of two threads running a read only transaction concurrently.
First threads opens the database and commits the read-only transaction. The database is added to the list of open databases in the environment. The returned dbi is globally cached in the application.
The second thread also wants to access the same database and finds the database handle in the global application cache. The database handle however is not valid as the transaction only uses a snapshot of open databases. So the second thread gets an EINVAL when using that database handle.
This should not happen as the database is open and added to the environment.
The following should fix this issue by updating the mt_numdbs and marking the delta with DB_UNUSED
static int mdb_update_db_info(MDB_txn *txn, MDB_dbi dbi) {
    /* propagate newly opened db from env to current txn. Mark them as unused */
    if (txn->mt_numdbs < txn->mt_env->me_numdbs) {
        memset(txn->mt_dbflags + txn->mt_numdbs, DB_UNUSED, txn->mt_env->me_numdbs - txn->mt_numdbs);
    }
    txn->mt_numdbs = txn->mt_env->me_numdbs;
return dbi < txn->mt_numdbs;
}
/** Check \b txn and \b dbi arguments to a function and initialize db info if needed */
#define TXN_DBI_EXIST(txn, dbi, validity) \
    ((txn) && ((dbi)<(txn)->mt_numdbs || mdb_update_db_info((txn), (dbi))) && (((txn)->mt_dbflags[dbi] & (validity)) || (((txn)->mt_dbflags[dbi] & DB_UNUSED) && mdb_setup_db_info((txn), (dbi), (validity)))))
If interested let me know how to contribute.
Hope it is useful!
Regards
Juerg

Improved handling for large number of databases / Access newly opened database from another transaction

Improved handling for large number of databases

Access newly opened database from another transaction

Improved handling for large number of databases

Access newly opened database from another transaction

Improved handling for large number of databases

Access newly opened database from another transaction

Improved handling for large number of databases

Access newly opened database from another transaction

Improved handling for large number of databases

Access newly opened database from another transaction