LMDB proposed changes
by Howard Chu
Summarizing some discussions from IRC...
The hardcoded limit on the size of the dirty page list in a transaction is a
problem; there should be no limit on the effective size of a transaction.
The plan is to change LMDB's disk page format to include the txnID in the page
header. This way, when the dirty page list gets full we can flush it to disk
without losing track of which pages were dirtied. Then if a subsequent access
in the same txn revisits one of these pages, when we read it back from the DB
we'll know that it came from the current txn and doesn't need to be copied
again before making further modifications.
The P_DIRTY bit in the page header will no longer be needed - if the txnID
matches, the page can be used directly. If not, the page is clean and a new
page must be allocated before writing.
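
A minimal sketch of the check described above, not the actual LMDB code. The
sketch_page struct and both function names are hypothetical and do not match
LMDB's real page layout; the only idea taken from the proposal is that a page
whose stored txnID equals the current txn's ID can be written in place, and
any other page must be copied to a newly allocated page first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t txnid_t;

/* Hypothetical page header, NOT LMDB's real page layout. */
typedef struct sketch_page {
    uint64_t pgno;            /* page number */
    txnid_t  txnid;           /* ID of the txn that last wrote this page */
    unsigned char data[32];   /* stand-in for the page body */
} sketch_page;

/* A page counts as dirty only if the current txn wrote it. */
static bool sketch_page_is_dirty(const sketch_page *p, txnid_t cur)
{
    assert(p != NULL);
    return p->txnid == cur;
}

/* Return a page the current txn may modify: the page itself if it already
 * belongs to this txn, otherwise a copy placed in `fresh` under `new_pgno`.
 * The caller is assumed to have allocated `fresh` and `new_pgno`. */
static sketch_page *sketch_page_touch(sketch_page *p, sketch_page *fresh,
                                      uint64_t new_pgno, txnid_t cur)
{
    if (sketch_page_is_dirty(p, cur))
        return p;                     /* same txn: no second copy needed */
    memcpy(fresh, p, sizeof *fresh);  /* clean page: copy before writing */
    fresh->pgno = new_pgno;
    fresh->txnid = cur;
    return fresh;
}
```

Because the txnID travels with the page, the check gives the same answer
whether the page is still in memory or was flushed and read back from the DB.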
For WRITEMAP mode the dirty page list can be completely eliminated; the only
reason we keep it now is to know which pages' P_DIRTY bit we need to clear at
commit time.
Increasing the size of the page header by 8 bytes is a bit annoying: it will
require a full slapcat/slapadd reload of existing back-mdb databases. It would
be nice if we could avoid this, but I don't see how.
--
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Re: openldap.git branch mdb.master updated. 21da623bf40dc21f89c3172c523d094075e0824b
by Hallvard Breien Furuseth
openldap-commit2devel(a)OpenLDAP.org writes:
> Allow reading freelist while working on it
>
> The circular dependency issues appear to have been resolved.
> Still, need to watch closely, maybe revert this change if
> problems arise.
IIRC this could get page-hungry when I tried that with a fragmented DB,
and with large changes which needed overflow pages and used them up for
non-freelist items. One of those big LDIFs at ada, I think.
Something like: Commit wants a big enough range for an overflow page for
mt_free_pgs, maybe finds and allocates one, but the freelist changes eat
too many freelist items and it will need a bigger one later. It
doesn't free the useless overflow page it already allocated. Repeat.
Anyway, I think this change needs code to give pages back to me_pghead,
or revert to an earlier {me_pglast, me_pghead}, or something like that.
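
To make the two options concrete, here is a rough sketch. It assumes a
simplified stand-in for the {me_pglast, me_pghead} pair; the
sketch_freestate struct, its fixed-size array, and all function names are
hypothetical and are not how LMDB stores this state. It shows a page being
handed back to the pool, and the whole state being saved and restored around
a retry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_FREE 64

/* Hypothetical stand-in for {me_pglast, me_pghead}. */
typedef struct sketch_freestate {
    uint64_t pglast;                   /* ID of last freelist record consumed */
    size_t   count;                    /* number of reusable pages in pghead */
    uint64_t pghead[SKETCH_MAX_FREE];  /* reusable page numbers */
} sketch_freestate;

/* Take one page from the pool. Returns 0 on success, -1 if it is empty. */
static int sketch_take_page(sketch_freestate *fs, uint64_t *pgno)
{
    assert(fs != NULL && pgno != NULL);
    if (fs->count == 0)
        return -1;
    *pgno = fs->pghead[--fs->count];
    return 0;
}

/* Option 1: give an unneeded page back to the pool. */
static int sketch_give_back(sketch_freestate *fs, uint64_t pgno)
{
    assert(fs != NULL);
    if (fs->count >= SKETCH_MAX_FREE)
        return -1;
    fs->pghead[fs->count++] = pgno;
    return 0;
}

/* Option 2: snapshot the state before an attempt... */
static void sketch_save(const sketch_freestate *fs, sketch_freestate *saved)
{
    *saved = *fs;
}

/* ...and revert to the snapshot if the attempt must be redone. */
static void sketch_restore(sketch_freestate *fs, const sketch_freestate *saved)
{
    *fs = *saved;
}
```

Copying the whole state is only reasonable in a toy like this; with a real,
possibly large page list the cost of the snapshot would have to be weighed
against simply returning the unused pages.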
Or maybe someday clean up nested txns, support them with WRITEMAP, and
use those for the freelist loop since they support reverting a change.
--
Hallvard