openldap-commit2devel@OpenLDAP.org writes:
commit aff2693fc0721df4ccb6ceb357f80501c413ed38
Author: Howard Chu hyc@symas.com
Date: Mon Dec 10 12:16:50 2012 -0800

    ITS#7455 simplify

    Don't try to reclaim overflow pages while operating on the freelist (for now). The circular dependencies are much like the single-page case, but worse. Maybe look into this in the future, but it's not absolutely necessary now.

Suggestions to reduce freelist changes during commit:
Let a freelist entry steal page numbers listed in the next entries. Then mdb_page_alloc can grab more old pages without deleting/updating their entries and producing new dirty pages. Next txn does the updates.
Preallocate the final MDB_oldpages with MDB_RESERVE in mdb_txn_commit() and leave some room to spare. Then use page numbers from it and/or steal new ones at need.
BTW, could MDB offer an MDB_RESERVE2 which says "give me data->mv_size bytes plus as much more as will fit without growing the page"? And MDB_RESERVE2_SHRINK which shrinks the size to the final size.
Stolen pages: one way would be to search for particular pages to steal, and list the stolen ones at the end of the freelist entry. Or: stealing only from the end of the previous entry/entries should be simpler, but doesn't let us choose some specific pages to steal in order to gain a big enough contiguous page range:

    typedef struct MDB_freelist_entry { /* freelist entry in the DB */
        short  mf_len;             /* saved length */
        short  mf_stolen_entries;  /* #fully stolen entries */
        short  mf_nextlen;         /* 0 or remaining length of next entry */
        MDB_ID mf_pages[];         /* length mf_len. */
    } MDB_freelist_entry;

Thus, if the free DB contains

    (txnid_t)123 => { .mf_stolen_entries = 1, .mf_nextlen = 7 }
    (txnid_t)124 => { ... }
    (txnid_t)125 => { .mf_len = 20 }

then mdb should henceforth skip entry#124 and entry#125.mf_pages[7..19].
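
To make the skip rule concrete, here is a rough sketch of how a reader of the free DB might work out how many leading pages of each entry are still usable. It only models the three header fields; fl_header and fl_usable_lengths are hypothetical names, not anything in mdb.c. It also assumes that the headers of fully stolen entries are ignored, and that a partially stolen entry's own header still applies to the entries after it. Neither point is settled by the proposal above.

```c
#include <assert.h>
#include <stddef.h>

/* Header fields of the proposed MDB_freelist_entry (page array omitted).
 * fl_header is a hypothetical stand-in used only for this sketch. */
typedef struct fl_header {
    short mf_len;             /* saved length */
    short mf_stolen_entries;  /* #fully stolen entries */
    short mf_nextlen;         /* 0 or remaining length of next entry */
} fl_header;

/* For entries e[0..n-1] in txnid order, set usable[i] to the number of
 * leading pages of entry i that have not been stolen. */
static void fl_usable_lengths(const fl_header *e, size_t n, short *usable)
{
    size_t i;
    short cap = -1;  /* -1: no earlier entry has truncated entry i */

    for (i = 0; i < n; ) {
        size_t next = i + 1;
        short s;

        /* Apply a pending cap left by the entry that stole from this one. */
        usable[i] = (cap >= 0 && cap < e[i].mf_len) ? cap : e[i].mf_len;

        /* Entries fully stolen by entry i contribute nothing.
         * Assumption: their own headers are ignored. */
        for (s = 0; s < e[i].mf_stolen_entries && next < n; s++)
            usable[next++] = 0;

        /* A nonzero mf_nextlen truncates the entry after the stolen ones. */
        cap = e[i].mf_nextlen ? e[i].mf_nextlen : -1;
        i = next;
    }
}
```

Fed the three entries from the example (with any length for entry#124), this gives 0 usable pages for entry#124 and 7 for entry#125, i.e. mf_pages[7..19] are skipped.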
A simple variant of page ranges, to save space and simplify range handling:

    /* Page range: (pagecount << MDB_PGNO_BITS) | (pageno + pagecount) */
    typedef pgno_t mdb_pages_t;

Lone pages get pagecount=1. With MDB_PGCOUNT_BITS = (64bit ? 19 : 12) and page size 4096, that limits MDB to a 128 petabyte DB and 2G entry size. Or a 4G database and 16M entry size on 32-bit machines. (I'd call limiting the entry size a bonus compared to today's mdb: the current freelist doesn't exactly handle 2 billion freed pages gracefully.)
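
A minimal sketch of the packing for the 64-bit case, in case it helps. It assumes pgno_t is a 64-bit unsigned type and that MDB_PGNO_BITS is simply the word size minus MDB_PGCOUNT_BITS (45 here), which is what the limits above imply. The range_* helpers and MDB_PGNO_MASK are made-up names for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 64-bit page numbers, 19 bits reserved for the page count. */
typedef uint64_t pgno_t;
typedef pgno_t mdb_pages_t;

#define MDB_PGCOUNT_BITS 19
#define MDB_PGNO_BITS    (64 - MDB_PGCOUNT_BITS)            /* 45 */
#define MDB_PGNO_MASK    (((pgno_t)1 << MDB_PGNO_BITS) - 1) /* hypothetical */

/* Page range: (pagecount << MDB_PGNO_BITS) | (pageno + pagecount).
 * The low bits hold the end of the range, not its first page. */
static mdb_pages_t range_pack(pgno_t pageno, pgno_t pagecount)
{
    assert(pagecount >= 1 && pagecount < ((pgno_t)1 << MDB_PGCOUNT_BITS));
    assert(pageno + pagecount <= MDB_PGNO_MASK);
    return (pagecount << MDB_PGNO_BITS) | (pageno + pagecount);
}

/* Number of pages in the range. */
static pgno_t range_count(mdb_pages_t r) { return r >> MDB_PGNO_BITS; }

/* One past the last page of the range. */
static pgno_t range_end(mdb_pages_t r) { return r & MDB_PGNO_MASK; }

/* First page of the range. */
static pgno_t range_first(mdb_pages_t r) { return range_end(r) - range_count(r); }
```

The arithmetic behind the limits: 45 page-number bits times a 4096-byte page is 2^57 bytes (128 petabytes), and 19 count bits times 4096 is 2^31 bytes (2G) per entry. On 32-bit, 20 and 12 bits give 2^32 (4G) and 2^24 (16M).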