Kris Zyp wrote:
I have noticed a performance issue with mdb_page_alloc loading the freelist while trying to find a large block when there is a significant amount of free-space that is fragmented/non-contiguous.
In our application, we often have a database environment that may have a sub-database with large blocks (thousands of records around a MB), and another sub-database with many small blocks (10 of millions < 1KB). We frequently drop/delete the latter sub-database since it is an index of the former, to reindex it. However, this can leave multiple GBs of highly fragmented free space, and if we attempt to add/insert/put a record that is several MBs, mdb_page_alloc may attempt to load millions of free-page ids without finding a contiguous block. Loading large freelist seems to be an O(N^2) operation (N being size of free-list to load), since it repeatedly loads a constant size of new pages and rescans the whole cumulate IDL to search for blocks and merge new page ids (in mdb_midl_xmerge), and with very large free spaces, I've seen mdb_page_alloc take multiple minutes to complete. I think I have seen this with performing the database drop/delete as well.
It seems like MDB_PARANOID may have been created to help deal with this type of issue. However, the Max_retries doesn't seem to be implemented/used by MDB_PARANOID. By adding a check to limit the retry variable to Max_retries (as seems intended), and compiling with MDB_PARANOID, I think this seems to mitigate this issue for us, and limits the free-space search so mdb_page_alloc always seems to return within a reasonable time.
MDB_PARANOID was something we used early on when we were having problems debugging an infinite allocation search when trying to save the freelist. It just put an upper bound on the search, so that we would fallback to using new pages instead of looping forever. As the comments say, we don't need it now because that infinite looping was fixed.
Freelist fragmentation is definitely a thorny problem; should investigate something like this https://github.com/etcd-io/bbolt/pull/138 as a solution. It's on my todo list to investigate, but haven't had time yet. Patches welcome...