I have noticed a performance issue in mdb_page_alloc: when it loads the freelist while trying to find a large contiguous block, it can become very slow if there is a significant amount of free space that is fragmented/non-contiguous.
In our application, we often have a database environment with one sub-database holding large records (thousands of records of around a MB each) and another sub-database holding many small records (tens of millions, each under 1KB). We frequently drop/delete the latter sub-database, since it is an index of the former, in order to reindex it. However, this can leave multiple GBs of highly fragmented free space, and if we then attempt to add/insert/put a record that is several MBs, mdb_page_alloc may load millions of free-page IDs without ever finding a contiguous block. Loading a large freelist appears to be an O(N^2) operation (N being the size of the freelist to load), since it repeatedly loads a constant-sized batch of new page IDs and then rescans the whole cumulative IDL to search for a suitable block and to merge in the new IDs (in mdb_midl_xmerge). With very large free spaces, I've seen mdb_page_alloc take multiple minutes to complete. I think I have seen the same behavior while performing the database drop/delete as well.
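To make the quadratic behavior concrete, here is a small self-contained sketch. This is not LMDB code: CHUNK, TOTAL, and merge_sorted are just stand-ins for my reading of the per-entry freelist loads and mdb_midl_xmerge. Because every new batch of IDs is merged into the whole cumulative list, the total work grows roughly as N^2 / (2 * CHUNK):

    #include <stdio.h>
    #include <stdlib.h>

    #define CHUNK 1024       /* stand-in for one freelist entry's worth of IDs */
    #define TOTAL (1u << 20) /* total free-page IDs to accumulate */

    /* Merge 'add' (sorted, addlen entries) into 'ids' (sorted, *idlen entries),
     * working from the back so it can be done in place: O(*idlen + addlen). */
    static void merge_sorted(size_t *ids, size_t *idlen,
                             const size_t *add, size_t addlen)
    {
        size_t i = *idlen, j = addlen, k = *idlen + addlen;
        while (j > 0) {
            if (i > 0 && ids[i-1] > add[j-1])
                ids[--k] = ids[--i];
            else
                ids[--k] = add[--j];
        }
        *idlen += addlen;
    }

    int main(void)
    {
        size_t *ids = malloc(TOTAL * sizeof *ids);
        size_t *add = malloc(CHUNK * sizeof *add);
        size_t idlen = 0, work = 0;
        if (!ids || !add)
            return 1;

        /* Feed CHUNK-sized batches of already-sorted IDs, each batch lower than
         * the ones before it, so every merge has to walk the whole cumulative
         * list.  Total cost: CHUNK + 2*CHUNK + ... + TOTAL ~= TOTAL^2/(2*CHUNK). */
        for (size_t base = TOTAL; base > 0; base -= CHUNK) {
            for (size_t n = 0; n < CHUNK; n++)
                add[n] = base - CHUNK + n;
            work += idlen + CHUNK;      /* cost of this merge pass */
            merge_sorted(ids, &idlen, add, CHUNK);
        }
        printf("loaded %zu ids with ~%zu merge steps\n", idlen, work);
        free(ids); free(add);
        return 0;
    }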
It seems like MDB_PARANOID may have been created to help deal with this type of issue. However, Max_retries doesn't actually seem to be used when MDB_PARANOID is defined. After adding a check that limits the retry variable to Max_retries (as seems to have been intended) and compiling with MDB_PARANOID, the issue appears to be mitigated for us: the free-space search is bounded and mdb_page_alloc always returns within a reasonable time.
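For concreteness, the local change I've been running is roughly the following. This is a sketch against my reading of mdb_page_alloc in mdb.c (Paranoid, Max_retries, retry, and num are the existing identifiers there), and the exact placement may differ between LMDB versions:

    retry = num * 60;   /* existing initialization in mdb_page_alloc */
    /* proposed: actually apply the Max_retries cap, so that with MDB_PARANOID
     * defined the freelist scan gives up after a bounded number of extra
     * freeDB reads instead of after num * 60 of them */
    if (Paranoid && retry > Max_retries)
        retry = Max_retries;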
I was just curious whether this is the intended way to deal with large free spaces. Would it be helpful to (re?)implement the Max_retries limit for when MDB_PARANOID is defined (I'd be glad to provide a patch)? Or is retry = num * 60 perhaps a bit high for a default retry setting? Or would it be preferable to improve the freelist loading to an O(N log N) algorithm, for example by exponentially increasing the size of the blocks loaded and merging them binary-tree style?
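A rough sketch of what I mean by that last option (again, not LMDB code, just a standalone illustration of the run-merging idea): keep a small stack of sorted runs, and only merge two runs when they are of similar size, like the carry chain of a binary counter. Each ID then takes part in O(log N) merges, so loading N free-page IDs costs O(N log N) instead of O(N^2):

    #include <stdlib.h>

    #define MAX_RUNS 64     /* plenty: stack depth grows as log2(N / chunk size) */

    typedef struct { size_t *ids; size_t len; } run_t;

    static run_t runs[MAX_RUNS];
    static int   nruns;

    /* Merge two sorted, malloc'd arrays into a freshly allocated one. */
    run_t merge_runs(run_t a, run_t b)
    {
        run_t out = { malloc((a.len + b.len) * sizeof *out.ids), 0 };
        size_t i = 0, j = 0;
        while (i < a.len || j < b.len) {
            if (j >= b.len || (i < a.len && a.ids[i] <= b.ids[j]))
                out.ids[out.len++] = a.ids[i++];
            else
                out.ids[out.len++] = b.ids[j++];
        }
        free(a.ids);
        free(b.ids);
        return out;
    }

    /* Push one freshly read chunk (takes ownership of 'ids'); collapse runs
     * of similar size as we go instead of merging into the full list. */
    void push_chunk(size_t *ids, size_t len)
    {
        run_t r = { ids, len };
        while (nruns > 0 && runs[nruns-1].len <= r.len)
            r = merge_runs(runs[--nruns], r);
        runs[nruns++] = r;
    }

    /* After the last chunk, collapse the remaining runs into one sorted IDL
     * (assumes at least one chunk was pushed). */
    run_t collapse_all(void)
    {
        run_t r = runs[--nruns];
        while (nruns > 0)
            r = merge_runs(runs[--nruns], r);
        return r;
    }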
Or is it better to consider this a fairly pathological case, i.e. that this much free space (and record-size variance) should be avoided, and that dropping/deleting such a large fragmented database should really be followed immediately by copying the whole database environment with compaction to eliminate the excess (or all) free space?
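For completeness, this is the compaction step I have in mind after such a drop, using mdb_env_copy2 with MDB_CP_COMPACT (the mdb_copy -c command-line tool does the same thing; compact_copy and backup_dir are just placeholders of mine):

    #include <stdio.h>
    #include "lmdb.h"

    /* env is an already-open MDB_env; backup_dir is a placeholder path to an
     * existing, empty directory.  The compacted copy omits all free pages. */
    int compact_copy(MDB_env *env, const char *backup_dir)
    {
        int rc = mdb_env_copy2(env, backup_dir, MDB_CP_COMPACT);
        if (rc != MDB_SUCCESS)
            fprintf(stderr, "mdb_env_copy2: %s\n", mdb_strerror(rc));
        return rc;
    }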
Anyway, I may be misunderstanding this issue, but my local use of MDB_PARANOID with the Max_retries fix seems to be working fine for me (and I can certainly copy with compaction when needed as well). I'm just curious whether there are any suggestions otherwise, or any improvements here that would be helpful to LMDB.
Thanks, Kris