I have noticed a performance issue with mdb_page_alloc loading the
freelist while trying to find a large block when there is a
significant amount of free-space that is fragmented/non-contiguous.
In our application, we often have a database environment with one
sub-database of large records (thousands of records around 1 MB each)
and another sub-database of many small records (tens of millions,
each under 1 KB). We frequently drop/delete the latter sub-database,
since it is an index of the former, in order to reindex it. However,
this can leave multiple GBs of highly fragmented free space, and if
we then attempt to add/insert/put a record that is several MBs,
mdb_page_alloc may attempt to load millions of free-page ids without
ever finding a contiguous block. Loading a large freelist appears to
be an O(N^2) operation (N being the size of the freelist to load),
since it repeatedly loads a roughly constant-sized batch of new page
ids and rescans the whole cumulative IDL to search for a contiguous
block and to merge the new ids in (in mdb_midl_xmerge), so each new
batch costs work proportional to everything loaded so far. With very
large amounts of free space, I've seen mdb_page_alloc take multiple
minutes to complete. I think I have also seen this happen while
performing the database drop/delete itself.
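As a rough back-of-the-envelope illustration of that quadratic cost
(the freelist size and per-batch size below are made-up numbers, not
measurements from our workload):

    /* Toy cost model: loading N free-page ids in fixed batches of B,
     * where each new batch requires rescanning/merging the whole
     * accumulated list, touches roughly B + 2B + ... + N entries,
     * i.e. about N*(N+B)/(2*B) in total. */
    #include <stdio.h>

    int main(void)
    {
        double N = 10e6;  /* total free page ids to load (assumed) */
        double B = 1000;  /* ids loaded/merged per batch (assumed) */
        double touched = N * (N + B) / (2 * B);
        printf("approx. entries touched: %.3g\n", touched); /* ~5e10 */
        return 0;
    }
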
It seems like MDB_PARANOID may have been created to help deal with
this type of issue. However, Max_retries doesn't actually appear to
be used anywhere, even when MDB_PARANOID is defined. By adding a
check that caps the retry variable at Max_retries (as seems to have
been intended) and compiling with MDB_PARANOID, the issue appears to
be mitigated for us: it bounds the free-space search, so
mdb_page_alloc always seems to return within a reasonable time.
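For reference, the local change is roughly the following (just a
sketch, not an exact patch; the exact declaration and its placement
inside mdb_page_alloc may differ between versions):

    /* Sketch: cap the retry budget at Max_retries (500 when
     * MDB_PARANOID is defined) so the freelist search is bounded. */
    retry = num * 60;
    if (retry > Max_retries)
        retry = Max_retries;
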
I was just curious if this is the intended way to deal with large
amounts of free space. Would it be helpful to (re?)implement the
Max_retries check (I'd be glad to provide a patch) for when
MDB_PARANOID is defined? Or is retry = num * 60 possibly a little
high for a default retry setting? Or would it be preferable to
improve the freelist loading itself to an O(N log N) algorithm, for
example by exponentially increasing the size of the runs being merged
(binary-tree-style merging), as sketched below?
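To make that last idea a bit more concrete, here is a rough,
self-contained sketch of what I mean (plain ascending arrays of page
numbers standing in for IDLs; merge_runs, add_batch, and the run
stack are made up for illustration, not LMDB code):

    /* Illustration of O(N log N) loading: keep a "binary counter" of
     * sorted runs, where slot i holds a run of roughly B * 2^i ids.
     * Each incoming fixed-size batch cascades through the slots, so a
     * given id is merged at most ~log2(N/B) times instead of once per
     * remaining batch. */
    #include <stdlib.h>

    typedef unsigned long pgno_t;

    #define MAX_RUNS 64
    static pgno_t *runs[MAX_RUNS];
    static size_t  runlen[MAX_RUNS];

    /* Merge two ascending runs into a newly allocated ascending run. */
    static pgno_t *merge_runs(const pgno_t *a, size_t na,
                              const pgno_t *b, size_t nb, size_t *nout)
    {
        pgno_t *out = malloc((na + nb) * sizeof(pgno_t));
        size_t i = 0, j = 0, k = 0;
        while (i < na && j < nb)
            out[k++] = (a[i] <= b[j]) ? a[i++] : b[j++];
        while (i < na) out[k++] = a[i++];
        while (j < nb) out[k++] = b[j++];
        *nout = k;
        return out;
    }

    /* Add one sorted batch (ownership transferred).  After the last
     * batch, the surviving runs would be merged together once more to
     * produce the final cumulative freelist. */
    static void add_batch(pgno_t *batch, size_t n)
    {
        int i;
        for (i = 0; i < MAX_RUNS - 1 && runs[i] != NULL; i++) {
            size_t m;
            pgno_t *merged = merge_runs(runs[i], runlen[i], batch, n, &m);
            free(runs[i]);
            free(batch);
            runs[i] = NULL;
            batch = merged;
            n = m;
        }
        runs[i] = batch;
        runlen[i] = n;
    }
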
Alternatively, is it better to consider this a fairly pathological
case, meaning that this much free space (and record-size variance)
should be avoided, and that dropping/deleting such a large fragmented
sub-database should really be followed immediately by copying the
whole database environment with compaction, to eliminate the excess
(or all) free space?
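If that is the recommended answer, I assume the compacting copy would
look something like the following (the wrapper name and directory
argument are just placeholders; mdb_env_copy2 with MDB_CP_COMPACT is
the documented call, or equivalently mdb_copy -c from the command
line):

    #include "lmdb.h"

    /* Compacting copy of an open environment into an existing, empty
     * directory; free pages are omitted and the remaining pages are
     * renumbered, so the copy can replace the original environment. */
    static int compact_to(MDB_env *env, const char *backup_dir)
    {
        return mdb_env_copy2(env, backup_dir, MDB_CP_COMPACT);
    }
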
Anyway, I may be misunderstanding this issue, but my local use of
MDB_PARANOID with the Max_retries fix seems to be working fine for me
(and I can certainly copy with compaction if I need to as well). Just
curious whether there are any suggestions otherwise, or whether any
of these improvements would be helpful to LMDB.
Thanks,
Kris