I have noticed a performance issue in mdb_page_alloc: when it loads the freelist while trying to find a large contiguous block, it can become very slow if there is a significant amount of free space that is fragmented/non-contiguous.
In our application, we often have a database environment with one sub-database holding large records (thousands of records of around a MB each) and another sub-database holding many small records (tens of millions, each under 1KB). We frequently drop/delete the latter sub-database, since it is an index of the former, in order to reindex it. However, this can leave multiple GBs of highly fragmented free space, and if we then attempt to add/insert/put a record that is several MBs, mdb_page_alloc may load millions of free-page IDs without ever finding a contiguous block. Loading a large freelist appears to be an O(N^2) operation (N being the size of the freelist to load), since it repeatedly loads a constant-sized batch of new page IDs and then rescans the whole cumulative IDL to search for a suitable block and to merge in the new IDs (in mdb_midl_xmerge). With very large free spaces, I've seen mdb_page_alloc take multiple minutes to complete. I think I have seen the same behavior while performing the database drop/delete as well.
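To make the quadratic behavior concrete, here is a small self-contained sketch. This is not LMDB code: CHUNK, TOTAL, and merge_sorted are just stand-ins for my reading of the per-entry freelist loads and mdb_midl_xmerge. Because every new batch of IDs is merged into the whole cumulative list, the total work grows roughly as N^2 / (2 * CHUNK):

    #include <stdio.h>
    #include <stdlib.h>

    #define CHUNK 1024       /* stand-in for one freelist entry's worth of IDs */
    #define TOTAL (1u << 20) /* total free-page IDs to accumulate */

    /* Merge 'add' (sorted, addlen entries) into 'ids' (sorted, *idlen entries),
     * working from the back so it can be done in place: O(*idlen + addlen). */
    static void merge_sorted(size_t *ids, size_t *idlen,
                             const size_t *add, size_t addlen)
    {
        size_t i = *idlen, j = addlen, k = *idlen + addlen;
        while (j > 0) {
            if (i > 0 && ids[i-1] > add[j-1])
                ids[--k] = ids[--i];
            else
                ids[--k] = add[--j];
        }
        *idlen += addlen;
    }

    int main(void)
    {
        size_t *ids = malloc(TOTAL * sizeof *ids);
        size_t *add = malloc(CHUNK * sizeof *add);
        size_t idlen = 0, work = 0;
        if (!ids || !add)
            return 1;

        /* Feed CHUNK-sized batches of already-sorted IDs, each batch lower than
         * the ones before it, so every merge has to walk the whole cumulative
         * list.  Total cost: CHUNK + 2*CHUNK + ... + TOTAL ~= TOTAL^2/(2*CHUNK). */
        for (size_t base = TOTAL; base > 0; base -= CHUNK) {
            for (size_t n = 0; n < CHUNK; n++)
                add[n] = base - CHUNK + n;
            work += idlen + CHUNK;      /* cost of this merge pass */
            merge_sorted(ids, &idlen, add, CHUNK);
        }
        printf("loaded %zu ids with ~%zu merge steps\n", idlen, work);
        free(ids); free(add);
        return 0;
    }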
It seems like MDB_PARANOID may have been created to help deal with this type of issue. However, Max_retries doesn't actually seem to be used when MDB_PARANOID is defined. After adding a check that limits the retry variable to Max_retries (as seems to have been intended) and compiling with MDB_PARANOID, the issue appears to be mitigated for us: the free-space search is bounded and mdb_page_alloc always returns within a reasonable time.
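For concreteness, the local change I've been running is roughly the following. This is a sketch against my reading of mdb_page_alloc in mdb.c (Paranoid, Max_retries, retry, and num are the existing identifiers there), and the exact placement may differ between LMDB versions:

    retry = num * 60;   /* existing initialization in mdb_page_alloc */
    /* proposed: actually apply the Max_retries cap, so that with MDB_PARANOID
     * defined the freelist scan gives up after a bounded number of extra
     * freeDB reads instead of after num * 60 of them */
    if (Paranoid && retry > Max_retries)
        retry = Max_retries;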
I was just curious whether this is the intended way to deal with large free spaces. Would it be helpful to (re?)implement the Max_retries limit for when MDB_PARANOID is defined (I'd be glad to provide a patch)? Or is retry = num * 60 perhaps a bit high for a default retry setting? Or would it be preferable to improve the freelist loading to an O(N log N) algorithm, for example by exponentially increasing the size of the blocks loaded and merging them binary-tree style?
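A rough sketch of what I mean by that last option (again, not LMDB code, just a standalone illustration of the run-merging idea): keep a small stack of sorted runs, and only merge two runs when they are of similar size, like the carry chain of a binary counter. Each ID then takes part in O(log N) merges, so loading N free-page IDs costs O(N log N) instead of O(N^2):

    #include <stdlib.h>

    #define MAX_RUNS 64     /* plenty: stack depth grows as log2(N / chunk size) */

    typedef struct { size_t *ids; size_t len; } run_t;

    static run_t runs[MAX_RUNS];
    static int   nruns;

    /* Merge two sorted, malloc'd arrays into a freshly allocated one. */
    run_t merge_runs(run_t a, run_t b)
    {
        run_t out = { malloc((a.len + b.len) * sizeof *out.ids), 0 };
        size_t i = 0, j = 0;
        while (i < a.len || j < b.len) {
            if (j >= b.len || (i < a.len && a.ids[i] <= b.ids[j]))
                out.ids[out.len++] = a.ids[i++];
            else
                out.ids[out.len++] = b.ids[j++];
        }
        free(a.ids);
        free(b.ids);
        return out;
    }

    /* Push one freshly read chunk (takes ownership of 'ids'); collapse runs
     * of similar size as we go instead of merging into the full list. */
    void push_chunk(size_t *ids, size_t len)
    {
        run_t r = { ids, len };
        while (nruns > 0 && runs[nruns-1].len <= r.len)
            r = merge_runs(runs[--nruns], r);
        runs[nruns++] = r;
    }

    /* After the last chunk, collapse the remaining runs into one sorted IDL
     * (assumes at least one chunk was pushed). */
    run_t collapse_all(void)
    {
        run_t r = runs[--nruns];
        while (nruns > 0)
            r = merge_runs(runs[--nruns], r);
        return r;
    }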
Or is it better to consider this a fairly pathological case, i.e. that this much free space (and record-size variance) should be avoided, and that dropping/deleting such a large fragmented database should really be followed immediately by copying the whole database environment with compaction to eliminate the excess (or all) free space?
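For completeness, this is the compaction step I have in mind after such a drop, using mdb_env_copy2 with MDB_CP_COMPACT (the mdb_copy -c command-line tool does the same thing; compact_copy and backup_dir are just placeholders of mine):

    #include <stdio.h>
    #include "lmdb.h"

    /* env is an already-open MDB_env; backup_dir is a placeholder path to an
     * existing, empty directory.  The compacted copy omits all free pages. */
    int compact_copy(MDB_env *env, const char *backup_dir)
    {
        int rc = mdb_env_copy2(env, backup_dir, MDB_CP_COMPACT);
        if (rc != MDB_SUCCESS)
            fprintf(stderr, "mdb_env_copy2: %s\n", mdb_strerror(rc));
        return rc;
    }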
Anyway, I may be misunderstanding this issue, but my local use of MDB_PARANOID with the Max_retries fix seems to be working fine for me (and I can certainly copy with compaction when needed as well). I'm just curious whether there are any suggestions otherwise, or any improvements here that would be helpful to LMDB.
Thanks, Kris