I have noticed a performance issue with mdb_page_alloc loading the freelist while searching for a large contiguous block when there is a significant amount of fragmented, non-contiguous free space.
In our application, we often have a database environment with one sub-database holding large records (thousands of records of around a MB each) and another holding many small records (tens of millions under 1KB). We frequently drop/delete the latter sub-database, since it is an index of the former, in order to reindex it. However, this can leave multiple GBs of highly fragmented free space, and if we then attempt to put a record that is several MBs, mdb_page_alloc may load millions of free-page IDs without finding a contiguous block. Loading a large freelist appears to be an O(N^2) operation (N being the size of the freelist to load), since it repeatedly loads a constant-sized batch of new page IDs and then rescans the whole cumulative IDL to search for a contiguous block and merge in the new IDs (in mdb_midl_xmerge). With very large free spaces, I've seen mdb_page_alloc take multiple minutes to complete. I think I have seen the same behavior when performing the database drop/delete itself.
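To spell out the arithmetic behind that claim (my back-of-envelope accounting, assuming each freelist record contributes roughly B page IDs): merging the k-th batch rescans all ~(k-1)*B IDs accumulated so far, so loading N IDs in total costs on the order of

    B + 2B + 3B + ... + N  ~=  N^2/(2B)

element visits, i.e. O(N^2) for a fixed batch size B, which would explain the multi-minute stalls once N gets into the millions.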
It seems like MDB_PARANOID may have been created to help with this type of issue; however, Max_retries doesn't actually appear to be used when MDB_PARANOID is defined. By adding a check that caps the retry variable at Max_retries (as seems to have been intended) and compiling with MDB_PARANOID, the issue is mitigated for us: the free-space search is bounded and mdb_page_alloc always returns within a reasonable time.
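Concretely, the change I made locally looks roughly like this (the #ifdef block and the retry initialization are existing code near the top of mdb_page_alloc in mdb.c, paraphrased from memory; only the two lines marked ADDED are mine):

    #ifdef MDB_PARANOID	/* Seems like we can skip this check now */
        enum { Paranoid = 1, Max_retries = 500 };
    #else
        enum { Paranoid = 0, Max_retries = INT_MAX /*infinite*/ };
    #endif
        int rc, retry = num * 60;
        if (Paranoid && retry > Max_retries)  /* ADDED: bound the freelist search */
            retry = Max_retries;              /* ADDED */

With this cap in place, the loop that scans for a contiguous run gives up after Max_retries freelist records and falls back to allocating new pages at the end of the map.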
I was just curious whether this is the intended way to deal with large free spaces. Would it be helpful to (re?)implement the Max_retries cap for when MDB_PARANOID is defined? I'd be glad to provide a patch. Or is retry = num * 60 possibly a bit high as a default retry limit? Or would it be preferable to improve the freelist loading to an O(N log N) algorithm, for example by exponentially increasing the size of the batches loaded and merging them binary-tree style?
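For the O(N log N) idea, I'm picturing something along these lines (a minimal standalone sketch, not LMDB code; merge2 and tree_merge are hypothetical names, and a real patch would need to work in terms of MDB_IDL and the mdb_midl_* routines):

    #include <stdlib.h>

    /* Merge two sorted (descending, IDL-style) page-ID arrays into out;
     * returns the merged length. */
    static size_t merge2(const size_t *a, size_t alen,
                         const size_t *b, size_t blen, size_t *out)
    {
        size_t i = 0, j = 0, k = 0;
        while (i < alen && j < blen)
            out[k++] = (a[i] >= b[j]) ? a[i++] : b[j++];
        while (i < alen) out[k++] = a[i++];
        while (j < blen) out[k++] = b[j++];
        return k;
    }

    /* Merge n sorted batches pairwise in rounds (binary-tree merging).
     * Each round halves the batch count and copies every ID once, so the
     * total work is O(N log n) rather than the O(N^2) of merging every
     * batch into one ever-growing cumulative list. Takes ownership of the
     * malloc'd batches; the result ends up in lists[0]/lens[0]. */
    static void tree_merge(size_t **lists, size_t *lens, size_t n)
    {
        while (n > 1) {
            size_t m = 0;
            for (size_t i = 0; i + 1 < n; i += 2) {
                size_t *out = malloc((lens[i] + lens[i+1]) * sizeof *out);
                size_t olen = merge2(lists[i], lens[i],
                                     lists[i+1], lens[i+1], out);
                free(lists[i]);
                free(lists[i+1]);
                lists[m] = out;
                lens[m] = olen;
                m++;
            }
            if (n & 1) {                /* odd batch out: carry to next round */
                lists[m] = lists[n-1];
                lens[m] = lens[n-1];
                m++;
            }
            n = m;
        }
    }

The contiguous-block scan would then run once over the final merged list instead of once per batch.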
Or is it better to consider this a fairly pathological case, where that much free space (and record-size variance) should simply be avoided, and where dropping/deleting such a large fragmented database should really be followed immediately by copying the whole database environment with compaction, to eliminate the excess (or all) free space?
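For reference, by "copying with compaction" I mean the stock mdb_env_copy2 API with the MDB_CP_COMPACT flag, e.g.:

    #include <lmdb.h>

    /* Write a point-in-time, compacted copy of the environment to path
     * (a directory, or a file if opened with MDB_NOSUBDIR): free pages
     * are omitted and the remaining pages are renumbered sequentially. */
    int compact_copy(MDB_env *env, const char *path)
    {
        return mdb_env_copy2(env, path, MDB_CP_COMPACT);
    }

after which we would swap the copy in for the original environment.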
Anyway, I may be misunderstanding this issue, but my local use of MDB_PARANOID with the Max_retries fix seems to be working fine for me (and I can certainly copy with compaction when needed as well). I'm just curious whether there are any other suggestions, or any improvements here that would be helpful to LMDB.
Thanks, Kris
Kris Zyp wrote:
It seems like MDB_PARANOID may have been created to help with this type of issue; however, Max_retries doesn't actually appear to be used when MDB_PARANOID is defined. By adding a check that caps the retry variable at Max_retries (as seems to have been intended) and compiling with MDB_PARANOID, the issue is mitigated for us: the free-space search is bounded and mdb_page_alloc always returns within a reasonable time.
MDB_PARANOID was something we used early on when we were debugging an infinite allocation search that occurred while trying to save the freelist. It just put an upper bound on the search, so that we would fall back to using new pages instead of looping forever. As the comments say, we don't need it now, because that infinite looping was fixed.
Freelist fragmentation is definitely a thorny problem; we should investigate something like this as a solution: https://github.com/etcd-io/bbolt/pull/138 . It's on my todo list to investigate, but I haven't had time yet. Patches welcome...
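The rough idea, as I'd sketch it (this is just the general shape of a size-indexed freelist; the linked PR's actual design may differ, and the names below are made up), is to index free runs by length so a large allocation probes a few buckets instead of scanning the whole IDL:

    #include <stddef.h>

    #define MAX_ORDER 20    /* track runs up to 2^20 pages */

    typedef struct FreeRun {
        size_t pgno, len;               /* len free pages starting at pgno */
        struct FreeRun *next;
    } FreeRun;

    typedef struct {
        /* bucket[o] holds runs with len in [2^o, 2^(o+1)) */
        FreeRun *bucket[MAX_ORDER + 1];
    } SizeIndexedFreelist;

    /* Take a run of at least num pages from the smallest bucket whose
     * runs are all guaranteed to fit; the caller splits off and returns
     * any unused tail. (The boundary bucket below may also contain
     * fitting runs; a real version would scan it before moving up.) */
    static FreeRun *take_run(SizeIndexedFreelist *fl, size_t num)
    {
        for (int o = 0; o <= MAX_ORDER; o++) {
            if (((size_t)1 << o) < num)
                continue;               /* runs here may be too small */
            if (fl->bucket[o]) {
                FreeRun *r = fl->bucket[o];
                fl->bucket[o] = r->next;
                return r;
            }
        }
        return NULL;                    /* no sufficiently large run */
    }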
--On Thursday, October 22, 2020 3:48 PM +0100 Howard Chu hyc@symas.com wrote:
Freelist fragmentation is definitely a thorny problem; we should investigate something like this as a solution: https://github.com/etcd-io/bbolt/pull/138 . It's on my todo list to investigate, but I haven't had time yet. Patches welcome...
And generally, these issues should be discussed on either openldap-devel or openldap-technical rather than openldap-bugs.
Thanks.
--Quanah
--
Quanah Gibson-Mount
Product Architect
Symas Corporation
Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
http://www.symas.com
Thank you for the info, I appreciate it. And apologies for posting to the wrong list. Thanks, Kris