https://bugs.openldap.org/show_bug.cgi?id=10236
Issue ID: 10236
Summary: fragmentation makes mdb_page_alloc slow
Product: LMDB
Version: 0.9.31
Hardware: All
OS: All
Status: UNCONFIRMED
Keywords: needs_review
Severity: normal
Priority: ---
Component: liblmdb
Assignee: bugs@openldap.org
Reporter: aalekseyev@janestreet.com
Target Milestone: ---
Created attachment 1022 --> https://bugs.openldap.org/attachment.cgi?id=1022&action=edit patch is relative to LMDB_0.9.31
It's a known problem that mdb_page_alloc can be slow when the free list is large and fragmented. [1] [2] [3]
I'm not sure it's known *how* slow it can be.
In our workload we saw a fragmented freelist lead to pathological O(n^2) behavior: to satisfy a multi-page allocation, we iterate, loading chunks of the free list one at a time, and at every iteration we do O(n) work to check whether the allocation can now succeed.
Even small-ish allocations (tens of pages) repeatedly hit this edge case, with the free list growing to ~1000000 entries and the outer loop taking ~2000 iterations (on the order of 10^9 units of work in total, just to allocate a few pages).
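To illustrate the shape of the problem, here is a simplified model of the pattern described above. This is not LMDB code: the chunk loading, the merge step, and the contiguity scan are stand-ins for what mdb_page_alloc does with the freelist IDL.

```python
def alloc_one_by_one(chunks, need):
    """Load freelist chunks one at a time; after each load, rescan the
    whole accumulated list for a contiguous run of `need` page numbers.

    With k chunks of size m the scans cost m + 2m + ... + km = O(k^2 * m),
    i.e. quadratic in the number of chunks loaded -- the pathology above.
    Returns (start_page_or_None, total_scan_work)."""
    freelist = []
    work = 0
    for chunk in chunks:
        freelist = sorted(freelist + chunk)  # O(n) merge per iteration
        work += len(freelist)                # O(n) contiguity scan
        run = 1
        for i in range(1, len(freelist)):
            run = run + 1 if freelist[i] == freelist[i - 1] + 1 else 1
            if run >= need:
                return freelist[i - need + 1], work
    return None, work
```

Running this with many non-contiguous chunks and a satisfying run only in the last chunk shows the scan work growing quadratically with the number of chunks loaded.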
Even though I'm sure there are ways to avoid hitting this pathological scenario so often (avoid values larger than 4k, or fix whatever causes the fragmentation), it seems unacceptable to have a performance cliff this bad.
I made a patch to make the allocation take ~O(n*log(n)), by loading and merging multiple chunks at once instead of doing it one-by-one.
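The core idea can be sketched as a k-way merge of sorted chunks via a min-heap, which costs O(n log k) instead of the O(n*k) of repeated pairwise merges. This is only an illustration; the actual patch works on LMDB's IDL arrays in C.

```python
import heapq

def merge_chunks(chunks):
    """Merge k sorted freelist chunks in a single pass using a min-heap
    (Python's stdlib heapq.merge is exactly this k-way merge)."""
    return list(heapq.merge(*chunks))
```

Loading several chunks and merging them at once means the O(n) "can the allocation succeed" check runs once per batch rather than once per chunk.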
I'd appreciate it if someone could review the patch (attached), improve it, and/or come up with an alternative fix.
The code in `midl.c` is kinda meme-y, including a contribution from GPT-4o, but it performs well enough to speed up our pathological workload by ~20x (which is still ~3x away from the non-fragmented case).
Anyway, the main thing that warrants scrutiny is the change in `mdb.c`: I understand very little about lmdb internals and I worry that loading multiple pages at once instead of one-by-one might break something.
[1] issue #8664
[2] https://lists.openldap.org/hyperkitty/list/openldap-bugs@openldap.org/thread...
[3] https://lists.openldap.org/hyperkitty/list/openldap-technical@openldap.org/m...
--- Comment #1 from Howard Chu <hyc@openldap.org> ---
Thanks. The overall approach makes sense, but we cannot use GPT-authored code. The Project only accepts code submitted by its original authors, and LLM-generated code is by definition plagiarized from unknown numbers of authors.
We will look into writing a usable patch adopting this feature.
--- Comment #2 from aalekseyev@janestreet.com ---
(In reply to Howard Chu from comment #1)
> LLM-generated code is by definition plagiarized from unknown numbers of authors.
Welp, that's fair enough, although so is everything produced by my brain.
Let's try to find references with a free enough license to remove the liability of having to reimplement-but-not-plagiarize this code.
An MIT-licensed k-way-merge in Python can be seen here: https://github.com/amitrajitbose/Competitive_Programming/blob/master/Heap/me....
A heapify definition can be seen on Wikipedia (with indexes off-by-1): https://en.wikipedia.org/wiki/Binary_heap#Heap_implementation, search for "Max-Heapify".
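For reference, the down-heap ("Max-Heapify") routine from that Wikipedia section looks roughly like the following in 0-indexed form, written here for a min-heap since that's what a k-way merge of freelist chunks wants:

```python
def sift_down(heap, i, n):
    """Down-heap: push heap[i] down until the min-heap property holds
    for the first n elements (0-indexed children at 2i+1 and 2i+2)."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        smallest = i
        if left < n and heap[left] < heap[smallest]:
            smallest = left
        if right < n and heap[right] < heap[smallest]:
            smallest = right
        if smallest == i:
            return
        heap[i], heap[smallest] = heap[smallest], heap[i]
        i = smallest

def heapify(heap):
    """Build a min-heap bottom-up in O(n) by sifting down each
    internal node, last to first."""
    n = len(heap)
    for i in range(n // 2 - 1, -1, -1):
        sift_down(heap, i, n)
```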
It even mentions the optimization idea that I thought was cute in GPT-generated code:
> The down-heap operation (without the preceding swap) can also be used to modify the value of the root, even when an element is not being deleted.
(I might have done pop followed by push naively)
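That replace-the-root trick is what Python's heapq exposes as heapreplace: overwrite the root and do one down-heap pass, instead of a pop followed by a push (two sift operations). A hand-rolled k-way merge using it, as a sketch:

```python
import heapq

def kmerge(chunks):
    """k-way merge of sorted chunks. When a chunk's head is consumed,
    its successor replaces the heap root in place via heapreplace
    (one sift-down) rather than a naive heappop + heappush."""
    heap = [(chunk[0], idx, 0) for idx, chunk in enumerate(chunks) if chunk]
    heapq.heapify(heap)
    out = []
    while heap:
        val, idx, pos = heap[0]
        out.append(val)
        pos += 1
        if pos < len(chunks[idx]):
            # modify the root, then a single down-heap pass
            heapq.heapreplace(heap, (chunks[idx][pos], idx, pos))
        else:
            heapq.heappop(heap)
    return out
```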