https://bugs.openldap.org/show_bug.cgi?id=9735
Issue ID: 9735 Summary: [PATCH] try hard to find free space if database cannot grow Product: LMDB Version: 0.9.24 Hardware: All OS: Linux Status: UNCONFIRMED Keywords: needs_review Severity: normal Priority: --- Component: liblmdb Assignee: bugs@openldap.org Reporter: libor.peltan@nic.cz Target Milestone: ---
Created attachment 851 --> https://bugs.openldap.org/attachment.cgi?id=851&action=edit Patch fixing the issue "try hard to find free space if database cannot grow"
Note: - the issue is the same in version 0.9.70 (git)
Situation: - the database had already grown to its limit (mapsize) in the past - overflow pages are used heavily as stored values are usually several pages long - free space got fragmented
Problem: - attempt to insert new value results in MDB_MAP_FULL despite there is free space available
Cause: there is a heursitic in mdb_page_alloc() that gives up searching for free space chunk if this would take too much time. This is useful when the database can still grow, as it balances performance with space usage. However, if the database can no longer grow, it prevents inserting new values.
Solution: detect early on in mdb_page_alloc() if the database can grow, and if not, let it try hard to search for free space.
Patch: attached
https://bugs.openldap.org/show_bug.cgi?id=9735
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Assignee|bugs@openldap.org |hyc@openldap.org Keywords|needs_review |
https://bugs.openldap.org/show_bug.cgi?id=9735
--- Comment #1 from libor.peltan@nic.cz --- BUMP, could you please look at this issue? It still hurts us. Thank you.
https://bugs.openldap.org/show_bug.cgi?id=9735
--- Comment #2 from jtgrassie jtg@xtrabass.com --- I wonder if this "try harder" approach is best activated via a flag on mdb_env_open? That way, only users who want the more aggressive search for free space get impacted.
https://bugs.openldap.org/show_bug.cgi?id=9735
--- Comment #3 from libor.peltan@nic.cz --- (In reply to jtgrassie from comment #2)
I wonder if this "try harder" approach is best activated via a flag on mdb_env_open? That way, only users who want the more aggressive search for free space get impacted.
I thought about it as well, but then I count'd find any use-case for the other (current) behavior. When the database can still grow, there is a clear trade-off between performance and occupied storage. But otherwise, should there be a trade-off between performance and (non-)functionality? Could you please think of an example where is it more desirable to return a failure than to continue trying hard (to find some free space)? Thank you!
https://bugs.openldap.org/show_bug.cgi?id=9735
--- Comment #4 from jtgrassie jtg@xtrabass.com ---
I thought about it as well, but then I count'd find any use-case for the other (current) behavior. When the database can still grow, there is a clear trade-off between performance and occupied storage. But otherwise, should there be a trade-off between performance and (non-)functionality? Could you please think of an example where is it more desirable to return a failure than to continue trying hard (to find some free space)? Thank you!
The trade-off isn't really between performance and (non-)functionality, rather about performance vs growth rate concerns. This is because you either do a quick freelist search, and if not found, you can grow the environment (which is fast; the current solve), vs doing an aggressive (slow) fuller search (this proposed patch), before potentially growing.
Hence if you don't really care about growth rate or are happy enough to occasionally do some maintenance when you do care (i.e. performing an `mdb_copy -c ...` to compact out the freelist), then the current approach is fine. But if you do care more about growth rate than performance, something like this patch is useful, because you're saying "hey, I don't care you might take a while looking through the freelist for free pages, I'd prefer that to growing unnecessarily".
This is why I think this is a good candidate for an env open flag, so the user can decide which matters more to them.
https://bugs.openldap.org/show_bug.cgi?id=9735
--- Comment #5 from libor.peltan@nic.cz --- @4jtgrassie
This is a misunderstanding.
My proposed patch only performs an aggressive (slow) fuller search if the database can NOT grow, since it had already grown to its limit (mapsize).
Hence, it's really a trade-off between performance and functionality: with the patch, the database tries hard to find any available free space; without the patch (or with it deactivated by some env flag), it tries a little and than fails with MDB_MAP_FULL. Can this be desirable in some use-case?
When the database is not full and the environment has the possibility to grow, my patch does not change anything. The performance/size trade-off coefficient remains unchanged. (I believe it has been carefully chosen based on experience, and it's possible that some users might want to configure it differently, but I'd like to avoid this topic completely. It would be too arguable.)
https://bugs.openldap.org/show_bug.cgi?id=9735
--- Comment #6 from jtgrassie jtg@xtrabass.com ---
This is a misunderstanding.
My proposed patch only performs an aggressive (slow) fuller search if the database can NOT grow, since it had already grown to its limit (mapsize).
No, I do understand this. The issue is all about determining *if* there is any free space (if the environment really is full or not).
Hence, it's really a trade-off between performance and functionality: with the patch, the database tries hard to find any available free space; without the patch (or with it deactivated by some env flag), it tries a little and than fails with MDB_MAP_FULL. Can this be desirable in some use-case?
It's not a "functionality" trade-off because as I mentioned, one can resize (grow) the environment, which is fast (`mdb_env_set_mapsize`). Hence the trade-off is performance vs growth rate concerns.
https://bugs.openldap.org/show_bug.cgi?id=9735
--- Comment #7 from libor.peltan@nic.cz --- @jtgrassie Well, for the sake of clarity, let's distinguish:
- growth(1): increasing the LMDB data file size within the limit of pre-configured (and static) Mapsize - growth(2): increasing the Mapsize
Do you suggest that when the free space in given env is not found (e.g. due to free space fragmentation), it's desirable to grow(2) ? I guess the code of LMDB internally never does this, but it's of course possible to implement in the application some logic like:
- try to commit a RW txn - if MDB_MAP_FULL is encountered, grow(2) and re-try
...but there would always need to be _some_ limit (e.g. available disk space). Could you discuss why wouldn't one configure the Mapsize to _that_ very maximal limit already at the beginning and rely only on growth(1) which is much less complicated?
From my point of view, the situation looks quite simple:
- set the Mapsize to the highest possible limit - the DB will probably never reach it, enjoying the performance-size-trade-off when searching for available free chunk - when the fragmentation goes so bad that it actually reaches the limit, it's always better to sacrifice the performance than failing to write data (with no possibility of any growth)
But if you could sketch different reasonable operation modes, i'd be glad :)
https://bugs.openldap.org/show_bug.cgi?id=9735
--- Comment #8 from jtgrassie jtg@xtrabass.com ---
- growth(1): increasing the LMDB data file size within the limit of
pre-configured (and static) Mapsize
This is not part of the discussion, LMDB already does this. The question is about how hard to look for free space once you've grown to the bounds.
- growth(2): increasing the Mapsize
This is an option when you get a MDB_MAP_FULL from a write tx. As mentioned, using `mdb_env_set_mapsize`.
Do you suggest that when the free space in given env is not found (e.g. due to free space fragmentation), it's desirable to grow(2) ? I guess the code of LMDB internally never does this, but it's of course possible to implement in the application some logic like:
- try to commit a RW txn
- if MDB_MAP_FULL is encountered, grow(2) and re-try
That's exactly the pattern I described, and is used by various applications.
Could you discuss why wouldn't one configure the Mapsize to _that_ very maximal limit already at the beginning
There are many reasons why one wouldn't set the initial size to that of *all* available disk space.
From my point of view, the situation looks quite simple:
- set the Mapsize to the highest possible limit
- the DB will probably never reach it, enjoying the performance-size-trade-off
when searching for available free chunk
- when the fragmentation goes so bad that it actually reaches the limit, it's
always better to sacrifice the performance than failing to write data (with no possibility of any growth)
The database can be compacted *outside* of the main application (as I mentioned, performing an `mdb_copy -c ...`) which can be done as part of a maintenance window for example, therefore never sacrificing the performance.
It all comes down to your specific application requirements, hardware and use-case. Which is exactly *why* I suggested your patch be enabled via a flag (an option) on `mdb_env_open`. I can appreciate some may prefer a "try harder" approach to look for free space (accepting potentially slow writes as a consequence), but I think you also need to appreciate others may not want this (and thus solve fragmentation differently).
https://bugs.openldap.org/show_bug.cgi?id=9735
--- Comment #9 from libor.peltan@nic.cz --- OK, I admit that there might be applications that routinely increase mapsize whenever MDB_MAP_FULL is throwed and they are happy with it.
For us, it would be nice if we could achieve our desired behavior (i.e. LMDB tries hard to find free space if mapsize had been reached) anyhow, say by switching any magic flag.
How can I help LMDB authors/maintainers to implement this? Shall I upload a modified patch, or leave it on their (perhaps more polished) coding style?
https://bugs.openldap.org/show_bug.cgi?id=9735
Howard Chu hyc@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Ever confirmed|0 |1 Status|UNCONFIRMED |CONFIRMED
--- Comment #10 from Howard Chu hyc@openldap.org --- Not particularly interested in this approach because with freelist fragmentation, it will frequently be futile. The current code was arrived at after hundreds of hours of testing, we know from experience that minor tweaks like this won't help much.
The way forward is to optimize the in-memory handling of the freelist, the way BoltDB did here https://github.com/etcd-io/bbolt/pull/138
Patches to port their approach into LMDB are welcome. Their patch is simple because they use built in hashtable primitives in Go. A bit more work is needed to do the same here. I recommend using http://www.tommyds.it/ as a starting point.
https://bugs.openldap.org/show_bug.cgi?id=9735
--- Comment #11 from Howard Chu hyc@openldap.org --- Also worth reading http://www.nothings.org/computer/judy/ but that's the article that led me to TommyDS.
https://bugs.openldap.org/show_bug.cgi?id=9735
Howard Chu hyc@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Resolution|--- |SUSPENDED Status|CONFIRMED |RESOLVED