--On Thursday, August 24, 2017 8:30 PM -0500 Quanah Gibson-Mount quanah@symas.com wrote:
Hi Geert,
If I could, I would delete 8664 from the ITS system entirely as it was filed based on invalid information that was provided to me. It generally should be ignored.
When a write operation is performed with LMDB, the freelist is scanned for available space to reuse if possible. The larger the freelist, the longer the operation takes to complete. Once the database has reached a certain level of fragmentation (this differs for each use case), write operations start taking a noticeable amount of time to complete, and the server processing the write essentially comes to a halt while it does so. Once the write operation completes, things go back to normal. The only solution is to dump and reload the database (slapcat/slapadd or mdb_copy -c). Eventually, you will get back into the same situation and have to do this again.
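To illustrate why a long, fragmented freelist slows a write down, here is a toy model in Python. It is not LMDB's allocator: it assumes the writer needs a run of consecutive free pages and walks a sorted list of free page numbers to find one. The function name find_contiguous_run and the page numbers are made up for the illustration.

```python
def find_contiguous_run(free_pages, run_length):
    """Scan a sorted list of free page numbers for `run_length` consecutive pages.

    Returns (first_page_of_run, steps), or (None, steps) if no run exists.
    `steps` counts how many freelist entries had to be examined.
    """
    steps = 0
    run_start = 0  # index where the current consecutive run begins
    for i in range(len(free_pages)):
        steps += 1
        # A gap in the page numbers ends the current run.
        if i > 0 and free_pages[i] != free_pages[i - 1] + 1:
            run_start = i
        if i - run_start + 1 >= run_length:
            return free_pages[run_start], steps
    return None, steps


# Compact freelist: a usable run is found almost immediately.
compact = list(range(100, 110))
print(find_contiguous_run(compact, 3))

# Fragmented freelist: many free pages, but no two are adjacent,
# so the whole list is scanned and nothing can be reused.
fragmented = list(range(0, 200_000, 2))
print(find_contiguous_run(fragmented, 3))
```

In this model the fragmented case examines every freelist entry and still fails, which matches the behavior described above: the cost of a write grows with the size of the freelist.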
An option was recently added to the slapd-mdb configuration (rtxnsize) that can also help reduce the rate of fragmentation. There are some performance-related issues, which you can find discussed on the -devel list from when it was added. Whether you are affected by them, and whether the setting will help you in particular, depends on whether your searches return a large number of entries. You can find some guidelines that I came up with for tuning the parameter in that thread. If you do not have an unlimited Zimbra license, the license check performed by the store servers will definitely affect this, since the result set is all active accounts, which can be quite large.
Additionally, at one point I had a patch for the Zimbra build of OpenLDAP that made it very aggressive in finding free space to reuse. I don't recall if it is still applied (I don't believe it currently is, based on what I saw on GitHub). It basically meant that in Zimbra, it would work extra hard to find reusable free space, which would reduce the rate at which the database fragmented, but it also meant that once the DB was fragmented enough, it would amplify the time it took for a write op to complete. I.e., it was a tradeoff: it took longer to reach a catastrophic state, but the state was more catastrophic once reached.
This is one area where LMDB differs significantly from back-hdb/bdb. You could have back-bdb/hdb databases that endured a high rate of write operations run for years without needing maintenance. With LMDB, you get better read & write rates, but it requires periodic reloads.
I wanted to follow up on this, based on doing an examination of Geert's database, and other affected databases. Geert already has this answer, but it's useful for the general OpenLDAP community.
This fragmentation problem is not common. It depends entirely on the size of the entries in the database. The issue arises when entries in the LDAP DB are larger than the LMDB page size (usually 4 KB) and are updated frequently. This most often occurs in one of two ways:
a) multi-valued attributes with a large number of values
b) a very large single-valued attribute (e.g., binary data)
For the first problem (a), there is code in the 2.5 release to address this problem, called multival. This feature puts multi-valued attributes with more than a (configurable) number of values into their own sub-database. For (b), there's not really a solution, but it's pretty rare.
So those whose entries are < 4 KB will never see this problem. Note that this is the size of the binary entry on disk, not the size of the entry when exported to LDIF. The binary size is generally significantly smaller than the LDIF version.
--Quanah
On 03.01.18 00:06, Quanah Gibson-Mount wrote:
I wanted to follow up on this, based on doing an examination of Geert's database, and other affected databases. Geert already has this answer, but it's useful for the general OpenLDAP community.
This fragmentation problem is not common. It depends entirely on the size of the entries in the database. The issue arises when entries in the LDAP DB are larger than the LMDB page size (usually 4 KB) and are updated frequently. This most often occurs in one of two ways:
a) multi-valued attributes with a large number of values
b) a very large single-valued attribute (e.g., binary data)
For the first problem (a), there is code in the 2.5 release to address this problem, called multival. This feature puts multi-valued attributes with more than a (configurable) number of values into their own sub-database. For (b), there's not really a solution, but it's pretty rare.
So those whose entries are < 4 KB will never see this problem. Note that this is the size of the binary entry on disk, not the size of the entry when exported to LDIF. The binary size is generally significantly smaller than the LDIF version.
--Quanah
Hi,
I did some research of my own on this issue in the meantime and learned more about overflow pages (bigdata): a constant in the LMDB code requires that each tree page be able to hold at least two tree nodes. So a node may not be larger than half of the page size (minus the page header size). Since the node also contains the key, the key contributes to the size of the node. With a maximum key size of 511 bytes, only data below roughly 1500 bytes is guaranteed to be stored within the tree and not in overflow pages.
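The arithmetic behind the "roughly 1500 bytes" figure can be sketched as follows. The two-nodes-per-page rule and the 511-byte key limit come from the text above; the header sizes (16-byte page header, 8-byte node header, 2-byte slot index) are assumptions based on a reading of the LMDB 0.9 source and should be checked against the mdb.c you actually build. All names are made up for the illustration.

```python
# Assumed constants (not retrievable through the public API; verify in mdb.c).
PAGE_HEADER_SIZE = 16    # bytes of header at the start of each page
MIN_KEYS_PER_PAGE = 2    # each tree page must hold at least two nodes
SLOT_INDEX_SIZE = 2      # per-node entry in the page's slot array
NODE_HEADER_SIZE = 8     # fixed header stored in front of key and data
MAX_KEY_SIZE = 511       # default maximum key size in bytes


def max_node_size(page_size=4096):
    """Largest node (header + key + data) that still fits twice on a page."""
    half = (page_size - PAGE_HEADER_SIZE) // MIN_KEYS_PER_PAGE
    half &= ~1  # round down to an even number of bytes
    return half - SLOT_INDEX_SIZE


def max_inline_data(key_size, page_size=4096):
    """Largest value stored inside the tree page for a key of `key_size` bytes."""
    return max_node_size(page_size) - NODE_HEADER_SIZE - key_size


print(max_node_size())                 # node limit for 4 KB pages
print(max_inline_data(MAX_KEY_SIZE))   # worst case: longest possible key
print(max_inline_data(16))             # a short key leaves more room for data
```

Under these assumptions the worst case (a 511-byte key) leaves 1519 bytes for the value, which is where the figure of roughly 1500 bytes comes from. Shorter keys leave correspondingly more room before a value moves to overflow pages.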
With respect to overflow pages, note that they also contain a single header. Choosing a data size that is exactly a multiple of the page size will therefore waste nearly a full page.
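A small calculation shows the effect. It assumes a 16-byte page header at the start of the overflow run and 4 KB pages; both values are assumptions to verify against your LMDB build, and the function names are invented for the illustration.

```python
PAGE_HEADER_SIZE = 16  # assumed size of the single header on an overflow run


def overflow_pages(data_size, page_size=4096):
    """Number of pages needed for `data_size` bytes plus the one header."""
    return (PAGE_HEADER_SIZE + data_size + page_size - 1) // page_size


def wasted_bytes(data_size, page_size=4096):
    """Bytes allocated for the run that hold neither header nor data."""
    return overflow_pages(data_size, page_size) * page_size - PAGE_HEADER_SIZE - data_size


for size in (4080, 4096, 8192):
    print(size, overflow_pages(size), wasted_bytes(size))
```

With these numbers, a 4080-byte value fills one page exactly, while a 4096-byte value needs two pages and leaves 4080 bytes unused. Sizing large values as a multiple of the page size minus the header avoids that.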
Unfortunately, the various constants and calculated values cannot be retrieved via the regular API, so there is no safe way to deal with this from a user's perspective.
I have not yet investigated how LMDB stores released runs of pages and what strategies it uses for allocation, specifically whether only exactly matching sizes are taken or whether larger runs are broken up. In any case, I do not expect a fragmentation problem if all data requires only a few pages.
For the project where I am using LMDB, there is a certain likelihood that the data may be megabytes in size. I currently plan to revise the way the data is stored and to split it up into multiple chunks, each represented by an individual database entry. The chunks will be dimensioned so that the number of overflow pages is always a power of two, e.g. 8, 16 and 32 pages, even if this creates unused space within the chunk. This will of course not stop the fragmentation, but it should keep the problem at a much lower level.
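A minimal sketch of such a chunking scheme, not the actual implementation from the project: it assumes 4 KB pages, a 16-byte header per overflow run, and a hypothetical cap of 32 pages per chunk. Each chunk would then be written under its own key, which is not shown here.

```python
PAGE_SIZE = 4096
PAGE_HEADER_SIZE = 16   # assumed single header per overflow run
MAX_CHUNK_PAGES = 32    # hypothetical upper limit per chunk


def chunk_capacity(pages):
    """Payload bytes that fit into a run of `pages` overflow pages."""
    return pages * PAGE_SIZE - PAGE_HEADER_SIZE


def pages_for(size):
    """Smallest power-of-two page count whose capacity holds `size` bytes."""
    pages = 1
    while chunk_capacity(pages) < size:
        pages *= 2
    return pages


def split_into_chunks(data):
    """Split `data` into (page_count, payload) pairs with power-of-two page counts."""
    chunks = []
    offset = 0
    full = chunk_capacity(MAX_CHUNK_PAGES)
    # Emit full-size chunks while more than one chunk's worth remains.
    while len(data) - offset > full:
        chunks.append((MAX_CHUNK_PAGES, data[offset:offset + full]))
        offset += full
    # The remainder is rounded up to the next power-of-two page count.
    rest = data[offset:]
    if rest:
        chunks.append((pages_for(len(rest)), rest))
    return chunks


blob = bytes(300_000)
for pages, payload in split_into_chunks(blob):
    print(pages, len(payload))
```

The last chunk may leave part of its run unused, which is the deliberate trade-off described above. Very small remainders would be stored inside the tree rather than in overflow pages, so the page count reported for them is only nominal.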
Regards,
Klaus
Klaus Malorny wrote:
With respect to overflow pages, note that they also contain a single header. Choosing a data size that is exactly a multiple of the page size will therefore waste nearly a full page.
Part of the work for LMDB 1.0 will be to remove the overflow page header, to avoid this waste.
openldap-technical@openldap.org