Hi Geert,
If I could, I would delete 8664 from the ITS system entirely as it was filed based on invalid information that was provided to me. It generally should be ignored.
When a write operation is performed with LMDB, the freelist is scanned for available space to reuse if possible. The larger the freelist, the longer the operation takes to complete. Once the database reaches a certain level of fragmentation (which differs for each use case), write operations start taking a noticeable amount of time, and the server processing the write essentially comes to a halt until it finishes. Once the write operation completes, things go back to normal. The only solution is to dump and reload the database (slapcat/slapadd or mdb_copy -c). Eventually, you will get back into the same situation and have to do this again.
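To make the cost argument concrete, here is a toy model in Python. It is not LMDB's allocator, only an illustration under the assumption that a write needs a run of contiguous free pages and finds it by walking a sorted list of free page numbers. The function name and the page lists are made up for the example.

```python
def find_contiguous_run(free_pages, n):
    """Toy model: walk a sorted list of free page numbers looking for
    n contiguous pages. Returns (first_page_or_None, entries_examined)."""
    steps = 0
    run_start = None
    run_len = 0
    prev = None
    for page in free_pages:
        steps += 1
        if prev is not None and page == prev + 1:
            run_len += 1          # extends the current run
        else:
            run_start = page      # gap: start a new run here
            run_len = 1
        if run_len >= n:
            return run_start, steps
        prev = page
    return None, steps            # whole list walked, nothing usable


# Compact freelist: a suitable run is found almost immediately.
compact = list(range(10))
# Fragmented freelist: same number of free pages, but no two are adjacent,
# so the entire list is walked and the search still fails.
fragmented = list(range(0, 20, 2))
```

In this model the fragmented list costs one step per free page and still yields nothing, which is the shape of the slowdown described above: the work grows with the size of the freelist, not with the size of the write.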
An option (rtxnsize) was recently added to the slapd-mdb configuration that can also help reduce the rate of fragmentation. There are some performance-related issues, which you can find discussed on the -devel list from when it was added. Whether you are affected by them, and whether the setting will help you in particular, depends on whether your searches return a large number of entries. You can find some guidelines I came up with for tuning the parameter in that thread. If you do not have an unlimited Zimbra license, the license check performed by the store servers will definitely affect this, since its result set is all active accounts, which can be quite large.
Additionally, I at one point had a patch for the Zimbra build of OpenLDAP that made it very aggressive in finding free space to reuse. I don't recall if it is still applied (I don't believe it is, based on what I saw on GitHub). It meant that the Zimbra build would work extra hard to find reusable free space, which reduced the rate at which the database fragmented, but it also meant that once the DB was fragmented enough, write ops took even longer to complete. I.e., it was a tradeoff: a longer time to reach a catastrophic state, but a more catastrophic state once reached.
This is one area where LMDB differs significantly from back-hdb/bdb. You could have back-bdb/hdb databases that endured a high rate of write operations be in effect for years w/o needing maintenance. With LMDB, you get better read & write rates, but it requires periodic reloads.
Hope this helps!
--Quanah
----- Original Message -----
From: "Geert Hendrickx" geert@hendrickx.be
To: "openldap-technical" openldap-technical@openldap.org
Sent: Thursday, August 24, 2017 4:53:32 AM
Subject: mdb fragmentation
Hi
We have an OpenLDAP 2.4.44 based, 4-way MMR setup with 4 M entries, which is fairly write intensive (Zimbra).
Lately we've seen very frequent lockups of the master that receives the updates (only 1 out of 4), whereas the replicas stay responsive. According to -d stats logs, all threads suddenly take a long time to answer any queries, and slapd can no longer accept new connections. The issue always disappears again without intervention, but usually hits a number of times in a row, on an almost daily basis.
We tested a lot of things, but eventually "solved" the issue with a slapcat and slapadd of the database. The master server has been completely stable again since. The mdb was also reduced by 50% in size.
Looking at the old mdb (prior to the dump), mdb_stat -f shows it had over 3.7 M free pages. Could it be an issue of database fragmentation similar to ITS#8664?
Is it natural that the freelist (and thus the mdb) gets this big over time? I would expect those free pages to get reused constantly. If it is natural, would it make sense to monitor the number of free pages? Is there a threshold to look for, before things get problematic again? (ITS#7770 would come in handy here, as we already monitor/graph various metrics from the monitor backend.)
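As a sketch of what such monitoring could look like, the Python below extracts two counters from the text printed by `mdb_stat -ef` and computes the share of free pages. It assumes the output contains lines of the form `Free pages: N` and `Number of pages used: N`; check this against your own mdb_stat output before relying on it. The function names are hypothetical, and no threshold is suggested here, since the point at which trouble starts is workload-dependent.

```python
import re


def parse_mdb_stat(text):
    """Pull page counters out of `mdb_stat -ef` style output.
    Assumes lines like 'Free pages: N' and 'Number of pages used: N'."""
    counters = {}
    labels = (("free", "Free pages"), ("used", "Number of pages used"))
    for key, label in labels:
        pattern = rf"^\s*{re.escape(label)}:\s*(\d+)\s*$"
        match = re.search(pattern, text, re.MULTILINE)
        if match:
            counters[key] = int(match.group(1))
    return counters


def free_ratio(counters):
    """Free pages as a fraction of pages used, or None if unknown."""
    if "free" not in counters or not counters.get("used"):
        return None
    return counters["free"] / counters["used"]
```

Graphing the ratio over time, rather than alerting on a fixed number, would show how quickly the freelist grows between reloads for a given workload.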
Geert
-- geert.hendrickx.be :: geert@hendrickx.be :: PGP: 0xC4BB9E9F This e-mail was composed using 100% recycled spam messages!
On Thu, Aug 24, 2017 at 19:30:17 -0500, Quanah Gibson-Mount wrote:
When a write operation is performed with LMDB, the freelist is scanned for available space to reuse if possible. The larger the freelist, the longer the operation takes to complete. Once the database reaches a certain level of fragmentation (which differs for each use case), write operations start taking a noticeable amount of time, and the server processing the write essentially comes to a halt until it finishes. Once the write operation completes, things go back to normal. The only solution is to dump and reload the database (slapcat/slapadd or mdb_copy -c). Eventually, you will get back into the same situation and have to do this again.
[..]
This is one area where LMDB differs significantly from back-hdb/bdb. You could have back-bdb/hdb databases that endured a high rate of write operations be in effect for years w/o needing maintenance. With LMDB, you get better read & write rates, but it requires periodic reloads.
Thanks Quanah, this definitely explains the issues we saw.
So we'll have to live with periodic mdb maintenance. I think with mdb_copy -c it should be quite doable, as opposed to slapcat/slapadd, which took all day.
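A minimal sketch of scripting the compacting copy from Python follows. It only wraps the `mdb_copy -c` invocation mentioned in this thread; the function names and any paths passed in are hypothetical, and swapping the compacted copy into place (including stopping and restarting slapd) is deliberately not shown.

```python
import subprocess


def compact_copy_command(src_dir, dst_dir):
    """Build the argv for a compacting copy of an LMDB environment.
    dst_dir is expected to be an existing, empty directory."""
    return ["mdb_copy", "-c", src_dir, dst_dir]


def run_compact_copy(src_dir, dst_dir):
    """Run mdb_copy -c and raise CalledProcessError if it fails."""
    subprocess.run(compact_copy_command(src_dir, dst_dir), check=True)
```

Keeping the command construction separate from its execution makes it easy to log or dry-run the exact command before it touches a production database.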
I'll have to look for some freelist size threshold on which we can set an alert, before we get into noticeable trouble again.
Can the need for this periodic mdb maintenance be documented in the OpenLDAP admin guide?
I'll respond to the Zimbra specific remarks off-list.
Geert
On 8/25/17 2:30 AM, Quanah Gibson-Mount wrote:
Hi Geert,
If I could, I would delete 8664 from the ITS system entirely as it was filed based on invalid information that was provided to me. It generally should be ignored.
When a write operation is performed with LMDB, the freelist is scanned for available space to reuse if possible. The larger the freelist, the longer the operation takes to complete. Once the database reaches a certain level of fragmentation (which differs for each use case), write operations start taking a noticeable amount of time, and the server processing the write essentially comes to a halt until it finishes. [...]
Hope this helps!
--Quanah
Hi all!
Hmm, I am a bit alarmed by this. I would have expected the free blocks to be sorted by size to some extent, so that suitable blocks are found fairly fast. But I already had the impression that this is not the case when I analyzed how mdb_stat.c calculates the size of the free space...
Since I ran into an allocation problem with my software on a test system anyway (the database was "full" despite gigabytes of reported free space), I wonder whether I should limit the size of larger data values and also round the sizes up, e.g. to the next power of two, in order to reduce the risk of such problems.
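The rounding idea can be sketched in a few lines of Python. Whether padding values this way actually reduces fragmentation in LMDB is exactly the open question here, so this only shows the arithmetic; the helper name is hypothetical.

```python
def round_up_pow2(size):
    """Round a positive size in bytes up to the next power of two.
    Sizes that are already a power of two are returned unchanged."""
    if size <= 0:
        raise ValueError("size must be positive")
    # (size - 1).bit_length() is the exponent of the smallest power
    # of two that is >= size.
    return 1 << (size - 1).bit_length()
```

The intent of such padding would be to limit stored values to a small set of distinct sizes, so that space freed by one value is more likely to fit a later one.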
From that perspective, I would also be interested to know from which value size on LMDB allocates extents to store the data (please forgive me if this is obvious and I missed it, or if I have a conceptual misunderstanding).
Klaus