New subject: mdb fragmentation

2 Jan 2018


--On Thursday, August 24, 2017 8:30 PM -0500 Quanah Gibson-Mount <quanah@symas.com> wrote:
...
Hi Geert,
If I could, I would delete 8664 from the ITS system entirely as it was
filed based on invalid information that was provided to me.  It generally
should be ignored.
When a write operation is performed with LMDB, the freelist is scanned
for available space to reuse if possible.  The larger the freelist, the
longer the operation takes to complete.  Once the database reaches a
certain level of fragmentation (the threshold differs for each use case),
write operations start taking a noticeable amount of time to complete,
and the server processing the write essentially comes to a halt until it
finishes.  Once the write operation completes,
things go back to normal.  The only solution is to dump and reload the
database (slapcat/slapadd or mdb_copy -c). Eventually, you will get back
into the same situation and have to do this again.
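To illustrate why a long, fragmented freelist slows writes down, here is a toy model in Python. It is not LMDB's allocator; the function name and the list-of-page-numbers representation are made up for illustration, and it assumes that a value larger than one page needs a run of adjacent free pages.

```python
def find_contiguous_run(free_pages, needed):
    """Scan a sorted list of free page numbers for `needed` adjacent pages.

    Returns (first page of the run, pages examined), or
    (None, pages examined) if no such run exists.
    Hypothetical helper for illustration only, not an LMDB API.
    """
    examined = 0
    run_start = None
    run_len = 0
    prev = None
    for page in free_pages:
        examined += 1
        if prev is not None and page == prev + 1:
            run_len += 1                   # extends the current run
        else:
            run_start, run_len = page, 1   # gap: start a new run
        if run_len >= needed:
            return run_start, examined
        prev = page
    return None, examined


# Unfragmented freelist: the first three pages already form a run.
compact = list(range(1000))
print(find_contiguous_run(compact, 3))      # (0, 3)

# Fragmented freelist: same number of free pages, but none adjacent,
# so the whole list is scanned and nothing usable is found.
fragmented = list(range(0, 2000, 2))
print(find_contiguous_run(fragmented, 3))   # (None, 1000)
```

In this model both freelists hold 1000 free pages, but only the fragmented one forces a full scan on every multi-page write, and the scan gets longer as the list grows.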
An option was recently added to the slapd-mdb configuration (rtxnsize)
that can also help reduce the rate of fragmentation.  It has some
performance-related issues, which were discussed on the -devel list when
it was added.  Whether they affect you, and whether the setting will help
you in particular, depends on whether your searches return a large number
of entries.  You can find some guidelines that I came up with for tuning
the parameter in that thread.  If you do not have an unlimited Zimbra
License, the
license check performed by the store servers will definitely affect this,
since the result set is all active accounts which can be quite large.
Additionally, at one point I had a patch for the Zimbra build of
OpenLDAP that made it very aggressive in finding freespace to reuse.  I
don't recall if it is still applied (I don't believe it currently is
based on what I saw in github).  It basically meant that in Zimbra, it
would work extra hard to find reusable freespace, which would reduce the
rate at which the database would fragment, but it also meant that once
the DB was fragmented enough, it would amplify the amount of time it took
for a write op to complete.  I.e., it was a tradeoff of a longer time to
reach a catastrophic state, but the state was more catastrophic once
achieved.
This is one area where LMDB differs significantly from back-hdb/bdb.  You
could have back-bdb/hdb databases that endured a high rate of write
operations run for years without needing maintenance.  With LMDB,
you get better read & write rates, but it requires periodic reloads.

I wanted to follow up on this after examining Geert's database and other
affected databases.  Geert already has this answer, but it's useful for
the general OpenLDAP community.
This fragmentation problem is not common.  It depends entirely on the size
of the entries in the database.  The issue arises when entries in the LDAP
DB are larger than the LMDB page size (usually 4 KB) and are updated
frequently.  This most often occurs in one of two ways:
a) multi-valued attributes with a large number of values
b) a very large single-valued attribute (e.g., binary data)
For the first problem (a), there is code in the 2.5 release to address
this, called multival.  This feature puts multi-valued attributes that
have more than a (configurable) number of values into their own
sub-database.  For (b), there's not really a solution, but it's pretty rare.
So those whose entries are smaller than 4 KB will never see this
problem.  Note that this is the size of the binary entry on disk, not the 
size of the entry when exported to LDIF.  The binary size is generally 
significantly smaller than the LDIF version.
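As a rough worked example of the size rule above, the following Python sketch estimates how many pages an entry spans and whether it crosses the one-page threshold. The function names are hypothetical, the 4096-byte page size is the "usually 4KB" figure from this message, and the arithmetic ignores LMDB's per-page header overhead, so treat the results as approximate.

```python
import math

PAGE_SIZE = 4096  # assumed LMDB page size in bytes (usually 4 KB)


def pages_needed(entry_bytes, page_size=PAGE_SIZE):
    """Approximate number of pages an on-disk entry spans.

    Ignores page header overhead; illustration only.
    """
    return max(1, math.ceil(entry_bytes / page_size))


def may_fragment(entry_bytes, page_size=PAGE_SIZE):
    """True if the on-disk (binary) entry is larger than one page."""
    return entry_bytes > page_size


# Sizes are on-disk binary sizes, not LDIF sizes.
for size in (1500, 4096, 10000):
    print(size, pages_needed(size), may_fragment(size))
```

An entry of about 10000 bytes on disk spans three pages in this estimate, so frequent updates to it are the kind of workload described above, while a 1500-byte entry is not.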
--Quanah
