Hello,
We run OpenLDAP 2.4.57 on RHEL 8 with two directories in multi-master (MM) mode and two replicas. We write to only one of the masters.
The LMDB database grows as a batch job, which runs every 10 minutes, writes the members of large groups (2000 members). The database grows steadily and sometimes jumps significantly until it reaches its maximum size. The mdb_stat -ef command shows that very few pages are in use and that everything else is free pages. The number of used pages is stable.
And yet, it is no longer possible to write to the directory. Sometimes this happens on the master, sometimes on the replicas, where syncrepl fails.
2021-03-24T11:52:59.978157+01:00 int-ohz-infra1 slapd debug local4 25076 - mdb_id2entry_put: mdb_put failed: MDB_MAP_FULL: Environment mapsize limit reached (-30792) "uid=us-00000301,ou=users,dc=bst,dc=ocn,dc=infra,dc=ftgroup"
2021-03-24T11:52:59.978184+01:00 int-ohz-infra1 slapd debug local4 25076 - syncrepl_null_callback: error code 0x50
2021-03-24T11:52:59.978202+01:00 int-ohz-infra1 slapd debug local4 25076 - syncrepl_entry: rid=001 be_modify failed (80)
2021-03-24T11:52:59.978780+01:00 int-ohz-infra1 slapd debug local4 25076 - do_syncrepl: rid=001 rc 80 retrying
There is no transaction in progress:
mdb_stat -r /var/lib/ldap/data/
Reader Table Status
pid thread txnid
25076 7fa0e42c5480 -
25076 7fa0977fe700 -
25076 7fa097fff700 -
25076 7fa094cfa700 -
25076 7fa086ffe700 -
25076 7fa0857fb700 -
25076 7fa0867fd700 -
25076 7fa085ffc700 -
25076 7fa074ffc700 -
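For readers who want to automate the same check, here is a minimal Python sketch that scans the mdb_stat -r reader table quoted above for slots holding an open read transaction. The function name `active_readers` is hypothetical, and the parsing rule (three whitespace-separated fields, numeric pid, "-" meaning no open transaction) is an assumption based only on the output shown in this message.

```python
# Reader table copied from the mdb_stat -r output quoted above.
READER_TABLE = """\
Reader Table Status
pid thread txnid
25076 7fa0e42c5480 -
25076 7fa0977fe700 -
25076 7fa097fff700 -
25076 7fa094cfa700 -
25076 7fa086ffe700 -
25076 7fa0857fb700 -
25076 7fa0867fd700 -
25076 7fa085ffc700 -
25076 7fa074ffc700 -
"""


def active_readers(table_text):
    """Return (pid, thread, txnid) for each reader slot with an open transaction.

    Assumption: a data row has exactly three whitespace-separated fields,
    starts with a numeric pid, and shows "-" when no transaction is open.
    """
    active = []
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) != 3 or not fields[0].isdigit():
            continue  # skip the title and the column-header line
        pid, thread, txnid = fields
        if txnid != "-":
            # txnid is kept as a string; its numeric format is not assumed here
            active.append((int(pid), thread, txnid))
    return active


readers = active_readers(READER_TABLE)
print(f"{len(readers)} reader slot(s) hold an open transaction")
```

An empty result matches the statement that no transaction is in progress, so a stale reader is unlikely to be what keeps the free pages from being reused here.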
The batch executes many transactions, each taking less than 2 s. Restarting slapd does not resolve the problem. If we compact the database with mdb_copy -c, the compacted copy is only a few MB and writes work again. The problem does not occur without the large groups.
Why are the free pages not reused? Could there be a problem with writing many multi-valued attributes?
Thank you for your help!
Jean-Charles ROGEZ
System Architect | INTEGRATION SYSTEMES | PLESSIS
Switchboard: +33180848010
Email: jean-charles.rogez@csnovidys.com
Howard Chu wrote, in reply to Jean-Charles ROGEZ:
This is a known limitation with frequent updates of large entries, which cause excessive fragmentation of the free page space. There are workarounds in OpenLDAP 2.5 back-mdb, but work is ongoing to improve free page management in LMDB 1.0.
I suggest you migrate your database to OpenLDAP 2.5, after configuring "multival" on the backend. See the slapd-mdb(5) manpage for details.
On 3/24/21 4:35 PM, Howard Chu wrote:
> This is a known limitation with regard to frequent updates of large entries, it causes excessive fragmentation of the free page space.
Does this also happen in case the LDAP client only changes a few attribute values of a large entry? If yes, the original poster could mitigate the bad effects by changing the sync process.
(Yes, the add request with initial attribute value set is expected to be large. Also some subsequent modify requests in case of larger reorganisations.)
Ciao, Michael.
Michael Ströder wrote:
> Does this also happen in case the LDAP client only changes a few attribute values of a large entry? If yes, the original poster could mitigate the bad effects by changing the sync process.
Yes, it makes no difference, since by default back-mdb stores each entry as a single blob. Thus the entire entry is always rewritten on any modification.
LMDB stores each record as a single contiguous set of pages, so the larger the entry, the more contiguous pages it needs. Since the free page manager in LMDB is extremely simplistic, it doesn't do well with random allocations and deallocations of widely varying sizes. Over time the free space fragmentation gets worse, leaving spans of free pages that are too small to service the requests for the larger entries.
The multival feature in OpenLDAP 2.5 back-mdb lets you configure some attributes to have their values stored as individual DB records, which then means that modifications of individual values don't have to rewrite the entire entry. And with smaller records, the fragmentation issue goes away.
The tradeoff of course is that it takes multiple DB accesses to construct an entry, so write performance improves at the cost of read performance. There's no free lunch...
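To make the fragmentation argument concrete, here is a toy Python model of a page map in which a record needs one contiguous run of free pages. It is only an illustration under stated assumptions: the first-fit search, the 1000-page map, and the names `first_fit` and `longest_free_run` are all hypothetical and are not LMDB's actual free-list code.

```python
def longest_free_run(page_map):
    """Length of the longest run of consecutive free pages (False = free)."""
    best = run = 0
    for used in page_map:
        run = 0 if used else run + 1
        best = max(best, run)
    return best


def first_fit(page_map, n):
    """Claim the first run of n consecutive free pages.

    Returns the start index, or None if no run is long enough,
    which is the toy equivalent of hitting MDB_MAP_FULL.
    """
    run = 0
    for i, used in enumerate(page_map):
        run = 0 if used else run + 1
        if run == n:
            start = i - n + 1
            for j in range(start, i + 1):
                page_map[j] = True
            return start
    return None


MAP_PAGES = 1000                  # hypothetical map size, in pages
page_map = [True] * MAP_PAGES     # start with every page in use
for i in range(0, MAP_PAGES, 2):  # free every other page: worst-case scatter
    page_map[i] = False

free_pages = page_map.count(False)
start = first_fit(page_map, 30)   # a 30-page record, e.g. a large group entry
print(f"free pages: {free_pages}, longest free run: {longest_free_run(page_map)}")
print("30-page allocation:", "ok" if start is not None else "fails")
```

Half of the map is free in this model, yet the 30-page request cannot be placed because no two free pages are adjacent. Real fragmentation is less extreme, but the same mismatch between total free pages and the longest free run would explain a full map alongside a mostly free page count.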
On 24.03.2021 16:52, Howard Chu wrote:
> Yes it makes no difference since by default, back-mdb stores all entries as single blobs. Thus the entire entry is always rewritten on any modification. LMDB stores all records as a single contiguous set of pages, (...)
Please put this explanation in the documentation.
But also, the manpage isn't all that helpful about which values to pick for "multival", unless I'm missing something. What's "very large"? Could you guys write down some of what you are thinking to choose these values? Or if that's complicated, maybe back-mdb could compute them based on more generic params, as long as it does not get too clever. (Since that kind of cleverness goes very wrong at times.)
Nitpick: It's unusual to mention UINT_MAX as a value to set - the norm is 0 for that. Or "0 = no limit, in practice meaning up to UINT_MAX" if the value of UINT_MAX is relevant to understanding what will happen.
Hallvard Breien Furuseth <h.b.furuseth@usit.uio.no> wrote on 24.03.2021 at 23:37 in message <a469e51f-87be-dcaa-094e-36bd54f6fc77@usit.uio.no>:
> On 24.03.2021 16:52, Howard Chu wrote:
>> Yes it makes no difference since by default, back-mdb stores all entries as single blobs. Thus the entire entry is always rewritten on any modification. LMDB stores all records as a single contiguous set of pages, (...)
I think the fact that any entry is written to a contiguous block of free pages (a "free extent") is more important than the fact that an entry is rewritten even if only part of it was changed. I think the latter is more or less obvious...
Hallvard Breien Furuseth wrote:
> On 24.03.2021 23:37, Hallvard Breien Furuseth wrote:
>> (...) Could you guys write down some of what you are thinking to choose these values? (...)
> I mean "when you choose these values in practice".
So far, these problems have only cropped up in directories with large groups - thousands of members. The resulting entries occupy hundreds of pages.
Most entries tend to be only 1-2 pages.
On Thursday, March 25, 2021 1:32 PM +0000, Howard Chu <hyc@symas.com> wrote:
> So far, these problems have only cropped up in directories with large groups - thousands of members. The resulting entries occupy hundreds of pages.
> Most entries tend to be only 1-2 pages.
I'd also note it's generally useful to combine multival with sortvals. I wish sortvals could be per-db instead of global, but oh well. :)
--Quanah
--
Quanah Gibson-Mount Product Architect Symas Corporation Packaged, certified, and supported LDAP solutions powered by OpenLDAP: http://www.symas.com
On Thu, Mar 25, 2021 at 12:32:56 +0000, Howard Chu wrote:
> So far, these problems have only cropped up in directories with large groups - thousands of members. The resulting entries occupy hundreds of pages.
On Zimbra we're still getting fragmentation (and mdb inflation) with entries much smaller than that. The multival backport improved the situation, but did not fix it completely.
Geert
On 25.03.2021 13:32, Howard Chu wrote:
> So far, these problems have only cropped up in directories with large groups - thousands of members. The resulting entries occupy hundreds of pages.
Great, put some numbers in the doc. People differ in how they read words like "large". IIRC we've asked a vendor if their file storage coped with big groups: "Sure, we tried 50 members and it worked fine".
And yes, we get large groups. Most users on campus must be members of some licensing group, that sort of thing.
Hallvard
Howard Chu <hyc@symas.com> wrote on 24.03.2021 at 16:52 in message <9d603719-4a02-1309-130f-901e63ffb209@symas.com>:
> LMDB stores all records as a single contiguous set of pages, so the larger the entry, the more contiguous pages it needs for storing them. Since the free page manager in LMDB is extremely simplistic, it doesn't do well with random allocations and deallocations of widely varying sizes, and so over time the free space fragmentation gets worse, resulting in spans of pages that are too small to service the requests for the larger entries.
Interestingly, the Btrfs filesystem (on Linux) had similar problems in early versions. Today there are "reorganizing background jobs" that shuffle the blocks around to make free space available. Still, if you fill your Btrfs filesystem to 100%, there is no way to recover by removing files. The only solution is to add more space, and I'm not even sure that works when the filesystem is 100% full.
The traditional filesystems (like ext2) are more robust in such circumstances.
Regards, Ulrich
> --
> Howard Chu
> CTO, Symas Corp. http://www.symas.com
> Director, Highland Sun http://highlandsun.com/hyc/
> Chief Architect, OpenLDAP http://www.openldap.org/project/
Thank you very much for your quick response.
We will test OpenLDAP 2.5 to see if that improves things. In addition, this new version brings new features that are very interesting for us. Is OpenLDAP 2.5.2 stable enough to be used in production?
Best regards Jean-Charles
Jean-Charles ROGEZ wrote:
> We will test OpenLDAP 2.5 to see if that improves things. In addition, this new version brings new features that are very interesting for us. Is OpenLDAP 2.5.2 stable enough to be used in production?
We've been resolving bugs pretty intensely, so 2.5 is in most ways superior to 2.4. But 2.5.3 is due in the next week or so. You can of course start testing with 2.5.2 but don't expect to stay on it for long.
Jean-Charles ROGEZ <jean-charles.rogez@csnovidys.com> wrote on 24.03.2021 at 15:06 in message <PR2P264MB0911634B6ED936442371A372D1639@PR2P264MB0911.FRAP264.PROD.OUTLOOK.COM>:
> We use OpenLDAP 2.4.57 under RHEL8 with a configuration with 2 directories in MM and 2 replicas. We only write on one of the masters. The size of the LMDB database grows following writes of members of large groups (2000 members) by a batch which runs every 10 minutes.
Does that mean you rewrite all 2000 members every 10 minutes, or do you just update the membership (add or drop a few members)?
Hello,
Yes, the member attribute is rewritten every 10 minutes. We are going to modify the batch so that it updates the attribute in differential mode and not in full mode. This will slow down the increase of the base before switching to OpenLDAP 2.5.
Jean-Charles
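As an illustration of the differential mode mentioned above, here is a minimal Python sketch that compares the current and desired member lists and emits an LDIF modify record containing only the changes. The function name `member_diff_ldif` and the example DNs are hypothetical. It assumes plain ASCII DNs that can be compared as exact strings; DNs needing base64 encoding, line folding, or normalization are not handled.

```python
def member_diff_ldif(group_dn, current_members, desired_members):
    """Build an LDIF modify record that adds and deletes only changed members.

    Returns an empty string when the two lists hold the same members,
    so an unchanged group produces no write at all.
    Assumption: DNs are plain ASCII and compared as exact strings.
    """
    current = set(current_members)
    desired = set(desired_members)
    to_delete = sorted(current - desired)
    to_add = sorted(desired - current)
    if not to_delete and not to_add:
        return ""

    lines = [f"dn: {group_dn}", "changetype: modify"]
    if to_delete:
        lines.append("delete: member")
        lines.extend(f"member: {dn}" for dn in to_delete)
        lines.append("-")  # each modification block ends with "-"
    if to_add:
        lines.append("add: member")
        lines.extend(f"member: {dn}" for dn in to_add)
        lines.append("-")
    return "\n".join(lines) + "\n"


# Hypothetical example data
GROUP = "cn=licensed,ou=groups,dc=example,dc=com"
current = [
    "uid=alice,ou=users,dc=example,dc=com",
    "uid=bob,ou=users,dc=example,dc=com",
]
desired = [
    "uid=bob,ou=users,dc=example,dc=com",
    "uid=carol,ou=users,dc=example,dc=com",
]
ldif = member_diff_ldif(GROUP, current, desired)
print(ldif)
```

The output could be fed to ldapmodify. Per the explanation earlier in the thread, back-mdb in 2.4 still rewrites the whole entry on every modify, so any saving would presumably come from runs in which the membership did not change and no write is issued.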
-----Original Message-----
From: Ulrich Windl <Ulrich.Windl@rz.uni-regensburg.de>
Sent: Thursday, 25 March 2021 07:30
To: Jean-Charles ROGEZ <jean-charles.rogez@csnovidys.com>; openldap-technical@openldap.org
Cc: Cédric MORELLE <cedric.morelle@csnovidys.com>; Loïc NERTOMB <loic.nertomb@csnovidys.com>; Pierre@hypatia.openldap.org
Subject: Antw: [EXT] MDB_MAP_FULL with plenty of free pages
> Does that mean you rewrite all 2000 members every 10 minutes, or do you just update the membership (add or drop a few members)?
openldap-technical@openldap.org