MDB_MAP_FULL with plenty of free pages


Jean-Charles ROGEZ

24 Mar 2021

7:06 a.m.

Hello,

We run OpenLDAP 2.4.57 on RHEL 8 with two directories in MM and two replicas. We write to only one of the masters. The LMDB database grows as a batch job, which runs every 10 minutes, writes the members of large groups (2000 members). The database grows steadily, with occasional large jumps, until it reaches its maximum size. The lmdb_stat -ef command shows that very few pages are in use and everything else is free pages. The number of used pages is stable.

And yet it is no longer possible to write to the directory. Sometimes the failure is on the master, sometimes on the replicas, where syncrepl fails.

2021-03-24T11:52:59.978157+01:00 int-ohz-infra1 slapd debug local4 25076 - mdb_id2entry_put: mdb_put failed: MDB_MAP_FULL: Environment mapsize limit reached (-30792) "uid=us-00000301,ou=users,dc=bst,dc=ocn,dc=infra,dc=ftgroup"
2021-03-24T11:52:59.978184+01:00 int-ohz-infra1 slapd debug local4 25076 - syncrepl_null_callback: error code 0x50
2021-03-24T11:52:59.978202+01:00 int-ohz-infra1 slapd debug local4 25076 - syncrepl_entry: rid=001 be_modify failed (80)
2021-03-24T11:52:59.978780+01:00 int-ohz-infra1 slapd debug local4 25076 - do_syncrepl: rid=001 rc 80 retrying

There is no transaction in progress:

mdb_stat -r /var/lib/ldap/data/
Reader Table Status
pid thread txnid
25076 7fa0e42c5480 -
25076 7fa0977fe700 -
25076 7fa097fff700 -
25076 7fa094cfa700 -
25076 7fa086ffe700 -
25076 7fa0857fb700 -
25076 7fa0867fd700 -
25076 7fa085ffc700 -
25076 7fa074ffc700 -

The batch executes many transactions, each shorter than 2 s. Restarting slapd does not resolve the problem. If we compact the database with mdb_copy -c, the copy is only a few MB and writes work again. The problem does not appear without large groups.

Why are the free pages not reused? Could there be a problem with writing many multi-valued attributes?

Thank you for your help!

Jean-Charles Rogez


Howard Chu

24 Mar 2021

8:35 a.m.

Jean-Charles ROGEZ wrote:

...

Hello,

This is a known limitation with frequent updates of large entries: they cause excessive fragmentation of the free page space. There are workarounds in OpenLDAP 2.5 back-mdb, and work is ongoing to improve free page management in LMDB 1.0.

I suggest you migrate your database to OpenLDAP 2.5, after configuring "multival" on the backend. See the slapd-mdb(5) manpage for details.
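For readers following along, here is a minimal sketch of what that could look like in a slapd.conf-style database section on 2.5. The directive form follows the synopsis given in slapd-mdb(5), "multival {<attrlist>|default} <hi>,<lo>". The suffix, the choice of "member", and the thresholds 100 and 10 are illustrative assumptions, not recommendations made in this thread, so check the manpage shipped with your build before relying on them.

```
# slapd.conf fragment (OpenLDAP 2.5, back-mdb); all values are placeholders
database    mdb
suffix      "dc=example,dc=com"      # hypothetical suffix
directory   /var/lib/ldap/data

# Store the values of "member" as separate DB records once an entry
# holds more than 100 of them; merge them back into the entry blob if
# deletions bring the count below 10. Both numbers are assumptions.
multival    member 100,10
```

Note that the advice above is to configure this before migrating the data, so the setting is already in place when the entries are loaded.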


--
Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/

Michael Ströder

8:43 a.m.

On 3/24/21 4:35 PM, Howard Chu wrote:

This is a known limitation with regard to frequent updates of large entries, it causes excessive fragmentation of the free page space.

Does this also happen when the LDAP client changes only a few attribute values of a large entry? If so, the original poster could mitigate the bad effects by changing the sync process.

(Yes, the add request with initial attribute value set is expected to be large. Also some subsequent modify requests in case of larger reorganisations.)

Ciao, Michael.

Howard Chu

8:52 a.m.


Yes, and it makes no difference, since by default back-mdb stores each entry as a single blob. Thus the entire entry is always rewritten on any modification. LMDB stores each record as a single contiguous set of pages, so the larger the entry, the more contiguous pages it needs. Since the free page manager in LMDB is extremely simplistic, it doesn't do well with random allocations and deallocations of widely varying sizes, so over time the free space fragmentation gets worse, leaving spans of free pages that are too small to service the requests for the larger entries.
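The effect described above can be sketched with a toy model. This is not LMDB's allocator: it is a simple first-fit allocator over a list of page flags, and the map size (1000 pages), record sizes (2 and 200 pages), and the "compaction" step are all assumptions chosen for illustration. It shows how a request for many contiguous pages can fail while half the map is free, and why a compacting copy makes the same request succeed again.

```python
def largest_free_run(pages):
    """Length of the longest run of contiguous free (False) pages."""
    best = run = 0
    for used in pages:
        run = 0 if used else run + 1
        best = max(best, run)
    return best


def allocate(pages, n):
    """First-fit: mark the first run of n contiguous free pages as used.

    Returns the start index, or None if no such run exists.
    """
    run = 0
    for i, used in enumerate(pages):
        run = 0 if used else run + 1
        if run == n:
            start = i - n + 1
            for j in range(start, i + 1):
                pages[j] = True
            return start
    return None


def release(pages, start, n):
    """Mark n pages starting at start as free again."""
    for j in range(start, start + n):
        pages[j] = False


MAP_PAGES = 1000                      # assumed map size, in pages
pages = [False] * MAP_PAGES

# Fill the whole map with small 2-page records.
starts = []
while (s := allocate(pages, 2)) is not None:
    starts.append(s)

# Delete every other small record: half the map is now free,
# but only in 2-page holes.
for s in starts[::2]:
    release(pages, s, 2)

free_pages = pages.count(False)
run = largest_free_run(pages)
big = allocate(pages, 200)            # a "large group" entry
print(free_pages, run, big)           # 500 2 None

# Toy stand-in for a compacting copy: pack used pages at the front.
pages.sort(reverse=True)
start_after = allocate(pages, 200)
print(start_after)                    # 500
```

In the model, 500 of 1000 pages are free when the 200-page request fails, which mirrors the situation reported at the top of the thread: plenty of free pages, but no contiguous span large enough.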

The multival feature in OpenLDAP 2.5 back-mdb lets you configure some attributes to have their values stored as individual DB records, which then means that modifications of individual values don't have to rewrite the entire entry. And with smaller records, the fragmentation issue goes away.

The tradeoff of course is that it takes multiple DB accesses to construct an entry, so write performance improves at the cost of read performance. There's no free lunch...


Hallvard Breien Furuseth

3:37 p.m.

On 24.03.2021 16:52, Howard Chu wrote:

...

Yes it makes no difference since by default, back-mdb stores all entries as single blobs. Thus the entire entry is always rewritten on any modification. LMDB stores all records as a single contiguous set of pages, (...)

Please put this explanation in the documentation.

But also, the manpage isn't all that helpful about which values to pick for "multival", unless I'm missing something. What's "very large"? Could you guys write down some of what you are thinking to choose these values? Or if that's complicated, maybe back-mdb could compute them based on more generic params, as long as it does not get too clever. (Since that kind of cleverness goes very wrong at times.)

Nitpick: It's unusual to mention UINT_MAX as a value to set - the norm is 0 for that. Or "0 = no limit, in practice meaning up to UINT_MAX" if the value of UINT_MAX is relevant to understanding what will happen.

Ulrich Windl

25 Mar 2021

12:03 a.m.

New subject: Antw: [EXT] Re: MDB_MAP_FULL with plenty of free pages

Hallvard Breien Furuseth h.b.furuseth@usit.uio.no wrote on 24.03.2021 at 23:37 in message a469e51f-87be-dcaa-094e-36bd54f6fc77@usit.uio.no:

On 24.03.2021 16:52, Howard Chu wrote:

Yes it makes no difference since by default, back-mdb stores all entries as single blobs. Thus the entire entry is always rewritten on any modification. LMDB stores all records as a single contiguous set of pages, (...)

I think the fact that any entry is written to a contiguous block of free pages ("free extent") is more important than the fact that an entry is rewritten even if only part of it changed. I think the latter is more or less obvious...


Hallvard Breien Furuseth

4:57 a.m.

On 24.03.2021 23:37, Hallvard Breien Furuseth wrote:

(...) Could you guys write down some of what you are thinking to choose these values? (...)

I mean "when you choose these values in practice".

Howard Chu

5:32 a.m.


So far, these problems have only cropped up in directories with large groups - thousands of members. The resulting entries occupy hundreds of pages.

Most entries tend to be only 1-2 pages.


Quanah Gibson-Mount

8:22 a.m.

--On Thursday, March 25, 2021 1:32 PM +0000 Howard Chu hyc@symas.com wrote:


So far, these problems have only cropped up in directories with large groups - thousands of members. The resulting entries occupy hundreds of pages.

Most entries tend to be only 1-2 pages.

I'd also note it's generally useful to combine multival with sortvals. I wish sortvals could be per-db instead of global, but oh well. :)

--Quanah

Quanah Gibson-Mount Product Architect Symas Corporation Packaged, certified, and supported LDAP solutions powered by OpenLDAP: http://www.symas.com

Geert Hendrickx

1:29 p.m.

On Thu, Mar 25, 2021 at 12:32:56 +0000, Howard Chu wrote:

...

So far, these problems have only cropped up in directories with large groups - thousands of members. The resulting entries occupy hundreds of pages.

On Zimbra we're still getting fragmentation (and mdb inflation) with entries much smaller than that. The multival backport improved the situation, but did not fix it completely.

Geert

Hallvard Breien Furuseth

4:35 p.m.

On 25.03.2021 13:32, Howard Chu wrote:

...

So far, these problems have only cropped up in directories with large groups - thousands of members. The resulting entries occupy hundreds of pages.

Great, put some numbers in the doc. People differ in how they read words like "large". IIRC we've asked a vendor if their file storage coped with big groups: "Sure, we tried 50 members and it worked fine".

And yes, we get large groups. Most users on campus must be members of some licensing group, that sort of thing.

Hallvard

Ulrich Windl

24 Mar 2021

11:49 p.m.

New subject: Antw: [EXT] Re: MDB_MAP_FULL with plenty of free pages

Howard Chu hyc@symas.com wrote on 24.03.2021 at 16:52 in message 9d603719-4a02-1309-130f-901e63ffb209@symas.com:

LMDB stores all records as a single contiguous set of pages, so the larger the entry, the more contiguous pages it needs for storing them. Since the free page manager in LMDB is extremely simplistic, it doesn't do well with random allocations and deallocations of widely varying sizes, and so over time the free space fragmentation gets worse, resulting in spans of pages that are too small to service the requests for the larger entries.

Interestingly, the BtrFS filesystem (on Linux) had similar problems in early versions. Today there are "reorganizing background jobs" that shuffle blocks around to make free space available. Still, if you fill your BtrFS filesystem to 100%, there is no way to recover by removing files. The only solution is to add more space, and I'm not even sure that works when the filesystem is 100% full.

The traditional filesystems (like ext2) are more robust in such circumstances.

Regards, Ulrich


Jean-Charles ROGEZ

9:23 a.m.

Thank you very much for your quick response.

We will test OpenLDAP 2.5 to see if that improves things. In addition, this new version brings new features that are very interesting for us. Is OpenLDAP 2.5.2 stable enough to be used in production?

Best regards Jean-Charles

Howard Chu

9:40 a.m.


We've been resolving bugs pretty intensely, so 2.5 is in most ways superior to 2.4. But 2.5.3 is due in the next week or so. You can of course start testing with 2.5.2 but don't expect to stay on it for long.


Ulrich Windl

11:29 p.m.

New subject: Antw: [EXT] MDB_MAP_FULL with plenty of free pages

Jean-Charles ROGEZ jean-charles.rogez@csnovidys.com wrote on 24.03.2021 at 15:06 in message PR2P264MB0911634B6ED936442371A372D1639@PR2P264MB0911.FRAP264.PROD.OUTLOOK.COM:

We use OpenLDAP 2.4.57 under RHEL8 with a configuration with 2 directories in MM and 2 replicas. We only write on one of the masters. The size of the LMDB database grows following writes of members of large groups (2000 members) by a batch which runs every 10 minutes.

Does that mean you rewrite all 2000 members every 10 minutes, or do you just update the members (add or drop a few)?


Jean-Charles ROGEZ

25 Mar 2021

10:46 a.m.

New subject: Antw: [EXT] MDB_MAP_FULL with plenty of free pages

Hello,

Yes, the member attribute is rewritten in full every 10 minutes. We are going to modify the batch so that it updates the attribute in differential mode instead. This should slow the growth of the database until we switch to OpenLDAP 2.5.
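As a rough illustration of the differential mode described above, the sketch below compares the current and desired member lists and emits an LDIF modify record only when something changed. The function names, the group DN, and the sample members are hypothetical. It compares DNs as exact strings (a real batch would need to normalize them) and assumes values need no base64 encoding. As explained earlier in the thread, each modify still rewrites the whole entry under back-mdb's default storage, so the gain comes from skipping runs with no changes.

```python
def member_diff(current, desired):
    """Return (to_add, to_delete) that turn `current` into `desired`."""
    current_set, desired_set = set(current), set(desired)
    to_add = sorted(desired_set - current_set)
    to_delete = sorted(current_set - desired_set)
    return to_add, to_delete


def modify_ldif(group_dn, to_add, to_delete, attr="member"):
    """Render one LDIF modify record, or "" if there is nothing to do."""
    if not to_add and not to_delete:
        return ""  # no change: skip the write entirely
    lines = [f"dn: {group_dn}", "changetype: modify"]
    for op, values in (("add", to_add), ("delete", to_delete)):
        if values:
            lines.append(f"{op}: {attr}")
            lines.extend(f"{attr}: {value}" for value in values)
            lines.append("-")
    return "\n".join(lines) + "\n"


# Hypothetical sample data.
current = [
    "uid=a,ou=users,dc=example,dc=com",
    "uid=b,ou=users,dc=example,dc=com",
    "uid=c,ou=users,dc=example,dc=com",
]
desired = [
    "uid=b,ou=users,dc=example,dc=com",
    "uid=c,ou=users,dc=example,dc=com",
    "uid=d,ou=users,dc=example,dc=com",
]

group_dn = "cn=biggroup,ou=groups,dc=example,dc=com"
to_add, to_delete = member_diff(current, desired)
ldif = modify_ldif(group_dn, to_add, to_delete)
print(ldif)
```

The resulting text could be fed to a tool such as ldapmodify; when the two lists are identical the function returns an empty string and the batch can skip the group altogether.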

Jean-Charles



openldap-technical@openldap.org
