So after updating our dev ldap environment to use mdb, and slapadd'ing a fresh copy from our production environment into it, the database was using 828M:
Environment Info
  Map address: (nil)
  Map size: 2147483648
  Page size: 4096
  Max pages: 524288
  Number of pages used: 211612
  Last transaction ID: 1333
  Max readers: 126
  Number of readers used: 6
Status of Main DB
  Tree depth: 1
  Branch pages: 0
  Leaf pages: 1
  Overflow pages: 0
  Entries: 29
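That 828M is consistent with the page counts above; a quick sanity check in Python (just arithmetic on the mdb_stat numbers):

# Numbers taken from the mdb_stat -e output above.
page_size = 4096        # "Page size"
pages_used = 211612     # "Number of pages used"
max_pages = 524288      # "Max pages" (2G map / 4K pages)

used = pages_used * page_size
print("%.0fM used of %.0fM max" % (used / 2.0**20, max_pages * page_size / 2.0**20))
# -> 827M used of 2048M max, matching the ~828M figure above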
I then proceeded to run a test load on it (basically, I had a script I put together when I added the memberof overlay that ripped through all of our groups, removing and then re-adding all members). After this test run, the usage jumped to 1.9G:
Environment Info
  Map address: (nil)
  Map size: 2147483648
  Page size: 4096
  Max pages: 524288
  Number of pages used: 479536
  Last transaction ID: 1075380
  Max readers: 126
  Number of readers used: 11
Status of Main DB
  Tree depth: 1
  Branch pages: 0
  Leaf pages: 1
  Overflow pages: 0
  Entries: 29
There is no new data in the database, all that happened was existing data was removed and then re-added. Is this drastic increase in space utilization expected in such a scenario? I bumped the max size from 2G to 4G and am rerunning the same script to see what happens.
Is there any heuristic for how mdb will grow over time, based on the initial size from a fresh slapadd? I plan to set a fairly large maxsize, and also monitor it with munin to keep track of any unexpected growth, but I am curious if it is going to reach a steady-state size or continue growing over time. I would expect growth as additional entries were added, and possibly some growth from overhead when things were changed, but not necessarily such a big jump in size.
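For the munin side, my current plan is basically a check along these lines (a rough sketch using the py-lmdb Python binding; the path and the 2G maxsize are just our dev values, and this isn't anything shipped with OpenLDAP):

import lmdb

DB_DIR  = "/var/lib/openldap-data"   # directory holding data.mdb
MAXSIZE = 2 * 1024**3                # the maxsize/olcDbMaxSize configured for the backend

# Open the environment read-only; slapd keeps running. Needs permission on the
# database directory (run as the slapd user).
env = lmdb.open(DB_DIR, readonly=True)
info = env.info()
stat = env.stat()

pages_used = info["last_pgno"] + 1   # same as mdb_stat's "Number of pages used"
used_bytes = pages_used * stat["psize"]

# munin-style plugin output
print("lmdb_used.value %d" % used_bytes)
print("lmdb_max.value %d" % MAXSIZE)
env.close()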
Thanks.
On Fri, Jan 10, 2014 at 01:00:32PM -0800, Paul B. Henson wrote:
I then proceeded to run a test load on it (basically, I had a script I put together when I added the memberof overlay that ripped through all of our groups, removing and then re-adding all members). After this test run, the usage jumped to 1.9G:
Hmm, well, I reran the same test script; there was no drastic jump in utilization, but it did increase from 479536 pages used to 492848, or 52M additional space used for storing the same content.
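(The 52M is just the page delta at the 4K page size:)

delta_pages = 492848 - 479536        # 13312 additional pages
print(delta_pages * 4096 / 2.0**20)  # -> 52.0 (MiB)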
--On Saturday, January 11, 2014 11:21 AM -0800 "Paul B. Henson" henson@acm.org wrote:
On Fri, Jan 10, 2014 at 01:00:32PM -0800, Paul B. Henson wrote:
I then proceeded to run a test load on it (basically, I had a script I put together when I added the memberof overlay that ripped through all of our groups, removing and then re-adding all members). After this test run, the usage jumped to 1.9G:
Hmm, well, I reran the same test script; there was no drastic jump in utilization, but it did increase from 479536 pages used to 492848, or 52M additional space used for storing the same content.
Yes, this is generally what I see when re-running massive changes. There is a one-time growth jump, and then it stabilizes.
--Quanah
--
Quanah Gibson-Mount
Architect - Server
Zimbra, Inc.
--------------------
Zimbra :: the leader in open source messaging and collaboration
On Sat, Jan 11, 2014 at 12:45:13PM -0800, Quanah Gibson-Mount wrote:
Yes, this is generally what I see when re-running massive changes. There is a one-time growth jump, and then it stabilizes.
Interestingly, the massive growth is only seen on the master (fosse-dev in the list below); the other boxes, which received the exact same changes but via syncrepl rather than from a connected client, barely grew at all:
fosse-dev    1.9G  /var/lib/openldap-data/data.mdb
filmore-dev  948M  /var/lib/openldap-data/data.mdb
pip-dev      952M  /var/lib/openldap-data/data.mdb
shelley-dev  932M  /var/lib/openldap-data/data.mdb
Quanah Gibson-Mount quanah@zimbra.com wrote on 11.01.2014 at 21:45 in
message <3E33AD8DB84666763FC3398E@[192.168.1.2]>:
--On Saturday, January 11, 2014 11:21 AM -0800 "Paul B. Henson" henson@acm.org wrote:
On Fri, Jan 10, 2014 at 01:00:32PM -0800, Paul B. Henson wrote:
I then proceeded to run a test load on it (basically, I had a script I put together when I added the memberof overlay that ripped through all of our groups, removing and then re-adding all members). After this test run, the usage jumped to 1.9G:
Hmm, well, I reran the same test script; there was no drastic jump in utilization, but it did increase from 479536 pages used to 492848, or 52M additional space used for storing the same content.
Yes, this is generally what I see when re-running massive changes. There is a one-time growth jump, and then it stabilizes.
It would be interesting to see the quotient of "size of mdb" / "size of the database in slapcat format". OK, this ignores any indexes that will also consume some space...
--Quanah
--
Quanah Gibson-Mount
Architect - Server
Zimbra, Inc.
Zimbra :: the leader in open source messaging and collaboration
Ulrich Windl writes:
It would be interesting to see the quotient of "size of mdb" / "size of the database in slapcat format".
Can't tell size in slapcat format from the MDB database, but we can look at the sub-databases ad2i + dn2i + id2e which make up the unindexed slapd data, and compare with "Number of pages used". See the "mdb_stat -e -a" output.
Maybe mdb_stat should accept several '-s subname' arguments if someone would rather see just the interesting DBs instead of -a(ll).
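Scripting that comparison is easy enough with any of the LMDB bindings; roughly like this, for example (a sketch using the py-lmdb binding, with an example path, and the sub-database names as above):

import lmdb

DB_DIR = "/var/lib/openldap-data"    # example path
env = lmdb.open(DB_DIR, readonly=True, max_dbs=32)

total_pages = env.info()["last_pgno"] + 1   # "Number of pages used"

with env.begin() as txn:                    # read-only transaction
    data_pages = 0
    for name in (b"ad2i", b"dn2i", b"id2e"):
        st = txn.stat(env.open_db(name, txn=txn, create=False))
        data_pages += st["branch_pages"] + st["leaf_pages"] + st["overflow_pages"]

print("unindexed data: %d pages of %d used" % (data_pages, total_pages))
env.close()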
OK, this ignores any indexes that will also consume some space...
In mdb, data and indexes are all in the same datafile.
Paul B. Henson wrote:
So after updating our dev ldap environment to use mdb, and slapadd'ing a fresh copy from our production environment into it, the database was using 828M:
Environment Info
  Map address: (nil)
  Map size: 2147483648
  Page size: 4096
  Max pages: 524288
  Number of pages used: 211612
  Last transaction ID: 1333
  Max readers: 126
  Number of readers used: 6
Status of Main DB
  Tree depth: 1
  Branch pages: 0
  Leaf pages: 1
  Overflow pages: 0
  Entries: 29
I then proceeded to run a test load on it (basically, I had a script I put together when I added the memberof overlay that ripped through all of our groups, removing and then re-adding all members). After this test run, the usage jumped to 1.9G:
Environment Info
  Map address: (nil)
  Map size: 2147483648
  Page size: 4096
  Max pages: 524288
  Number of pages used: 479536
  Last transaction ID: 1075380
  Max readers: 126
  Number of readers used: 11
Status of Main DB
  Tree depth: 1
  Branch pages: 0
  Leaf pages: 1
  Overflow pages: 0
  Entries: 29
There is no new data in the database, all that happened was existing data was removed and then re-added. Is this drastic increase in space utilization expected in such a scenario?
From the sound of your quite vague test description, sure. As it states in the LMDB doc, long-lived reader transactions prevent reuse of freed pages. http://symas.com/mdb/doc/
You have a long search operation running, and are issuing writes while the search is in progress.
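The effect is easy to reproduce outside of slapd as well; here's a rough illustration with the py-lmdb binding against a scratch environment (path and sizes are arbitrary):

import lmdb

env = lmdb.open("/tmp/lmdb-growth-demo", map_size=1 << 30)

def rewrite(n=1000):
    # Overwrite the same keys; each pass frees the previous copies' pages.
    with env.begin(write=True) as txn:
        for i in range(n):
            txn.put(("key%06d" % i).encode(), b"x" * 100)

rewrite()
print("pages after initial load:", env.info()["last_pgno"] + 1)

reader = env.begin()        # long-lived read transaction, like a slow search
for _ in range(20):
    rewrite()
print("pages with reader open: ", env.info()["last_pgno"] + 1)
reader.abort()

# With no reader pinning the freed pages, they get reused and growth stops.
for _ in range(20):
    rewrite()
print("pages after reader gone:", env.info()["last_pgno"] + 1)
env.close()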
From: Howard Chu [mailto:hyc@symas.com]
Sent: Sunday, January 12, 2014 1:41 PM
From the sound of your quite vague test description, sure. As it states in the LMDB doc, long-lived reader transactions prevent reuse of freed pages. http://symas.com/mdb/doc/
You have a long search operation running, and are issuing writes while the search is in progress.
Hmm, I don't think that is the case. The way the script works is that it first generates a list of all of our groups (which doesn't even come from LDAP; it gets pulled out of a database that is another component of our idm infrastructure). Then, for each group, it does a search to get all the members, and then removes and re-adds them 500 at a time until it has processed all of them. So there is one search, then repeated modify-delete/modify-add operations, then another completely separate search, and so on through all of the groups. The connection to the LDAP server is persistent throughout the script, but no single operation stays open for long.
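For what it's worth, the loop is basically this shape (a heavily simplified sketch with python-ldap; the host, credentials, and group DN are placeholders, and the real group list comes out of the idm database rather than LDAP):

import ldap

conn = ldap.initialize("ldap://fosse-dev.example.edu")      # placeholder host
conn.simple_bind_s("cn=admin,dc=example,dc=edu", "secret")   # placeholder creds

groups = ["cn=group1,ou=group,dc=example,dc=edu"]            # really from the idm DB

for dn in groups:
    # One search per group to fetch the current member list.
    result = conn.search_s(dn, ldap.SCOPE_BASE, "(objectClass=*)", ["member"])
    members = result[0][1].get("member", [])
    # Remove and re-add the members in chunks of 500 via separate modify ops.
    for i in range(0, len(members), 500):
        chunk = members[i:i + 500]
        conn.modify_s(dn, [(ldap.MOD_DELETE, "member", chunk)])
        conn.modify_s(dn, [(ldap.MOD_ADD, "member", chunk)])

conn.unbind_s()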
Does it have anything to do with being a syncrepl provider? The documentation sometimes describes syncrepl as being a "persistent search", and the size increase only occurred on the primary master, not the secondary master or read-only slaves.
Paul B. Henson wrote:
From: Howard Chu [mailto:hyc@symas.com]
Sent: Sunday, January 12, 2014 1:41 PM
From the sound of your quite vague test description, sure. As it states in the LMDB doc, long-lived reader transactions prevent reuse of freed pages. http://symas.com/mdb/doc/
You have a long search operation running, and are issuing writes while the search is in progress.
Hmm, I don't think that is the case. The way the script works is that it first generates a list of all of our groups (which doesn't even come from LDAP; it gets pulled out of a database that is another component of our idm infrastructure). Then, for each group, it does a search to get all the members, and then removes and re-adds them 500 at a time until it has processed all of them. So there is one search, then repeated modify-delete/modify-add operations, then another completely separate search, and so on through all of the groups. The connection to the LDAP server is persistent throughout the script, but no single operation stays open for long.
Does it have anything to do with being a syncrepl provider? The documentation sometimes describes syncrepl as being a "persistent search", and the size increase only occurred on the primary master, not the secondary master or read-only slaves.
The syncprov persistent search implementation doesn't result in a long-lived read transaction. You can use mdb_stat -r to see what's going on as far as read transactions in the DB.
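(If it's easier to capture that state while the test is running, a trivial way to snapshot it periodically, assuming the mdb_stat tool from liblmdb is in the path:)

import subprocess, time

DB_DIR = "/var/lib/openldap-data"   # example path
for _ in range(60):                 # one snapshot per minute for an hour
    out = subprocess.run(["mdb_stat", "-e", "-r", DB_DIR],
                         capture_output=True, text=True).stdout
    print(out)
    time.sleep(60)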