openldap getting very slow

Thomas van Oudenhove

6 Apr 2009

1:45 a.m.

hello,

I have some problems with my LDAP server. It worked perfectly until Monday, March 30. Every night, I extract all entries from the server, diff them against some LDIF file(s), and apply the resulting "diff LDIF file" with ldapmodify.
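A minimal Python sketch of the diff step described above, assuming the entries have already been parsed into hypothetical {dn: {attribute: [values]}} dictionaries. It only emits "changetype: modify" records for entries present on both sides; adds, deletes, base64-encoded values and LDIF line folding are deliberately left out.

```python
def ldif_modify_diff(old, new):
    """Return LDIF modify records that turn `old` entries into `new` ones.

    Both arguments are hypothetical {dn: {attribute: [values]}} mappings.
    Only DNs present in both mappings are compared (no add/delete records).
    """
    records = []
    for dn in sorted(set(old) & set(new)):
        changes = []
        for attr in sorted(set(old[dn]) | set(new[dn])):
            before = sorted(old[dn].get(attr, []))
            after = sorted(new[dn].get(attr, []))
            if before == after:
                continue  # attribute unchanged, nothing to send
            if after:
                # Replace the whole value set of the attribute.
                lines = [f"replace: {attr}"] + [f"{attr}: {v}" for v in after]
            else:
                # Attribute vanished from the new data: delete all its values.
                lines = [f"delete: {attr}"]
            changes.append("\n".join(lines + ["-"]))
        if changes:
            records.append("\n".join([f"dn: {dn}", "changetype: modify"] + changes))
    return "\n\n".join(records) + ("\n" if records else "")


if __name__ == "__main__":
    old = {"uid=jdoe,dc=fr": {"mail": ["old@example.org"], "cn": ["J Doe"]}}
    new = {"uid=jdoe,dc=fr": {"mail": ["new@example.org"], "cn": ["J Doe"]}}
    print(ldif_modify_diff(old, new), end="")
```

The output can be written to a file and passed to ldapmodify -f. Sorting the values means a multi-valued attribute whose values merely come back in a different order does not generate a spurious change.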

There were about 15,000 to 40,000 entries modified each night, and ldapmodify ran for about 20 to 40 minutes (~1,000 entries per minute).

Last Monday, the extraction had some trouble: the extraction request never ended, and that made the server inaccessible, so we had to restart it. Since March 30, the update process (ldapmodify) has been incredibly slow and takes several hours (~60 entries per minute), but the extraction request is "normal" again (about 2 seconds).

We tried to restart from scratch (deleted the database, reinstalled and updated OpenLDAP, and started the process again), but it is still slower than before (~400 entries per minute). We cannot figure out why it became so slow, so we are interested in any advice... We had openldap 2.3.27-8.el5_2.4 installed and also tried the update, openldap 2.3.43-3.el5 (both CentOS packages).
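To put the reported rates side by side, a small Python calculation of how long a 40,000-entry night takes at each of them. This is plain arithmetic on the figures quoted above, nothing measured: about 0.7 h at ~1,000 entries/min, 1.7 h at ~400, and 11.1 h at ~60.

```python
def hours_needed(entries, entries_per_minute):
    """Wall-clock hours to apply `entries` changes at a constant rate."""
    return entries / entries_per_minute / 60


# Rates taken from the message above; 40,000 is the upper nightly volume.
for label, rate in (("before March 30", 1000), ("after March 30", 60), ("after rebuild", 400)):
    print(f"{label:>16}: {rate:>5} entries/min -> {hours_needed(40_000, rate):.1f} h")
```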

Log of the failed request:
Mar 30 01:10:01 ldap slapd[27383]: conn=1772 fd=16 ACCEPT from IP=xxx.xxx.xxx.xxx:38263 (IP=0.0.0.0:389)
Mar 30 01:10:01 ldap slapd[27383]: conn=1772 op=0 BIND dn="cn=Manager,dc=fr" method=128
Mar 30 01:10:01 ldap slapd[27383]: conn=1772 op=0 BIND dn="cn=Manager,dc=fr" mech=SIMPLE ssf=0
Mar 30 01:10:01 ldap slapd[27383]: conn=1772 op=0 RESULT tag=97 err=0 text=
Mar 30 01:10:01 ldap slapd[27383]: conn=1772 op=1 SRCH base="dc=fr" scope=2 deref=0 filter="(objectClass=*)"
[and no more traces until the OpenLDAP restart around noon]

Log of a "normal" request:
Mar 16 01:10:01 ldap slapd[23294]: conn=11 fd=15 ACCEPT from IP=xxx.xxx.xxx.xxx:53535 (IP=0.0.0.0:389)
Mar 16 01:10:01 ldap slapd[23294]: conn=11 op=0 BIND dn="cn=Manager,dc=fr" method=128
Mar 16 01:10:01 ldap slapd[23294]: conn=11 op=0 BIND dn="cn=Manager,dc=fr" mech=SIMPLE ssf=0
Mar 16 01:10:01 ldap slapd[23294]: conn=11 op=0 RESULT tag=97 err=0 text=
Mar 16 01:10:01 ldap slapd[23294]: conn=11 op=1 SRCH base="dc=fr" scope=2 deref=0 filter="(objectClass=*)"
Mar 16 01:10:03 ldap slapd[23294]: conn=11 op=1 SEARCH RESULT tag=101 err=0 nentries=39741 text=
Mar 16 01:10:03 ldap slapd[23294]: conn=11 op=2 UNBIND
Mar 16 01:10:03 ldap slapd[23294]: conn=11 fd=15 closed
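One way to spot a request like the failed one automatically: a Python sketch that scans slapd log lines and reports operations that were logged as started but never got a RESULT line. The set of operation keywords is an assumption covering the common ones, and the log file path passed on the command line is up to you; feed it the lines of a single slapd run, since connection numbers restart after a restart.

```python
import re
import sys

# "conn=<n> op=<n> <KEYWORD>" as seen in the slapd lines above.
OP_RE = re.compile(r"conn=(\d+) op=(\d+) (\w+)")
# Keywords that start an operation expecting a RESULT (assumed, not exhaustive).
STARTS = {"BIND", "SRCH", "MOD", "ADD", "DEL", "MODRDN", "CMP", "EXT"}


def unfinished_ops(lines):
    """Map (conn, op) -> first log line of operations with no RESULT yet."""
    pending = {}
    for line in lines:
        m = OP_RE.search(line)
        if not m:
            continue  # ACCEPT / closed lines carry no op= field
        key = (int(m.group(1)), int(m.group(2)))
        keyword = m.group(3)
        if keyword in STARTS:
            pending.setdefault(key, line.strip())
        elif keyword in ("RESULT", "SEARCH"):  # "RESULT ..." or "SEARCH RESULT ..."
            pending.pop(key, None)
    return pending


if __name__ == "__main__":
    with open(sys.argv[1]) as log:
        for (conn, op), first_line in sorted(unfinished_ops(log).items()):
            print(f"conn={conn} op={op} never finished: {first_line}")
```

Operations that are simply still running when the log ends are reported too, so a hit means "no result yet", not necessarily "hung".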

thanks for any help, regards,

--
Thomas van Oudenhove - Université de Toulouse
tel: (+33) 5 61 36 60 45
jabberID: thomasvo@im.apinc.org


Quanah Gibson-Mount

6 Apr

8:13 a.m.

--On Monday, April 06, 2009 10:45 AM +0200 Thomas van Oudenhove vanouden@univ-toulouse.fr wrote:

...

thanks for any help,

Have you looked at your DB_CONFIG settings? Locks, lockers, lock objects? cachesize in relation to DB size? etc.

--Quanah

--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration

Thomas van Oudenhove

8:36 a.m.

hi,

Quanah Gibson-Mount wrote:

...

Have you looked at your DB_CONFIG settings? Locks, lockers, lock objects? cachesize in relation to DB size? etc.

Here are my DB_CONFIG settings:
set_cachesize 0 268435456 1
set_lg_regionmax 262144
set_lg_bsize 2097152
set_flags DB_LOG_AUTOREMOVE

I've added "checkpoint 128 15" to my slapd.conf, and the largest of my __db* files is 327688 kB.
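A small Python sketch that reads DB_CONFIG lines like the ones above and turns set_cachesize into a byte count: gigabytes times 2^30 plus bytes, with the third argument being the number of cache regions. So "set_cachesize 0 268435456 1" works out to 256 MB in one region. Skipping lines that start with # follows the DB_CONFIG comment convention; using the last set_cachesize line when several are present is an assumption of this sketch.

```python
def parse_db_config(text):
    """Map each DB_CONFIG directive name to the list of its argument lists."""
    directives = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are ignored
        name, *args = line.split()
        directives.setdefault(name, []).append(args)
    return directives


def cache_bytes(directives):
    """Total BDB cache size; assumes the last set_cachesize line is the one that counts."""
    gbytes, nbytes, _ncache = directives["set_cachesize"][-1]
    return int(gbytes) * 2**30 + int(nbytes)


# The settings quoted in the message above.
DB_CONFIG = """\
set_cachesize 0 268435456 1
set_lg_regionmax 262144
set_lg_bsize 2097152
set_flags DB_LOG_AUTOREMOVE
"""

if __name__ == "__main__":
    size = cache_bytes(parse_db_config(DB_CONFIG))
    print(f"BDB cache: {size} bytes = {size // 2**20} MB")
```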

Do you think I need to set a larger cachesize? (Larger than what?)

But the point is that the whole thing worked perfectly for a month and a half and then suddenly crashed, and I cannot get back to the previous performance, even when I completely delete and rebuild the BDB database. Even though I need to make the thing work, I'd like to understand what happened...

thanks, regards,


Quanah Gibson-Mount

9:11 a.m.

--On Monday, April 06, 2009 5:36 PM +0200 Thomas van Oudenhove vanouden@univ-toulouse.fr wrote:

...


Hi Thomas,

Since I don't know the particulars of your database, I don't know whether or not your database configuration is sufficient for your system. I'd suggest reading up on proper DB tuning, and the parameters for tuning slapd (see the slapd-bdb(5) man page). In particular on the BDB side:

What's the total size of your database? (du -c -h *.bdb) How many locks, lockers and lock objects are you using? (You'll need to find the right db_stat binary for your database, and then use the -c option)

How many total entries are in your database? What is your cachesize setting in slapd.conf in relation to that? What's your idlcachesize setting? etc.

--Quanah


Thomas van Oudenhove

7 Apr

2:36 a.m.

hi,

Quanah Gibson-Mount wrote:

...

Since I don't know the particulars of your database, I don't know whether or not your database configuration is sufficient for your system. I'd suggest reading up on proper DB tuning, and the parameters for tuning slapd (see the slapd-bdb(5) man page). In particular on the BDB side:

For sure; I confess I do not know BDB as well as I should :( I started reading some BDB man pages, but there is still a long way to go...

Does a "formula" exist to calculate "good" settings, given the size and number of entries of the database?

...

What's the total size of your database? (du -c -h *.bdb) How many locks, lockers and lock objects are you using? (You'll need to find the right db_stat binary for your database, and then use the -c option)

The total size is:
# du -c -h *.bdb
[...]
136M    total

And db_stat outputs:
# slapd_db_stat -c
189         Last allocated locker ID
0x7fffffff  Current maximum unused locker ID
9           Number of lock modes
1000        Maximum number of locks possible
1000        Maximum number of lockers possible
1000        Maximum number of lock objects possible
37          Number of current locks
349         Maximum number of locks at any one time
60          Number of current lockers
63          Maximum number of lockers at any one time
37          Number of current lock objects
183         Maximum number of lock objects at any one time
40M         Total number of locks requested (40770521)
40M         Total number of locks released (40770436)
0           Total number of locks upgraded
39          Total number of locks downgraded
0           Lock requests not available due to conflicts, for which we waited
0           Lock requests not available due to conflicts, for which we did not wait
0           Number of deadlocks
0           Lock timeout value
0           Number of locks that have timed out
0           Transaction timeout value
0           Number of transactions that have timed out
544KB       The size of the lock region
0           The number of region locks that required waiting (0%)
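A Python sketch that reads `db_stat -c` output (value, whitespace, description, as above) and reports how close the peak lock usage came to the configured maximum. With the numbers above the peaks are 349, 63 and 183 out of 1,000 each, i.e. about 35%, 6% and 18%. If those ratios ever approached 100%, the limits are raised in DB_CONFIG with set_lk_max_locks, set_lk_max_lockers and set_lk_max_objects.

```python
import sys


def parse_db_stat(text):
    """Map each db_stat -c description to its value string ("<value><ws><description>")."""
    stats = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 1)
        if len(parts) == 2:
            value, label = parts
            stats[label] = value
    return stats


# (peak counter, configured limit) pairs, using the labels printed by db_stat -c.
LIMITS = {
    "locks": ("Maximum number of locks at any one time", "Maximum number of locks possible"),
    "lockers": ("Maximum number of lockers at any one time", "Maximum number of lockers possible"),
    "lock objects": ("Maximum number of lock objects at any one time",
                     "Maximum number of lock objects possible"),
}


def peak_usage(stats):
    """Peak usage as a fraction of the configured maximum, per lock resource."""
    return {name: int(stats[peak]) / int(stats[limit]) for name, (peak, limit) in LIMITS.items()}


if __name__ == "__main__":
    # e.g. pipe the output of the slapd_db_stat -c command used above into this script
    for name, frac in peak_usage(parse_db_stat(sys.stdin.read())).items():
        print(f"{name}: peak {frac:.1%} of configured maximum")
```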

...

How many total entries are in your database? What is your cachesize setting in slapd.conf in relation to that? What's your idlcachesize setting? etc.

We have about 38,185 entries in the database (between 37,000 and 40,000, depending on the day), and the cachesize is:
# grep cachesize DB_CONFIG
set_cachesize 0 268435456 1

However, I do not have an "idlcachesize" setting, either in DB_CONFIG or in slapd.conf...

I just noticed I left the "allow bind_v2" directive in slapd.conf; could it be the cause of the bad performance?

thank you for all help provided, regards,


Oskar Pearson

9:38 a.m.

On 7 Apr 2009, at 10:36, Thomas van Oudenhove wrote:

...

hi,

Quanah Gibson-Mount wrote:

...
Since I don't know the particulars of your database, I don't know whether or not your database configuration is sufficient for your system. I'd suggest reading up on proper DB tuning, and the parameters for tuning slapd (see the slapd-bdb(5) man page). In particular on the BDB side:

for sure; I confess I do not know BDB as well as I should :( I started with some BDB "man pages", but it's still a long way...

This may help? http://www.openldap.org/faq/data/cache/1075.html

Aside from LDAP - what do "vmstat 1" and "iostat -x 1" say when this is happening? Is it possible the box is swapping? Is it disk-seek constrained? CPU constrained? The answers to those questions will help point in the right direction.
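For the vmstat part of that question, a Python sketch that parses captured `vmstat 1` output by its column header and counts samples showing swap activity (si/so above 0) or a high I/O-wait share (wa). The 20% wa threshold is an arbitrary assumption, not an established limit, and iostat -x output is not covered.

```python
import sys


def vmstat_summary(text, iowait_threshold=20):
    """Count vmstat samples that show swapping or high I/O wait.

    `iowait_threshold` (percent of CPU time in "wa") is an assumed cut-off.
    """
    rows = [line.split() for line in text.strip().splitlines()]
    # The column-name line is the one containing "si" and "wa".
    header = next(cols for cols in rows if "si" in cols and "wa" in cols)
    summary = {"samples": 0, "swapping": 0, "high_iowait": 0}
    for cols in rows:
        # Skip the "procs ---memory---" banner and (repeated) header lines.
        if len(cols) != len(header) or not cols[0].isdigit():
            continue
        sample = dict(zip(header, (int(c) for c in cols)))
        summary["samples"] += 1
        if sample["si"] or sample["so"]:
            summary["swapping"] += 1
        if sample["wa"] >= iowait_threshold:
            summary["high_iowait"] += 1
    return summary


if __name__ == "__main__":
    # e.g.  vmstat 1 60 | python vmstat_summary.py
    print(vmstat_summary(sys.stdin.read()))
```

Parsing by header name rather than fixed positions keeps it working whether or not the local vmstat prints an "st" column.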

Oskar

Thomas van Oudenhove

9 Apr

11:45 p.m.

hi,

I am replying to Oskar, but I also want to thank you all. I rebuilt a new server from scratch once again, running slapindex after the initial load (that seems to be important), and it works again.

I also changed my logrotate script to restart slapd and run slapindex... If that's not a good idea, please let me know.
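A hedged Python sketch of such a stop / reindex / restart step, for illustration only. The init script name (ldap), the ldap:ldap owner and /var/lib/ldap are assumptions based on typical CentOS 5 packaging. slapindex(8) recommends that slapd not be running while it works, and the chown guards against index files left owned by root. By default the function only records the commands it would run (dry run).

```python
import subprocess

# Hypothetical defaults; adjust to the actual system.
SERVICE = "ldap"          # CentOS 5 init script for slapd (assumed)
DB_DIR = "/var/lib/ldap"  # BDB directory (assumed)
OWNER = "ldap:ldap"       # user:group slapd runs as (assumed)


def reindex(service=SERVICE, db_dir=DB_DIR, owner=OWNER, dry_run=True):
    """Stop slapd, rebuild indices, fix ownership, and always start slapd again."""
    issued = []

    def call(cmd):
        issued.append(cmd)
        if not dry_run:
            subprocess.run(cmd, check=True)  # raise if a step fails

    call(["service", service, "stop"])
    try:
        call(["slapindex"])                    # uses the default slapd.conf
        call(["chown", "-R", owner, db_dir])
    finally:
        call(["service", service, "start"])    # restart even if slapindex failed
    return issued


if __name__ == "__main__":
    for cmd in reindex():
        print(" ".join(cmd))
```

The try/finally is the main design choice: a failed slapindex should not leave the directory down until someone notices.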

Oskar Pearson wrote:

...

This may help? http://www.openldap.org/faq/data/cache/1075.html

Yes; I had found this page weeks ago and then could not find it again, thanks.

again, thank you all. regards,


Quanah Gibson-Mount

7 Apr

12:25 p.m.

--On Tuesday, April 07, 2009 11:36 AM +0200 Thomas van Oudenhove vanouden@univ-toulouse.fr wrote:

...

the total size is: # du -c -h *.bdb [...] 136M total

Your lock/locker/lock object settings look fine.

...

we have some 38,185 entries in the database (between 37,000 and 40,000, depending on the days...), and the cachesize is: # grep cachesize DB_CONFIG set_cachesize 0 268435456 1

Your BDB db cachesize is 256MB then, which is more than sufficient for your 136MB database.
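The comparison made here can be sketched in Python: add up the *.bdb files in the database directory and check that the DB_CONFIG cache is at least that large. du reports disk usage while this sums apparent file sizes, so the two can differ slightly; the default directory is an assumption, and "cache at least as large as the data files" is this reply's rule of thumb rather than a formal requirement.

```python
import sys
from pathlib import Path


def bdb_total_bytes(db_dir):
    """Sum of apparent sizes of the *.bdb files in `db_dir` (not the __db.* region files)."""
    return sum(p.stat().st_size for p in Path(db_dir).glob("*.bdb"))


def cache_covers(cache_bytes, total_bytes):
    """True if the BDB cache can hold all of the data files at once."""
    return cache_bytes >= total_bytes


if __name__ == "__main__":
    db_dir = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/ldap"  # assumed path
    total = bdb_total_bytes(db_dir)
    cache = 256 * 2**20  # set_cachesize 0 268435456 1
    print(f"*.bdb total: {total / 2**20:.0f} MB, cache covers it: {cache_covers(cache, total)}")
```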

...

however, I do not have an "idlcachesize" setting, neither in DB_CONFIG, nor in slapd.conf...

However, you missed my point about the "cachesize" setting in slapd.conf (NOT DB_CONFIG). You should have a cachesize setting in slapd.conf (probably 45000), and a similar idlcachesize setting in slapd.conf as well.
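A rough Python check along the lines of this advice: look for the back-bdb cachesize and idlcachesize directives in slapd.conf text and flag them if they are missing or smaller than the number of entries. Using the entry count as the lower bound is an assumption drawn from the "probably 45000" suggestion for 38,000 to 40,000 entries; the parser also ignores continuation lines and does not distinguish between several database sections.

```python
def check_entry_caches(slapd_conf, entry_count):
    """Return human-readable problems with cachesize / idlcachesize in slapd.conf text.

    Assumption: both values should be at least `entry_count`.
    """
    settings = {}
    for raw in slapd_conf.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, *args = line.split()
        if name in ("cachesize", "idlcachesize") and args:
            settings[name] = int(args[0])
    problems = []
    for name in ("cachesize", "idlcachesize"):
        if name not in settings:
            problems.append(f"{name} is not set")
        elif settings[name] < entry_count:
            problems.append(f"{name} {settings[name]} is below {entry_count} entries")
    return problems


if __name__ == "__main__":
    example = 'database bdb\nsuffix "dc=fr"\ncachesize 45000\n'  # hypothetical excerpt
    print(check_entry_caches(example, 38185))
```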

...

I just noticed I left the "allow bind_v2" directive in slapd.conf, could it be the cause of bad performance ?

Unlikely. How exactly are you measuring performance?

--Quanah


Sean O'Malley

6 Apr

8:56 a.m.

On Mon, 6 Apr 2009, Thomas van Oudenhove wrote:

...


Did you try to reindex it? I have had similar issues with MUCH earlier versions flaking out after 4 months or so.

I assume you did, but you also need a checkpoint defined in your slapd.conf, which I do not believe is set by default, or else things get messy.
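For reference, slapd-bdb(5) describes "checkpoint <kbyte> <min>" as checkpointing when either <kbyte> kilobytes of data have been written or <min> minutes have passed since the last checkpoint. A tiny Python model of that either/or rule, applied to a line like "checkpoint 128 15"; it illustrates the condition only and does not model when slapd actually evaluates it.

```python
def parse_checkpoint(line):
    """Parse a slapd.conf 'checkpoint <kbyte> <min>' line into (kbyte, minutes)."""
    name, kbyte, minutes = line.split()
    if name != "checkpoint":
        raise ValueError(f"not a checkpoint directive: {line!r}")
    return int(kbyte), int(minutes)


def checkpoint_due(kbytes_written, minutes_elapsed, kbyte_limit, minute_limit):
    """Either threshold being reached triggers a checkpoint (inclusive limits assumed)."""
    return kbytes_written >= kbyte_limit or minutes_elapsed >= minute_limit


if __name__ == "__main__":
    kb, mins = parse_checkpoint("checkpoint 128 15")
    print(checkpoint_due(200, 2, kb, mins))   # 200 KB written: due
    print(checkpoint_due(10, 3, kb, mins))    # neither limit reached: not due
```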

Andrew Findlay

9 Apr

7 a.m.

On Mon, Apr 06, 2009 at 10:45:32AM +0200, Thomas van Oudenhove wrote:

...

on last monday, the extraction of ldap had some trouble, the extraction request never ended, and that made the server inaccessible; we had to restart it. since march 30, the update process (ldapmodify) is incredibly slow, it takes about several hours (~60 entries per minute), but the extraction request is "normal" again (about 2 seconds).

Are you sure that the problem is in OpenLDAP? Have you checked the server hardware?

This sort of heavy slowdown could be caused by a failing disk for example.
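One quick, non-authoritative way to look at the disk side from Python: time small synchronous (fsync'd) writes in the directory holding the database and compare the numbers with a known-good machine. A struggling disk often shows up as much higher or very erratic latencies, but this is only a rough probe and says nothing definitive about disk health; the default directory, sample count and block size are assumptions.

```python
import os
import statistics
import sys
import tempfile
import time


def fsync_latencies_ms(directory, samples=20, block=4096):
    """Time `samples` write+fsync calls of `block` bytes in a temp file under `directory`."""
    fd, path = tempfile.mkstemp(dir=directory)
    latencies = []
    try:
        payload = b"\0" * block
        for _ in range(samples):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the data to stable storage
            latencies.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.unlink(path)  # leave nothing behind in the database directory
    return latencies


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else "/var/lib/ldap"  # assumed DB directory
    lat = fsync_latencies_ms(target)
    print(f"median {statistics.median(lat):.2f} ms, max {max(lat):.2f} ms")
```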

Andrew

--
From Andrew Findlay, Skills 1st Ltd
Consultant in large-scale systems, networks, and directory services
http://www.skills-1st.co.uk/ +44 1628 782565


openldap-software@openldap.org

9 comments

5 participants


participants (5)

Andrew Findlay
Oskar Pearson
Quanah Gibson-Mount
Sean O'Malley
Thomas van Oudenhove