Hi,
I am using the default openldap-server-2.3.27-5 that comes with Red Hat 5, and I am using the syncrepl method to replicate from the master LDAP server. The replication itself worked well and was able to replicate a large database, but I have a severe problem: the slapd process spikes to high CPU every few seconds. User time is about 85%, and the whole server gets very, very slow.
I turned slapd to debug level -1 and grepped for errors. The only errors I got were:
-----------
Oct 16 17:32:59 ldapserver slapd[3127]: => access_allowed: backend default search access granted to "(anonymous)"
Oct 16 17:32:59 ldapserver slapd[3127]: <= test_filter 5
Oct 16 17:32:59 ldapserver slapd[3127]: <= test_filter_and 5
Oct 16 17:32:59 ldapserver slapd[3127]: <= test_filter 5
Oct 16 17:32:59 ldapserver slapd[3127]: bdb_search: 10555 does not match filter
Oct 16 17:32:56 ldapserver slapd[3127]: ber_get_next on fd 24 failed errno=0 (Success)
Oct 16 17:32:59 ldapserver slapd[3127]: connection_read(24): input error=-2 id=14, closing.
Oct 16 17:35:24 ldapserver slapd[3127]: connection_read(17): input error=-2 id=32, closing.
Oct 16 17:35:29 ldapserver slapd[3127]: connection_read(18): input error=-2 id=36, closing.
Oct 16 17:35:32 ldapserver slapd[3127]: connection_read(18): input error=-2 id=37, closing.
----------
(I do not know if this is related to the high CPU.) There were no notable messages in /var/log/messages. I have a 3.2GHz CPU and 2G of memory on this server.
Has anybody had a similar problem before? Can anyone advise what can be done to troubleshoot further?
Thanks.
Angie
If you're not sure if those logs are related, perhaps you should run "date" when things get slow and find the relevant debug logs that way?
I didn't see any smoking gun in what you posted, but since you're posting about "slow" with a log describing filter processing I'd look into unindexed searches. You're using syncrepl, so hopefully you read in the slapo-syncprov man page
On databases that support inequality indexing, it is helpful to set an eq index on the entryCSN attribute when using this overlay.
and followed that advice? If you added indices, did you run slapindex after stopping slapd? Look for messages along the lines of
<= bdb_equality_candidates: (FOO) not indexed
in your logs.
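A rough sketch of how one might tally those "not indexed" messages instead of eyeballing a huge debug log. The regular expression is modeled on the single message quoted above; matching other `bdb_*_candidates` variants is an assumption, and the function name `unindexed_attributes` is purely illustrative.

```python
import re
from collections import Counter

# Modeled on the message quoted above:
#   <= bdb_equality_candidates: (FOO) not indexed
# Accepting other bdb_*_candidates variants is an assumption.
NOT_INDEXED = re.compile(r"<= bdb_\w+_candidates: \(([^)]+)\) not indexed")


def unindexed_attributes(lines):
    """Count how often each attribute is reported as not indexed."""
    counts = Counter()
    for line in lines:
        match = NOT_INDEXED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it an open log file (for example whatever file your syslog.conf sends slapd output to) and printing `most_common()` would show which attributes are searched most often without an index.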
I had an eq index set on entryCSN. The weird part: with "loglevel -1" in slapd.conf, syslogd gets very busy logging slapd output (using even more CPU than slapd itself), so I ran "slapd -d127 -u ldap -h ldap:/// ldaps:///" instead, trying to log more info, and it works like a charm!
Any ideas?
Angie
On 10/16/07, Aaron Richton richton@nbcs.rutgers.edu wrote:
If you're not sure if those logs are related, perhaps you should run "date" when things get slow and find the relevant debug logs that way?
I didn't see any smoking gun in what you posted, but since you're posting about "slow" with a log describing filter processing I'd look into unindexed searches. You're using syncrepl, so hopefully you read in the slapo-syncprov man page
On databases that support inequality indexing, it is helpful to set an eq index on the entryCSN attribute when using this overlay.
and followed that advice? If you added indices, did you run slapindex after stopping slapd? Look for messages along the lines of
<= bdb_equality_candidates: (FOO) not indexed
in your logs.
You mean to say that debug to the console is cheaper than debug to syslog()? That's quite likely, especially since many syslog implementations sync on each message (very expensive). Look into syslog()ing remotely, and/or using the "-" syslog.conf option if it is available (Linux in particular). And of course -1 is a REALLY expensive debug (almost certainly more than you need in this case); dumping out packet encoding takes a lot of time.
Angie Cao wrote:
I had an eq index set on entryCSN. The weird part: with "loglevel -1" in slapd.conf, syslogd gets very busy logging slapd output (using even more CPU than slapd itself), so I ran "slapd -d127 -u ldap -h ldap:/// ldaps:///" instead, trying to log more info, and it works like a charm!
Instead of just **trying** to log more info, why don't you just read the man page and learn what loglevels mean? First of all, since OpenLDAP 2.3 you can specify user-friendly names for what you want to log. Second, the value of the log level is a bitmask, and 127 corresponds to 1+2+4+8+16+32+64, namely "trace,packets,args,conns,ber,filter,config"; are you sure you need all of them? Are you sure you don't need, say, "stats", which is perhaps the only loglevel worth using when you are at a loss and need to spot where to look deeper?
p.
Ing. Pierangelo Masarati OpenLDAP Core Team
SysNet s.r.l. via Dossi, 8 - 27100 Pavia - ITALIA http://www.sys-net.it --------------------------------------- Office: +39 02 23998309 Mobile: +39 333 4963172 Email: pierangelo.masarati@sys-net.it ---------------------------------------
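To make the bitmask point concrete, here is a minimal sketch that decodes a numeric loglevel into names. The names for bits 1 through 64 are the ones listed in the post above; the values for "acl" (128) and "stats" (256) are taken from slapd.conf(5) rather than from this thread, and `LOGLEVELS` and `decode_loglevel` are illustrative names, not part of OpenLDAP.

```python
# Bits 1..64 are named as in the post above; acl (128) and stats (256)
# come from slapd.conf(5), not from this thread. Higher bits are omitted.
LOGLEVELS = {
    1: "trace",
    2: "packets",
    4: "args",
    8: "conns",
    16: "ber",
    32: "filter",
    64: "config",
    128: "acl",
    256: "stats",
}


def decode_loglevel(value):
    """Return the names of the known bits set in a slapd loglevel value."""
    return [name for bit, name in sorted(LOGLEVELS.items()) if value & bit]


print(decode_loglevel(127))  # the seven levels from trace through config
print(decode_loglevel(256))  # stats only
```

Note that 127 does not include "stats" at all, which is why "-d127" produces a lot of output without the per-operation summary lines that usually point at the expensive searches.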
Angie Cao skrev, on 17-10-2007 00:58:
I am using the default openldap-server-2.3.27-5 that comes with Red Hat 5, and I am using the syncrepl method to replicate from the master LDAP server. The replication itself worked well and was able to replicate a large database, but I have a severe problem: the slapd process spikes to high CPU every few seconds. User time is about 85%, and the whole server gets very, very slow.
Experience (years of it) plus my own testing with RHEL5 says: Do Not Use Red Hat's OpenLDAP, use Buchan Milne's. My site is totally dependent on OpenLDAP on multiple servers and has no problems. Red Hat's OL is (presumably) patched to use the native BDB 4.3.29, whereas Buchan's has its own discrete patched 4.2.52. At least one developer on this list has written that the OL 2.3 source was deliberately written not to work with BDB 4.3, go figure.
http://staff.telkomsa.net/packages/
--Tonni
--On Wednesday, October 17, 2007 4:07 AM +0200 Tony Earnshaw tonni@hetnet.nl wrote:
Experience (years of it) plus my own testing with RHEL5 says: Do Not Use Red Hat's OpenLDAP, use Buchan Milne's. My site is totally dependent on OpenLDAP on multiple servers and has no problems. Red Hat's OL is (presumably) patched to use the native BDB 4.3.29, whereas Buchan's has its own discrete patched 4.2.52. At least one developer on this list has written that the OL 2.3 source was deliberately written not to work with BDB 4.3, go figure.
No, you are misreading. BDB 4.3 has been extremely flaky, so the OpenLDAP *configure script* has been made to abort if you try to build it against BDB 4.3. There's nothing specific in OpenLDAP's code that prevents it from working with BDB 4.3; it is all the problems in BDB 4.3 that prevent things from working. ;) Some distributions have *hacked* the OpenLDAP configure script so it will build against BDB 4.3, and there's nothing we can do about that.
--Quanah
--
Quanah Gibson-Mount Principal Software Engineer Zimbra, Inc -------------------- Zimbra :: the leader in open source messaging and collaboration
openldap-software@openldap.org