Statistically, that should be relevant. I mean, I usually do something like this:
i=0; while [ $i -lt 100 ]; do pstack <MYPID> > pstack.$i; (( i+=1 )); done;
Yes, no sleep, just a burst of pstacks. That is statistically as sound as what any sampling-based profiler would tell you, without the complexity of having to install
such a tool (kernel prerequisites, etc.), and you can collect it in less than a minute.
Sometimes, though, the output can be hard to read for people not used to it.
If you send me your output, I can try to help.
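A rough way to make such a burst easier to read is to count which function sits at the top of each thread's stack across all the snapshots. The sketch below is only an illustration: the function name `top_frames` is made up here, and it assumes Solaris-style pstack output, where each thread starts with a header line containing `lwp#` and each frame line looks like `<address> <function> (<args>)`. If your pstack prints a different layout, the awk patterns would need adjusting.

```shell
# Summarize a burst of pstack snapshots: for every thread in every file,
# take the innermost (top) frame and count how often each function shows up.
# Assumption: a header line containing "lwp#" precedes each thread's frames,
# and frame lines have the form "<address> <function> (<args>)".
top_frames() {
    awk '
        /lwp#/          { want = 1; next }           # thread header seen
        want && NF >= 2 { count[$2]++; want = 0 }    # $2 is the function name
        END             { for (f in count) print count[f], f }
    ' "$@" | sort -rn
}

# Usage: top_frames pstack.*
```

Functions that dominate the top of the list are where the threads were most often caught, which is usually the first place to look.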
Best Regards
++Cyrille
From: Luca Polidoro [mailto:luca.polidoro@gmail.com]
Sent: Friday, September 06, 2013 3:08 PM
To: Maucci, Cyrille
Cc: openldap-technical@openldap.org
Subject: Re: Slapd High CPU usage on Solaris 9
Hi, I have already done these tests, but the result provides little information, none of which is useful for directing the analysis.
2013/9/6 Maucci, Cyrille <cyrille.maucci@hp.com>
When I myself face such a problem, I usually pstack the process a few times to very quickly know
what the guy is doing.
And that usually gives me a good clue.
++Cyrille
From: openldap-technical-bounces@OpenLDAP.org [mailto:openldap-technical-bounces@OpenLDAP.org] On Behalf Of Luca Polidoro
Sent: Monday, August 12, 2013 3:31 PM
To: openldap-technical@openldap.org
Subject: Slapd High CPU usage on Solaris 9
Hello,
I am writing to submit a case that has been occurring over the last 2 weeks in our infrastructure, which is structured as follows:
1 provider: Solaris 9 SPARC - Sun Fire V490 - last OS patch level
CPU: 4 x 1500 MHz
RAM: 32 GB
OpenLDAP version used: 2.4.23 with Berkeley DB 4.8.30 (database bdb), all 64-bit
18 consumers: Solaris 9 SPARC - last OS patch level, with varying hardware (CPU, RAM)
On the following consumers:
Consumer 1: Solaris 9 SPARC - Sun Fire 480R - last OS patch level
CPU: 4 x 900 MHz
RAM: 8 GB
Consumer 2: Solaris 9 SPARC - Sun Fire 480R - last OS patch level
CPU: 4 x 1050 MHz
RAM: 8 GB
Consumer 3: Solaris 9 SPARC - Sun Fire 480R - last OS patch level
CPU: 4 x 1050 MHz
RAM: 8 GB
Consumer 4: Solaris 9 SPARC - Sun Fire V210 - last OS patch level
CPU: 2 x 1336 MHz
RAM: 8 GB
we are noticing an increase in the CPU used by the slapd process. In fact, the process sits constantly between 85% and 95% CPU, becomes completely unusable, and we are then forced to restart it.
The LDAP directory holds 1,000,000 objects.
This is the consumer's slapd.conf (I have omitted parts of the ACL, includes, etc..):
# See slapd.conf(5) for details on configuration options.
# This file should NOT be world readable.
#
#
# VERSION v2 - Digital Tru64
#
allow bind_v2
Some include
...
#
# tuning parameters - START
# ------------------------------
#
conn_max_pending 1000
conn_max_pending_auth 1000
idletimeout 500
sizelimit unlimited
threads 8
timelimit 500
disallow bind_anon
#
# tuning parameters - END
# ----------------------------
#
...
#######################################################################
# bdb database definitions
#######################################################################
database bdb
suffix "xxxxxxxxxxxx"
rootdn "cn=root,ou=ldapusers,xxxxx"
directory /var/openldap-2.4.23_64/var/openldap-data
#####disallow limit for syncuser
limits dn.children="ou=syncusers,xxxx" size=unlimited
index objectClass,entryCSN,entryUUID eq
index ou eq,sub,subinitial,subany,subfinal
index uidOwner eq
index uid eq
index memberUid eq
#shm_key 1100
cachesize 1000000
cachefree 10000
dncachesize 1000000
idlcachesize 1000000
searchstack 16
checkpoint 1024 10
overlay ppolicy
ppolicy_default "cn=Standard,ou=Policies,xxxx"
ppolicy_use_lockout
############################SYNCREPL CONF
syncrepl rid=011
provider=ldap://xxxxxx
type=refreshAndPersist
interval=00:00:15:00
retry="15 10 120 +"
searchbase="xxxxx"
filter="(objectClass=*)"
attrs="*,+"
scope=sub
schemachecking=on
bindmethod=simple
binddn="xxxxxx"
credentials=xxxx
############################SYNCREPL CONF
These are the bdb files:
420M dn2id.bdb
30M entryCSN.bdb
32M entryUUID.bdb
1,4G id2entry.bdb
18M memberUid.bdb
4,9M objectClass.bdb
5,3M ou.bdb
17M uid.bdb
17M uidOwner.bdb
This is the DB_CONFIG:
-----------------------------------------------------------
##########################################
###########################################
#set_cachesize 0 300000000 10
#set_lg_regionmax 262144
#set_lg_bsize 2097152
###########################################
###########################################
# replaces lockdetect directive
#set_lk_detect DB_LOCK_EXPIRE
set_lk_detect DB_LOCK_DEFAULT
# uncomment if dbnosync required
#ADDED ALL
#set_flags DB_TXN_WRITE_NOSYNC
####ADDED
set_flags DB_LOG_AUTOREMOVE
# multiple set_flags directives allowed
# sets max log size = 5M (BDB default=10M)
set_lg_max 25242880
set_lg_dir /var/openldap-2.4.23_64/logs
set_cachesize 2 274726912 1
# sets a database cache of 5M and
# allows fragmentation
# does NOT replace slapd.conf cachesize
# this is a database parameter
#txn_checkpoint 128 15 0
# replaces checkpoint in slap.conf
# writes checkpoint if 128K written or every 15 mins
# 0 = no writes - no update
set_lk_max_locks 2500
set_lk_max_lockers 2500
set_lk_max_objects 2500
---------------------------------------------------
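As a side note on reading the file above: a `set_cachesize` line in DB_CONFIG takes three values (gigabytes, additional bytes, number of cache segments), so the active line `set_cachesize 2 274726912 1` works out to 2048 MiB + 262 MiB = 2310 MiB in a single segment, not the 5M mentioned in the comment next to it. The small helper below is only a sketch for checking that arithmetic; the function name `db_config_cache_mib` is made up for illustration, and it simply skips commented-out lines.

```shell
# Print the total BDB cache size in MiB for each active set_cachesize line
# of a DB_CONFIG file. Lines starting with "#" are ignored because their
# first field is not exactly "set_cachesize".
db_config_cache_mib() {
    awk '$1 == "set_cachesize" {
        # $2 = gigabytes, $3 = additional bytes, $4 = number of cache segments
        printf "%d MiB in %d segment(s)\n", $2 * 1024 + $3 / 1048576, $4
    }' "$1"
}

# Usage: db_config_cache_mib /path/to/DB_CONFIG
```

The result can then be compared by hand with the sizes of the .bdb files listed earlier in this message.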
We have tried raising the number of threads to 16 and lowering the idletimeout and timelimit parameters, but without result.
Appreciate your feedback.
Thanks,
Luca