Mysterious slow-down of slapd with hdb - openldap-software

27 Feb 2007


Hello!
I am trying to use OpenLDAP to hold a small but continuously rebuilt
database with an hdb backend. Basically, I build a directory under a
temporary node and move it into place when it's ready (hence hdb: I
want to move the whole tree into place in one go). That is, I build a
new directory, move the current one out to another temporary node, and
move the new one in. Lastly I delete the defunct tree and start over
building a new tree. In short, the idea is to always have a complete
tree in place while continuously trying to keep it updated (except, I
suppose, between moving the old tree out and the new one in, but I
don't see any solution for that).
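For illustration, one rebuild cycle as described above can be written down as an ordered list of operations. This is only a sketch under assumptions: the node names (ou=building, ou=current, ou=old), the base DN and the operation labels are hypothetical, and the moves are modelled as plain renames under the same parent, which may differ from what the real script does.

```python
# Hypothetical sketch of one rebuild cycle; node names and labels are illustrative only.
from typing import List, Tuple

Op = Tuple[str, ...]

def swap_cycle(base: str,
               live: str = "ou=current",
               staging: str = "ou=building",
               retired: str = "ou=old") -> List[Op]:
    """Return the ordered operations for one build-and-swap cycle."""
    return [
        ("add_subtree", f"{staging},{base}"),      # build the new tree under a temporary node
        ("modrdn", f"{live},{base}", retired),     # move the current tree out of the way
        ("modrdn", f"{staging},{base}", live),     # move the new tree into place
        ("delete_subtree", f"{retired},{base}"),   # delete the defunct tree
    ]

if __name__ == "__main__":
    for step, op in enumerate(swap_cycle("dc=example,dc=com"), start=1):
        print(step, *op)
```

The only window without a complete live tree is between steps 2 and 3, which matches the gap mentioned above.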
This works well for a couple of hours on the machine I am running it on
(single-CPU PIII, 1000 MHz, 512 MB RAM, Linux 2.6 kernel), but then it
slows down by a factor of 10-20.
Why is this, and what can I do to stop it? (Easy to ask...)
free shows:
# free
             total       used       free     shared    buffers     cached
Mem:        508104     501992       6112          0      31268     332556
-/+ buffers/cache:     138168     369936
Swap:      2104504       2904    2101600
This isn't brilliant of course, but AFAIU not catastrophic either. I see
about the same figures when it isn't slowed down. vmstat with a sampling
interval of a few seconds shows no swapping before or after slapd slows down.
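As a check on how the free output reads, the "-/+ buffers/cache" row can be recomputed from the "Mem:" row. The numbers below are copied from the output above (in KiB); the variable names are made up for this sketch.

```python
# Figures copied from the "Mem:" row of the free output above, in KiB.
total, used, free = 508104, 501992, 6112
buffers, cached = 31268, 332556

# Memory held by applications, i.e. excluding the kernel's buffers and page cache.
used_by_apps = used - buffers - cached
# Memory that could be reclaimed for applications if needed.
available = free + buffers + cached

print(used_by_apps, available)   # 138168 369936
print(f"{100 * used_by_apps / total:.0f}% of RAM held by applications")   # 27%
```

So only about a quarter of the RAM is held by processes; most of the rest is page cache.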
top shows that slapd and the script populating it each run at about 2-3%
CPU, with not much CPU consumption apart from that (consistent with a
system that has slowed down by a factor of 20, I guess). The script uses
Net::LDAP in Perl (over a local socket), so no external clients are invoked.
The really puzzling bit is that if I shut down slapd and the "directory
builder" (thus returning memory, file descriptors and such to the
system) and then restart them, I almost immediately get the same
slowdown. In fact, the time it takes for the slowdown to set in after
firing up slapd seems proportional to how long I let the machine "rest".
I have tried all sorts of things to analyze this and finally decided to
profile slapd. I rebuilt it with -g -pg in CFLAGS and --enable-debug
passed to configure (actually I have used that switch all along). I also
discovered that I had to replace 'strip = -s' with 'strip =' in all
makefiles even though --enable-debug was given (is this intentional or a
bug in configure?). Finally, I had to get the gprof-helper by
Hocevar/Jönsson (and confirm that it was used) to be able to profile
threaded applications. The result, however, doesn't tell me much. The
slapd process seems to spend most (70-80%) of its time in the "at_next"
routine.
Info about system:
I am running v2.3.24 of slapd.
built with:
./configure --program-prefix=jj4 --with-threads=yes --enable-dynamic \
    --enable-debug --enable-crypt --enable-lmpasswd --enable-spasswd \
    --enable-modules --enable-backends=mod --enable-sql=no \
    --enable-ldap=mod --enable-meta=mod --enable-monitor=mod \
    --enable-null=mod --enable-perl=no --enable-relay=mod \
    --enable-shell=mod --enable-overlays=mod --enable-denyop=mod \
    --enable-dyngroup=mod --enable-dynlist=mod --enable-lastmod=mod \
    --enable-proxycache=mod --enable-retcode=mod --enable-rwm=mod \
    --enable-dependency-tracking
(lots of modules are built, but only the hdb backend is actually loaded
when I'm running)
These are the relevant and nonsensitive parts of the slapd.conf:
---
sizelimit 1000000
moduleload      back_hdb.la
database        hdb
suffix          *removed*
rootdn          *removed*
rootpw          *removed*
directory       /usr/local/lis/var/db
checkpoint      512 5
dirtyread
dbconfig set_cachesize 0 16777216 8
dbconfig set_lg_regionmax 262144
dbconfig set_lg_bsize 2097152
dbconfig set_lg_max 16777216
dbconfig set_flags DB_LOG_AUTOREMOVE
index objectClass eq
---
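To make the byte counts in the dbconfig lines above easier to read, here is a small sketch that converts them to human-readable sizes. The helper name cache_layout is made up for this example, and the per-region figure simply divides the total by the third set_cachesize argument; it does not account for any overhead Berkeley DB may add internally.

```python
# Values copied from the dbconfig lines above; helper names are hypothetical.
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

def cache_layout(gbytes, nbytes, ncache):
    """Total cache size and size per region for: set_cachesize <gbytes> <bytes> <ncache>."""
    total = gbytes * GIB + nbytes
    return total, total // ncache

total, per_region = cache_layout(0, 16777216, 8)
print(f"set_cachesize:    {total // MIB} MiB total, {per_region // MIB} MiB per region")  # 16 MiB, 2 MiB
print(f"set_lg_regionmax: {262144 // KIB} KiB")    # 256 KiB
print(f"set_lg_bsize:     {2097152 // MIB} MiB")   # 2 MiB
print(f"set_lg_max:       {16777216 // MIB} MiB")  # 16 MiB
```

The "checkpoint 512 5" line asks for a checkpoint after 512 KB have been written or 5 minutes have passed since the last one.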
I use dirtyread because I want to be able to read while I'm writing
(which is almost always), and an occasional bad read is acceptable (it
should be very rare anyway, since I don't make the modifications in the
"current" tree that I read from).
Thanks in advance
Johan Jönemo