Hello!
I am trying to use OpenLDAP to hold a small but continuously rebuilt database with an hdb backend. Basically, I build a directory under a temporary node and move it into place when it's ready (hence hdb: I want to move the whole tree into place in one go). I build a new directory, move the current one out to another temporary node, and move the new one in. Lastly, I delete the defunct tree and start over building a new tree. In short, the idea is to always have a complete tree in place while continuously keeping it updated (except, I suppose, between moving the old tree out and the new one in, but I don't see any solution for that).
This works well for a couple of hours on the machine I am running it on (PIII, 1 CPU, 1000 MHz, 512 MB RAM, Linux 2.6 kernel), but then slows down by a factor of 10-20.
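The rebuild cycle described above can be sketched as an ordered list of operations. This is only an illustration: the RDNs `ou=current`, `ou=building` and `ou=old`, the suffix `dc=example,dc=com`, and the function name `swap_plan` are hypothetical placeholders, and "rename" stands for an LDAP ModRDN on the subtree root, which is the operation back-hdb allows on a whole subtree.

```python
def swap_plan(suffix, live="ou=current", staging="ou=building", retired="ou=old"):
    """Return the ordered steps for one rebuild cycle.

    All RDNs are hypothetical placeholders, not taken from the post.
    Each step is (operation, dn, new_rdn).
    """
    return [
        # Move the current tree out to a temporary node.
        ("rename", f"{live},{suffix}", retired),
        # Move the freshly built tree into place.
        ("rename", f"{staging},{suffix}", live),
        # Delete the defunct tree (children first in a real client).
        ("delete_subtree", f"{retired},{suffix}", None),
    ]


if __name__ == "__main__":
    for step in swap_plan("dc=example,dc=com"):
        print(step)
```

The window with no complete tree in place is exactly the gap between the first and second step, so a client would want to issue those two renames back to back.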
Why is this and what can I do to stop it? (easy to ask...)
free shows:

  # free
               total       used       free     shared    buffers     cached
  Mem:        508104     501992       6112          0      31268     332556
  -/+ buffers/cache:     138168     369936
  Swap:      2104504       2904    2101600
This isn't brilliant, of course, but AFAIU not catastrophic either. I see about the same figures when it isn't slowed down. vmstat with a sampling interval of a few seconds shows no swapping before or after slapd slows down.
top shows that slapd and the script populating it run at about 2-3% CPU each, with not much CPU consumption apart from that (consistent with a system that slows down by a factor of 20, I guess). The script uses Net::LDAP in Perl (over a local socket), so no external clients are invoked.
The really puzzling bit is that if I shut down slapd and the "directory builder" (thus returning memory, file descriptors and such to the system) and then restart them, I almost immediately get the same slowdown. In fact, the time it takes for the computer to slow down after firing up slapd seems proportional to how long I let it "rest".
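As a check on the reading of the free output above, the "-/+ buffers/cache" row can be reproduced from the "Mem:" row. The sketch below uses only the figures quoted in this post; the function name `app_memory_kb` is made up for illustration.

```python
def app_memory_kb(total, used, free, buffers, cached):
    """Reproduce the '-/+ buffers/cache' row of the classic free(1) output."""
    used_by_apps = used - buffers - cached   # excludes reclaimable cache
    available = free + buffers + cached      # free plus reclaimable cache
    # The two figures should account for all physical memory.
    assert used_by_apps + available == total
    return used_by_apps, available


if __name__ == "__main__":
    used_by_apps, available = app_memory_kb(
        total=508104, used=501992, free=6112, buffers=31268, cached=332556
    )
    print(f"used by applications: {used_by_apps} kB")
    print(f"free or reclaimable:  {available} kB")
```

With these numbers, most of the "used" memory is page cache rather than application memory, which fits the observation that nothing is swapping.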
I have tried all sorts of things to analyze this and finally decided to profile slapd. I rebuilt it with -g -pg in CFLAGS and passed --enable-debug to configure (actually, I have used that switch all along). I also discovered that I had to replace 'strip = -s' with 'strip =' in all makefiles even though --enable-debug was given (is this intentional or a bug in configure?). Finally, I had to get the gprof-helper by Hocevar/Jönsson (and confirm that it was used) to be able to profile threaded applications. The result, however, doesn't tell me much: the slapd process seems to spend most (70-80%) of its time in the "at_next" routine.
Info about system: I am running v2.3.24 of slapd, built with:

  ./configure --program-prefix=jj4 --with-threads=yes --enable-dynamic \
    --enable-debug --enable-crypt --enable-lmpasswd --enable-spasswd \
    --enable-modules --enable-backends=mod --enable-sql=no \
    --enable-ldap=mod --enable-meta=mod --enable-monitor=mod \
    --enable-null=mod --enable-perl=no --enable-relay=mod \
    --enable-shell=mod --enable-overlays=mod --enable-denyop=mod \
    --enable-dyngroup=mod --enable-dynlist=mod --enable-lastmod=mod \
    --enable-proxycache=mod --enable-retcode=mod --enable-rwm=mod \
    --enable-dependency-tracking

(lots of modules are built, but only the hdb backend is actually loaded when I'm running)
These are the relevant and nonsensitive parts of the slapd.conf:
---
sizelimit 1000000
moduleload back_hdb.la

database hdb
suffix *removed*
rootdn *removed*
rootpw *removed*
directory /usr/local/lis/var/db
checkpoint 512 5
dirtyread
dbconfig set_cachesize 0 16777216 8
dbconfig set_lg_regionmax 262144
dbconfig set_lg_bsize 2097152
dbconfig set_lg_max 16777216
dbconfig set_flags DB_LOG_AUTOREMOVE
index objectClass eq
---
I use dirtyread because I want to be able to read while I'm writing (which is almost always) while an occasional bad read is acceptable (it should anyway be very rare since I don't do the modifications in the "current" tree where I read).
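For reference, the dbconfig values in the configuration above work out to the following nominal sizes. This is a minimal sketch assuming the usual Berkeley DB reading of set_cachesize as (gbytes, bytes, ncache); the helper name `cache_bytes` is hypothetical, and the figures ignore any overhead the library adds on top.

```python
MIB = 1024 * 1024


def cache_bytes(gbytes, nbytes, ncache):
    """Nominal total and per-region size for a set_cachesize triple."""
    total = gbytes * 1024 * MIB + nbytes
    return total, total // ncache


if __name__ == "__main__":
    # Values copied from the dbconfig lines in this post.
    total, per_region = cache_bytes(0, 16777216, 8)
    print(f"cache: {total // MIB} MiB total, {per_region // MIB} MiB per region")
    print(f"set_lg_bsize: {2097152 // MIB} MiB")
    print(f"set_lg_max:   {16777216 // MIB} MiB")
```

Whether a cache of this size is adequate for the workload is not something these numbers alone establish.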
Thanks in advance
Johan Jönemo
The information you provided still isn't specific enough, but I think you will need to use some lower-level tools like oprofile to identify the problem. It could be your kernel, it could be glibc, it could be the malloc library, it could be the BerkeleyDB library, etc.
openldap-software@openldap.org