Hello, I'm running openldap 2.3.43 on a CentOS 5.3 machine (1 Xeon @2.33GHz, 1.5GB RAM). My LDAP directory contains about 158000 entries (users) loaded from one massive LDIF add. I'm using the hdb database backend and I added a couple of indexes to the default configuration. Very often (but not every time) when I stop the service and restart it, I receive this message:
hdb_db_open: unclean shutdown detected; attempting recovery.
hdb_db_open: Recovery skipped in read-only mode. Run manual recovery if errors are encountered.
After I receive this message I can't contact the LDAP server, and I have to run slapd_db_recover (which takes a long time) to recover the database.
This is a test environment and at this time there is no activity on the server (no read/write operations).
This is DB_CONFIG file content (/var/lib/ldap/DB_CONFIG):
# $OpenLDAP: pkg/ldap/servers/slapd/DB_CONFIG,v 1.1.2.4 2007/12/18 11:51:46 ghenry Exp $
# Example DB_CONFIG file for use with slapd(8) BDB/HDB databases.
#
# See the Oracle Berkeley DB documentation
# http://www.oracle.com/technology/documentation/berkeley-db/db/ref/env/db_config.html
# for detail description of DB_CONFIG syntax and semantics.
#
# Hints can also be found in the OpenLDAP Software FAQ
# http://www.openldap.org/faq/index.cgi?file=2
# in particular:
# http://www.openldap.org/faq/index.cgi?file=1075

# Note: most DB_CONFIG settings will take effect only upon rebuilding
# the DB environment.

# one 0.25 GB cache
# set_cachesize 0 268435456 1
set_cachesize 0 629145600 1

# Data Directory
#set_data_dir db

# Transaction Log settings
set_lg_regionmax 262144
#EXP set_lg_bsize 2097152
#set_lg_dir logs

# Note: special DB_CONFIG flags are no longer needed for "quick"
# slapadd(8) or slapindex(8) access (see their -q option).
I tried reducing the log buffer size (set_lg_bsize) to force the server to write the log file more often and so avoid possible "lost transactions", but this isn't working.
Any ideas about the cause of these db corruptions?
Thanks a lot, Gabriele
On Tue, 2009-10-06 at 10:48 +0200, Antonini Gabriele wrote:
> I'm running openldap 2.3.43 on a CentOS 5.3 machine (1 Xeon @2.33GHz, 1.5GB RAM).
I don't know about CentOS, but if it's anything like Redhat, the system provided init scripts are very hostile to processes that don't shut down fast enough.
The killproc() function will send a TERM, wait 100k microseconds, then send a KILL.
You're running a large process on a low-memory machine. I strongly suspect CentOS isn't giving slapd enough time to shut down properly.
Instead of using the system scripts, try sending a plain kill to slapd and timing how long it takes to shut down. If it's more than 100 seconds, you'll either need to add memory (I'd recommend that anyway) or stop using the CentOS init scripts to start and stop OpenLDAP.
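The timing test suggested above (send a plain kill and measure how long the process takes to exit) can be sketched as a small shell helper. This is only an illustration under stated assumptions: `time_shutdown` is a hypothetical name, the 600-second limit is arbitrary, and the pidfile path in the usage comment is a guess that should be checked against the pidfile directive in slapd.conf. For scale, 100k microseconds is 0.1 seconds, and this helper only resolves whole seconds.

```shell
#!/bin/sh
# time_shutdown PID [LIMIT]
# Hypothetical helper: send SIGTERM to PID and report how long it takes to exit.
time_shutdown() {
    ts_pid=$1
    ts_limit=${2:-600}          # assumed upper bound, in seconds
    ts_start=$(date +%s)

    if ! kill -TERM "$ts_pid" 2>/dev/null; then
        echo "cannot signal pid $ts_pid" >&2
        return 2
    fi

    # kill -0 sends no signal; it only tests whether the process still exists.
    while kill -0 "$ts_pid" 2>/dev/null; do
        ts_elapsed=$(( $(date +%s) - ts_start ))
        if [ "$ts_elapsed" -ge "$ts_limit" ]; then
            echo "pid $ts_pid still running after ${ts_limit}s" >&2
            return 1
        fi
        sleep 1
    done
    echo "pid $ts_pid exited after about $(( $(date +%s) - ts_start ))s"
}

# Usage (pidfile location is an assumption):
# time_shutdown "$(cat /var/run/openldap/slapd.pid)"
```

If the reported time is well above what the init script allows, the unclean shutdowns are most likely caused by the forced KILL rather than by slapd itself.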
On Tue, 6 Oct 2009, Brandon Hume wrote:
> You're running a large process on a low-memory machine. I strongly suspect CentOS isn't giving slapd enough time to shut down properly.
I think you've hit this head on. The only other suggestion I might make is to consider the checkpointing configuration as described in slapd-bdb(5) man page. As a rule of thumb, more frequent checkpoints mean that the time to flush at shutdown is decreased.
-----Original Message-----
From: Aaron Richton [mailto:richton@nbcs.rutgers.edu]
Sent: Tuesday, 6 October 2009 15:35
To: Brandon Hume; Antonini Gabriele
Cc: openldap-software@openldap.org
Subject: Re: openldap service stop cause database corruption
> I think you've hit this head on. The only other suggestion I might make is to consider the checkpointing configuration as described in slapd-bdb(5) man page. As a rule of thumb, more frequent checkpoints mean that the time to flush at shutdown is decreased.
Thanks for the suggestion. I'm populating the LDAP directory from scratch; then I'll change the stop script to add a 30-second sleep and configure more frequent checkpoints (every 5 minutes?). I'll let you know if this fixes the problem.
Bye, G.
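For reference, checkpointing is configured per database in slapd.conf. The fragment below is a minimal sketch, assuming the `checkpoint <kbyte> <min>` directive described in slapd-bdb(5). The values are illustrative only: 5 minutes is the interval floated above, and 1024 KB is an arbitrary placeholder, not a tuned recommendation.

```
# slapd.conf fragment (values are assumptions, not recommendations)
database    hdb
directory   /var/lib/ldap
# Checkpoint once 1024 KB of log data has been written or 5 minutes
# have passed since the last checkpoint, whichever comes first.
checkpoint  1024 5
```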
Hi,
For your information, an init script and packages for CentOS are provided by LTB-project:
- http://ltb-project.org/wiki/documentation/openldap-initscript
- http://ltb-project.org/wiki/documentation/openldap-rpm
Hope this helps.
Clément.
I tried modifying the stop script by adding a 30-second sleep after killproc, but the problem remains. Here is the stop script:
function stop() {
    # Stop daemons.
    prog=`basename ${slapd}`
    echo -n $"Stopping $prog: "
    killproc ${slapd}
    sleep 30
    RETVAL=$?
    echo
    if [ $RETVAL -eq 0 ]; then
        if grep -q "^replogfile" /etc/openldap/slapd.conf; then
            prog=`basename ${slurpd}`
            echo -n $"Stopping $prog: "
            killproc ${slurpd}
            sleep 30
            RETVAL=$?
            echo
        fi
    fi
    [ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/ldap /var/run/slapd.args
    return $RETVAL
}
On Thu, 2009-10-08 at 15:07 +0200, Antonini Gabriele wrote:
> I tried modifying the stop script by adding a 30-second sleep after killproc, but the problem remains. Here is the stop script:
That wouldn't accomplish anything... the damage occurs INSIDE killproc.
Did you time how long it takes your slapd to shut down when not using the init scripts? I've seen a large slapd with broken checkpointing in a low-memory environment like yours take upwards of ten minutes to exit.
The scripts that Clément posted links to seem vastly superior to what I've seen come stock with Redhat (and perhaps your version of CentOS). You might want to think about using those instead.
-----Original Message-----
From: openldap-software-bounces+g.antonini=giuntilabs.com@OpenLDAP.org [mailto:openldap-software-bounces+g.antonini=giuntilabs.com@OpenLDAP.org] On Behalf Of Brandon Hume
Sent: Thursday, 8 October 2009 15:50
To: openldap-software@openldap.org
Hi Brandon, Thanks for the reply.
> Did you time how long it takes your slapd to shut down when not using the init scripts? I've seen a large slapd with broken checkpointing in a low-memory environment like yours take upwards of ten minutes to exit.
I tried stopping slapd without the init script and it takes only a few seconds (2-3) to shut down. Another question about memory: since I'm in a virtual environment, I can add RAM up to 64GB. What is, in your opinion, a decent amount of memory for an LDAP server? The system doesn't seem to be under stress and there is a lot of free memory.
> The scripts that Clément posted links to seem vastly superior to what I've seen come stock with Redhat (and perhaps your version of CentOS). You might want to think about using those instead.
I tried the script Clément posted, without any significant difference. I'll try again today with modified configurations and post updates here.
Thanks again, G.
FYI:
The behaviour has allegedly been fixed in Fedora, and a bug report was filed with RHEL today:
https://bugzilla.redhat.com/show_bug.cgi?id=528124
On Thursday, 8 October 2009 14:07:34 Antonini Gabriele wrote:
> I tried modifying the stop script by adding a 30-second sleep after killproc, but the problem remains. Here is the stop script:
>
> function stop() {
>     # Stop daemons.
>     prog=`basename ${slapd}`
>     echo -n $"Stopping $prog: "
>     killproc ${slapd}

Make this:

    killproc -d 30 ${slapd}

(and remove the line below)

>     sleep 30
>     RETVAL=$?
>     echo
>     if [ $RETVAL -eq 0 ]; then
>         if grep -q "^replogfile" /etc/openldap/slapd.conf; then
>             prog=`basename ${slurpd}`
>             echo -n $"Stopping $prog: "
>             killproc ${slurpd}
>             sleep 30
>             RETVAL=$?
>             echo
>         fi
>     fi
>     [ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/ldap /var/run/slapd.args
>     return $RETVAL
> }
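For readers without the Red Hat function library at hand, the idea behind a longer grace period can be sketched in plain shell: send TERM, poll for up to a given number of seconds, and fall back to KILL only if the process is still alive. This illustrates the pattern and is not the actual killproc implementation; `term_then_kill` and its 30-second default are assumptions made for the example.

```shell
#!/bin/sh
# term_then_kill PID [DELAY]
# Hypothetical helper: TERM first, wait up to DELAY seconds, then KILL.
# Returns 0 if the process exited on its own, 1 if KILL was needed.
term_then_kill() {
    tk_pid=$1
    tk_delay=${2:-30}           # assumed grace period, in seconds
    tk_waited=0

    # If the signal cannot be delivered, treat the process as already gone.
    kill -TERM "$tk_pid" 2>/dev/null || return 0

    # kill -0 only checks for existence; it does not signal the process.
    while kill -0 "$tk_pid" 2>/dev/null; do
        if [ "$tk_waited" -ge "$tk_delay" ]; then
            echo "pid $tk_pid still alive after ${tk_delay}s, sending KILL" >&2
            kill -KILL "$tk_pid" 2>/dev/null
            return 1
        fi
        sleep 1
        tk_waited=$((tk_waited + 1))
    done
    return 0
}
```

A sleep placed after the call, as in the quoted script, cannot help because by then the KILL has already been sent; the wait has to happen between the two signals.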
openldap-software@openldap.org