Dear list,
We are using an OpenLDAP/slapd server to manage the user accounts of our Samba server and have recently run into a problem: after some time, users can no longer connect to Samba drives. Samba complains that it cannot connect to the LDAP server (see below for error message in Samba log) and the slapd log shows:
Mar 25 11:38:15 office-server slapd[3433]: <= bdb_equality_candidates: (gidNumber) not indexed
Mar 25 11:38:15 office-server slapd[3433]: <= bdb_equality_candidates: (gidNumber) not indexed
Mar 25 11:38:15 office-server slapd[3433]: <= bdb_equality_candidates: (uid) not indexed
Mar 25 11:38:15 office-server slapd[3433]: <= bdb_equality_candidates: (gidNumber) not indexed
Mar 25 11:38:15 office-server slapd[3433]: <= bdb_equality_candidates: (sambaSID) not indexed
Mar 25 11:38:15 office-server slapd[3433]: <= bdb_equality_candidates: (sambaSID) not indexed
Mar 25 11:38:15 office-server slapd[3433]: bdb(dc=foo,dc=org): file id2entry.bdb has LSN 1/382892, past end of log at 1/283666
Mar 25 11:38:15 office-server slapd[3433]: bdb(dc=foo,dc=org): Commonly caused by moving a database from one database environment
Mar 25 11:38:15 office-server slapd[3433]: bdb(dc=foo,dc=org): to another without clearing the database LSNs, or by removing all of
Mar 25 11:38:15 office-server slapd[3433]: bdb(dc=foo,dc=org): the log files from a database environment
Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): DB_ENV->log_flush: LSN of 1/382892 past current end-of-log of 1/283666
Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): id2entry.bdb: unable to flush page: 5
Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): DB_ENV->log_flush: LSN of 1/378772 past current end-of-log of 1/283666
Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): id2entry.bdb: unable to flush page: 7
Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): DB_ENV->log_flush: LSN of 1/373647 past current end-of-log of 1/283666
Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): id2entry.bdb: unable to flush page: 8
Mar 25 11:38:17 office-server slapd[3433]: bdb(dc=foo,dc=org): txn_checkpoint: failed to flush the buffer cache: DB_RUNRECOVERY: Fatal error, run database recovery
Mar 25 11:38:51 office-server slapd[3433]: conn=62 op=29 do_search: invalid dn (sambaDomainName=,sambaDomainName=foo,dc=foo,dc=org)
Mar 25 11:38:51 office-server slapd[3433]: bdb(dc=foo,dc=org): PANIC: fatal region error detected; run recovery
Mar 25 11:39:01 office-server slapd[3433]: last message repeated 26 times
Mar 25 11:39:01 office-server CRON[3657]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -n 200 -r -0 rm)
Mar 25 11:39:14 office-server slapd[3433]: bdb(dc=foo,dc=org): PANIC: fatal region error detected; run recovery
Mar 25 11:39:47 office-server slapd[3433]: last message repeated 35 times
Mar 25 11:39:48 office-server slapd[3433]: bdb(dc=foo,dc=org): PANIC: fatal region error detected; run recovery
Mar 25 11:39:49 office-server slapd[3433]: bdb(dc=foo,dc=org): PANIC: fatal region error detected; run recovery
Mar 25 11:39:50 office-server slapd[3433]: bdb(dc=foo,dc=org): PANIC: fatal region error detected; run recovery
Mar 25 11:40:51 office-server slapd[3433]: last message repeated 164 times
Mar 25 11:40:51 office-server slapd[3433]: last message repeated 3 times
Mar 25 11:40:52 office-server slapd[3433]: bdb(dc=foo,dc=org): PANIC: fatal region error detected; run recovery
Mar 25 11:41:53 office-server slapd[3433]: last message repeated 294 times
Strangely, restarting slapd helps and users can use Samba again for a limited and arbitrary period of time until the problem pops up again. I tried fixing the database using
db4.7_recover -v -h /var/lib/ldap
but the problem still comes back later.
I noticed that when I shut down slapd using "/etc/init.d/slapd stop", it complains about the database being corrupt (even if no problems have appeared so far):
Mar 25 10:12:35 office-server slapd[16880]: slapd shutdown: waiting for 0 operations/tasks to finish
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): DB_ENV->log_flush: LSN of 1/382892 past current end-of-log of 1/278482
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): id2entry.bdb: unable to flush page: 5
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): DB_ENV->log_flush: LSN of 1/378772 past current end-of-log of 1/278482
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): id2entry.bdb: unable to flush page: 7
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): DB_ENV->log_flush: LSN of 1/373647 past current end-of-log of 1/278482
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): id2entry.bdb: unable to flush page: 8
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): PANIC: fatal region error detected; run recovery
Mar 25 10:12:35 office-server slapd[16880]: bdb_db_close: database "dc=foo,dc=org": txn_checkpoint failed: DB_RUNRECOVERY: Fatal error, run database recovery (-30974).
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): File handles still open at environment close
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): Open file handle: /var/lib/ldap/log.0000000001
Mar 25 10:12:35 office-server slapd[16880]: bdb(dc=foo,dc=org): PANIC: fatal region error detected; run recovery
Mar 25 10:12:35 office-server slapd[16880]: bdb_db_close: database "dc=foo,dc=org": close failed: DB_RUNRECOVERY: Fatal error, run database recovery (-30974)
Mar 25 10:12:35 office-server slapd[16880]: slapd stopped.
Mar 25 10:12:46 office-server slapd[19194]: @(#) $OpenLDAP: slapd 2.4.18 (Sep 8 2009 17:47:22) $#012#011buildd@crested:/build/buildd/openldap-2.4.18/debian/build/servers/slapd
Does anybody have an idea what the problem might be?
Many thanks for any hints or pointers!
Kaspar