richton@nbcs.rutgers.edu wrote:
Full_Name: Aaron Richton
Version: 2.3.38
OS: Solaris 9
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (68.196.250.105)
Just noticed that my syslog files were growing faster than usual. Upon further inspection, two slaves have multiple hdb databases corrupt. Both slave{4,6} have been (and are) running slapd since September 4. All are running patched BDB 4.2.52 (same binaries I've been using throughout the whole 2.3 series). All DB_CONFIGs have DB_LOG_AUTOREMOVE set. Messages similar to below are spewing out every checkpoint interval, which is the root cause of my logs growing unusually. I'm inclined to just zap all the databases and start again (they're only slaves), but figured I'd post for tracking and to ask if there's anything that can be grabbed out of the running process before I do so. Curiously enough, base4 only corrupted on slave4, not slave6. Additionally, there are other databases hosted on each slave that appear unaffected.
Do you have backups from just before these occurrences? Can you see what the last valid transaction log files were before this? Or perhaps you can get some db_stat output from any other slaves that are still running OK? The idea is to see whether the current valid LSNs on an equivalent slave are anywhere near the numbers being logged here, e.g. 1/188113 or 1/8730339.
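For comparing numbers like these between slaves, here is a minimal Python sketch. It assumes the values are "file/offset" pairs as printed in the BDB messages, and that ordering them as (file, offset) tuples is a fair approximation of how BDB orders them. The helper names parse_lsn and lsn_is_past_end are made up for illustration and are not part of any BDB or OpenLDAP tool.

```python
def parse_lsn(text):
    """Split a 'file/offset' string such as '1/8730339' into two ints."""
    file_no, offset = text.split("/")
    return int(file_no), int(offset)


def lsn_is_past_end(page_lsn, end_of_log):
    """Return True if page_lsn sorts after end_of_log.

    Assumption: comparison is by log file number first, then by
    offset within that file (plain tuple ordering).
    """
    return parse_lsn(page_lsn) > parse_lsn(end_of_log)


if __name__ == "__main__":
    # Pairs taken from the slave4 messages in this report.
    print(lsn_is_past_end("1/8730339", "1/188113"))
    print(lsn_is_past_end("54/1636114", "4/2981780"))
```

The same check can be run against values read from db_stat on a healthy slave to see how far apart the environments are.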
Have you actually run out of disk space on the partitions holding the logs? It's rather suspicious that two machines would act up at the same time unless some admin specifically disturbed the log files on those two systems at around that time.
The first indication of trouble:
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug] bdb(base1): DB_ENV->log_flush: LSN of 1/8730339 past current end-of-log of 1/188113
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug] bdb(base1): Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug] bdb(base1): entryCSN.bdb: unable to flush page: 0
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug] bdb(base1): txn_checkpoint: failed to flush the buffer cache Invalid argument
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug] bdb(base2): DB_ENV->log_flush: LSN of 54/1636114 past current end-of-log of 4/2981780
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug] bdb(base2): Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug] bdb(base2): entryUUID.bdb: unable to flush page: 0
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug] bdb(base2): txn_checkpoint: failed to flush the buffer cache Invalid argument
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug] bdb(base3): DB_ENV->log_flush: LSN of 1/600564 past current end-of-log of 1/662
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug] bdb(base3): Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug] bdb(base3): cn.bdb: unable to flush page: 0
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug] bdb(base3): txn_checkpoint: failed to flush the buffer cache Invalid argument
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug] bdb(base4): DB_ENV->log_flush: LSN of 3/2765493 past current end-of-log of 1/539
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug] bdb(base4): Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug] bdb(base4): uid.bdb: unable to flush page: 0
Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 local4.debug] bdb(base4): txn_checkpoint: failed to flush the buffer cache Invalid argument
Sep 24 09:44:49 slave6.rutgers.edu slapd[301]: [ID 446079 local4.debug] bdb(base1): DB_ENV->log_flush: LSN of 1/8730401 past current end-of-log of 1/188113
Sep 24 09:44:49 slave6.rutgers.edu slapd[301]: [ID 446079 local4.debug] bdb(base1): Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
Sep 24 09:44:49 slave6.rutgers.edu slapd[301]: [ID 446079 local4.debug] bdb(base1): entryCSN.bdb: unable to flush page: 0
Sep 24 09:44:49 slave6.rutgers.edu slapd[301]: [ID 446079 local4.debug] bdb(base1): txn_checkpoint: failed to flush the buffer cache Invalid argument
Sep 24 09:44:49 slave6.rutgers.edu slapd[301]: [ID 446079 local4.debug] bdb(base2): DB_ENV->log_flush: LSN of 54/1634334 past current end-of-log of 4/1649467
Sep 24 09:44:49 slave6.rutgers.edu slapd[301]: [ID 446079 local4.debug] bdb(base2): Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
Sep 24 09:44:49 slave6.rutgers.edu slapd[301]: [ID 446079 local4.debug] bdb(base2): entryUUID.bdb: unable to flush page: 0
Sep 24 09:44:49 slave6.rutgers.edu slapd[301]: [ID 446079 local4.debug] bdb(base2): txn_checkpoint: failed to flush the buffer cache Invalid argument
Sep 24 09:44:49 slave6.rutgers.edu slapd[301]: [ID 446079 local4.debug] bdb(base3): DB_ENV->log_flush: LSN of 1/600564 past current end-of-log of 1/538
Sep 24 09:44:49 slave6.rutgers.edu slapd[301]: [ID 446079 local4.debug] bdb(base3): Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
Sep 24 09:44:49 slave6.rutgers.edu slapd[301]: [ID 446079 local4.debug] bdb(base3): cn.bdb: unable to flush page: 0
Sep 24 09:44:49 slave6.rutgers.edu slapd[301]: [ID 446079 local4.debug] bdb(base3): txn_checkpoint: failed to flush the buffer cache Invalid argument
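
Since these messages repeat every checkpoint interval, it may help to pull just the log_flush lines out of syslog and tabulate them per host and database. The following is a rough Python sketch under the assumption that every line follows the Solaris syslog layout shown above. The regular expression and the function name extract_lsn_errors are illustrative only, not part of slapd or BDB.

```python
import re

# Matches the "LSN of A/B past current end-of-log of C/D" messages.
# Assumption: the hostname is the token immediately before "slapd[pid]:".
LSN_RE = re.compile(
    r"(?P<host>\S+) slapd\[\d+\]: .*?bdb\((?P<db>[^)]+)\): "
    r"DB_ENV->log_flush: LSN of (?P<lf>\d+)/(?P<lo>\d+) "
    r"past current end-of-log of (?P<ef>\d+)/(?P<eo>\d+)"
)


def extract_lsn_errors(lines):
    """Return (host, database, page_lsn, end_of_log) for each matching line.

    Both LSNs are returned as (file, offset) integer tuples.
    """
    records = []
    for line in lines:
        m = LSN_RE.search(line)
        if m:
            page_lsn = (int(m["lf"]), int(m["lo"]))
            end_of_log = (int(m["ef"]), int(m["eo"]))
            records.append((m["host"], m["db"], page_lsn, end_of_log))
    return records


if __name__ == "__main__":
    sample = [
        "Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 "
        "local4.debug] bdb(base1): DB_ENV->log_flush: LSN of 1/8730339 "
        "past current end-of-log of 1/188113",
        "Sep 24 09:43:36 slave4.rutgers.edu slapd[295]: [ID 446079 "
        "local4.debug] bdb(base1): entryCSN.bdb: unable to flush page: 0",
    ]
    for host, db, page_lsn, end_of_log in extract_lsn_errors(sample):
        print(host, db, page_lsn, end_of_log)
```

Feeding it the full syslog file (for example via open(path) as the lines argument) gives one record per failed flush, which makes it easier to spot that base4 shows up only on slave4.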