On Mon, 4 Feb 2008, Howard Chu wrote:
You cannot copy the database files from one machine to another unless you're extremely careful and follow the procedures outlined in the BerkeleyDB documentation. It sounds like you didn't follow those procedures. Your problem isn't coincidental, it's inevitable when you don't RTFM.
Huh?
I can't seem to find it right now, but I distinctly recall reading documentation that indicated you could either start a syncrepl slave with no database or, to jump-start the process, copy the database from the master onto it before starting.
While this is no longer in the 2.4 documentation, the 2.3 documentation at
http://www.openldap.org/doc/admin23/replication.html#Configuring%20slurpd%20...
discusses copying the database from the master to the slave in a slurpd context. Other than saying to make sure that both systems are homogeneous (same hardware, same OS, same versions, which in my case was completely true), it gives no dire warnings and no pointers to BerkeleyDB documentation procedures.
I took another quick look at the BerkeleyDB documentation on the Oracle site and did not see anything that seemed relevant to copying databases between machines. Could I trouble you for a URL to see whether there is anything in those procedures that might have been violated?
Also, even if for some reason the copies on the two slaves were invalid, that would not explain why the master failed. The database on the master was the original database built by slapadd when the server was first put into commission. How could making a copy of it have caused the master itself to fail? Considering also that all three servers worked fine for almost a year under heavy load, it just doesn't seem likely that an improper database copy caused the failure of the original master and both slaves.