On 8/12/15 7:29 AM, Andrew Findlay wrote:
On Mon, Aug 10, 2015 at 05:19:22PM -0700, Brian Wright wrote:
We're trying to solve the problem of how to recover/replace a failed node in a system containing a very large number of records and bring it back into the cluster as quickly as possible. We're also trying to resolve how to ensure that replication works consistently on restart.
In terms of recovering a failed node, the very fastest method is to use a database backup made with mdb_copy. The output from that command is a file that can be used directly as an MDB database, so all you have to do is put it in place and restart slapd. Even if the backup is a day or two old, the replication process should bring in the more recent changes from another server.
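The backup-and-restore sequence described above could be scripted roughly as follows. This is a hedged sketch, not a tested procedure. It assumes the `mdb_copy` binary built from libraries/liblmdb is on the PATH, and that slapd on the failed node is already stopped before `restore` is called. The directory paths and the helper names (`backup_command`, `run_backup`, `restore`) are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def backup_command(db_dir: str, dest_dir: str) -> list[str]:
    # Hypothetical helper: mdb_copy reads the LMDB environment in db_dir
    # and writes a consistent copy named data.mdb into dest_dir.
    return ["mdb_copy", db_dir, dest_dir]


def run_backup(db_dir: str, dest_dir: str) -> None:
    # mdb_copy expects the destination directory to exist and be empty,
    # so create a fresh one (mkdir raises if it already exists).
    Path(dest_dir).mkdir(parents=True)
    subprocess.run(backup_command(db_dir, dest_dir), check=True)


def restore(backup_dir: str, db_dir: str) -> Path:
    # Hypothetical helper: put the backup in place as the live database file.
    # Assumption: slapd on this node has already been stopped.
    src = Path(backup_dir) / "data.mdb"
    if not src.is_file():
        raise FileNotFoundError(src)
    dst_dir = Path(db_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / "data.mdb"
    shutil.copy2(src, dst)  # copies contents and file metadata
    return dst
```

After `restore` returns, slapd would be started again, and replication would then be relied on to bring the node up to date with changes made since the backup was taken.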
[...]
There are some caveats with mdb_copy. In particular it can cause database bloat if run on a server that has a heavy write load at the time.
Andrew
Thanks for the tip. This really helps us a lot with recovering failed nodes. I wouldn't have thought to dig into the libraries/liblmdb directory of the source looking for tools. The library, yes, but tools, no. I had assumed there must be tools somewhere, but since they weren't installed with the regular package, I didn't know where they were. I guess I should be more investigative and always look through all of the directories in the source. :)
As for our environment, our LDAP traffic will consist mostly of reads, with a much smaller number of writes throughout the day. So our workload should cause little bloat, if any, as long as we're judicious about when we run the tool. That said, I will note this caveat when I write the docs for our use of the copy tool.
Thanks again.