On 8/12/15 7:29 AM, Andrew Findlay wrote:
> On Mon, Aug 10, 2015 at 05:19:22PM -0700, Brian Wright wrote:
>
>> We're trying to solve the problem of how to recover/replace a failed
>> node in a system containing a very large number of records and bring
>> it back into the cluster as quickly as possible. We're also trying
>> to resolve how to ensure that replication works consistently on
>> restart.
>
> In terms of recovering a failed node, the very fastest method is to use
> a database backup made with mdb_copy. The output from that command is
> a file that can be used directly as an MDB database so all you have to do
> is put it in place and restart slapd. Even if the backup is a day or two
> old, the replication process should bring in the more recent changes
> from another server.
>
> [...]
>
> There are some caveats with mdb_copy. In particular it can cause
> database bloat if run on a server that has a heavy write load at the
> time.
>
> Andrew

Thanks for the tip. This really helps us a lot with recovering failed nodes. I wouldn't have thought to dig into the libraries/liblmdb area looking for tools. The library yes, but tools no. I had assumed there must be tools somewhere, but since they didn't get installed with the regular package, I didn't know where they were. I guess I should always be more investigative and look through all of the directories of the source. :)

As for our environment, our LDAP traffic will consist mostly of reads, with a much smaller number of writes throughout the day. So our workload should probably not cause much bloat, if any, as long as we're judicious about when we run the tool. I will make a note of this caveat when I write the docs for our use of the copy tool.
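
For the docs, the backup and restore steps Andrew describes could be sketched roughly as below. This is only a minimal sketch under some assumptions: the function names are hypothetical, the database uses LMDB's default layout (a directory containing data.mdb), mdb_copy is on the PATH, and the caller stops slapd before restoring and starts it again afterwards. The -c flag asks mdb_copy for a compacting copy.

```python
import shutil
import subprocess
from pathlib import Path


def mdb_copy_command(db_dir, backup_dir, compact=False):
    """Build the mdb_copy argument list (hypothetical helper)."""
    cmd = ["mdb_copy"]
    if compact:
        cmd.append("-c")  # compacting copy: free/unused pages are omitted
    cmd += [str(db_dir), str(backup_dir)]
    return cmd


def take_backup(db_dir, backup_dir, compact=False):
    """Run mdb_copy against a live database directory.

    The destination directory must exist and be empty. Per the caveat in
    this thread, avoid running this during a heavy write load.
    """
    Path(backup_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(mdb_copy_command(db_dir, backup_dir, compact), check=True)


def restore_backup(backup_dir, db_dir):
    """Put a backup's data.mdb in place on the failed node.

    Assumes slapd is stopped. File ownership and permissions for the
    slapd user are not handled here.
    """
    db_dir = Path(db_dir)
    db_dir.mkdir(parents=True, exist_ok=True)
    target = db_dir / "data.mdb"
    shutil.copy2(Path(backup_dir) / "data.mdb", target)
    return target
```

After restore_backup() returns, restarting slapd should let replication bring in whatever changed since the backup was taken.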

Thanks again.

--
Brian Wright
Sr. UNIX Systems Engineer
901 Mariners Island Blvd Suite 200
San Mateo, CA 94404 USA
Email  brianw@marketo.com
Phone +1.650.539.3530
www.marketo.com