On Mon, 10 Aug 2015, Brian Wright wrote:
> this amount of transfer load? Are we expected to recover the node via a separate method (i.e., slapcat / slapadd) and then kick replication off only after it's been loaded?
> [...]
> We're trying to solve the problem of how to recover/replace a failed node in a system containing a very large number of records and bring it back into the cluster as quickly as possible. We're also trying to resolve how to ensure that replication works consistently on restart.
"Expected" might be too strong; there's more than one way to do it. But by definition, you're going to have slapd(8) backed (hopefully) with some flavor of transactional integrity, and that represents a extremely significant cost in your data store writes. You'll also have various syntax/schema validation, etc. occurring.
So if you do your initial load with slapadd(8), safely taking advantage of -q and similar options (see the man page), you'll get the bulk load completed without this overhead. Even if your input LDIF is somewhat "stale," syncrepl should be able to figure out the last delta within a reasonable time.
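As a rough sketch (config paths, file names, and the assumption that you have a healthy node to dump from are all placeholders for your environment), the reload of a replacement node might look like:

    # On a healthy node: dump the database to LDIF
    slapcat -F /etc/openldap/slapd.d -l backup.ldif

    # On the replacement node, with slapd stopped: quick-mode bulk load
    slapadd -q -F /etc/openldap/slapd.d -l backup.ldif

    # Start slapd; syncrepl then pulls whatever changed since the slapcat.

Keep in mind that -q skips most consistency checking, so only feed it a known-good LDIF, and start over if the load is interrupted.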
Regardless of method, you can use the standard CSN monitoring techniques (discussed extensively on this list) to "ensure that replication works."
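For example (host names and suffix below are placeholders, and this assumes your ACLs permit reading contextCSN), the usual check is to compare the contextCSN values on the suffix entry of the provider and the consumer:

    # On the provider
    ldapsearch -x -H ldap://provider.example.com -s base \
        -b "dc=example,dc=com" contextCSN

    # On the consumer
    ldapsearch -x -H ldap://consumer.example.com -s base \
        -b "dc=example,dc=com" contextCSN

Once the consumer's contextCSN value(s) match the provider's, the consumer has caught up.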