On Thu, 25 Sep 2008, Howard Chu wrote:
Brett @Google wrote:
I was wondering if anybody is using syncrepl in the context of a hardware content switch or redundant environment.
Yes.
I am considering the edge case where a connection is redirected to a client, and:
a) client has no current data (new node introduced)
b) client decides it needs to do a full refresh - perhaps it was down and missed a large number of updates
Yes, you need to keep all servers identical (as much as practical).
Seems to me that such a switch really isn't useful here. Also, if you're running an LDAP service where the network fabric can actually sustain more traffic than your LDAP servers, you've done something very strange. Considering that a dual-socket quad-core server running OpenLDAP can saturate a gigabit ethernet, I don't see how you can load-balance beyond that. The content switch will become the bottleneck.
It's not so much about saturating the wire (although our current switches do 2Gbps each, and I'm sure the next ones will be on the order of 6-8Gbps each, and we use more than one). It's about service availability -- taking down a slave and having everything else converge onto the remaining slaves in well under a second. A load balancer handles this much faster than the vast majority of clients configured with multiple servers, and there are no client delays while clients vainly try servers that are down. You also don't have to worry about any software that only allows you to configure a single server.
If you're bringing up a brand new replica, just use a separate (virtual, if necessary) network interface while it's bootstrapping, and don't enable the main interface until it's caught up.
This is essentially what we do. We start with slapadd -q from a recent LDIF. Then, to catch "late breaking changes," we run slapd -h ldapi:/// so that it listens only on the local IPC socket while syncrepl catches up. During both of these procedures, there's nothing listening on the network, so the load balancer marks the node as failed. Once the contextCSNs appear in sync (discussed at length in the archives), we restart slapd with its normal network listeners.
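The "contextCSNs appear in sync" step can be sketched as a small comparison, given the contextCSN values already read from the provider and from the bootstrapping node (for example with a base-scope search on the suffix entry). This is only a rough illustration, not our actual tooling. The function names are made up, and it assumes the four-field CSN layout timestamp#count#sid#modifier with fixed-width fields, written in the same format on both servers, so that plain string comparison orders them correctly.

```python
def parse_csn(csn):
    """Split a CSN into (timestamp, change count, server ID, modifier).

    Assumes the 'timestamp#count#sid#mod' layout; raises ValueError otherwise.
    """
    timestamp, count, sid, mod = csn.strip().split("#")
    return timestamp, count, sid, mod


def newest_by_sid(csns):
    """Return {server ID: newest (timestamp, count, mod)} for a list of CSNs."""
    newest = {}
    for csn in csns:
        timestamp, count, sid, mod = parse_csn(csn)
        stamp = (timestamp, count, mod)
        # Fields are assumed fixed-width, so tuple/string ordering is
        # the same as chronological ordering.
        if sid not in newest or stamp > newest[sid]:
            newest[sid] = stamp
    return newest


def caught_up(provider_csns, replica_csns):
    """True if the replica is at least as new as the provider for every SID."""
    provider = newest_by_sid(provider_csns)
    replica = newest_by_sid(replica_csns)
    return all(
        sid in replica and replica[sid] >= stamp
        for sid, stamp in provider.items()
    )


if __name__ == "__main__":
    provider = ["20080925101500.000000Z#000001#000#000000"]
    replica = ["20080925101500.000000Z#000000#000#000000"]
    print(caught_up(provider, replica))
```

A node would only be restarted with its network listeners once caught_up() returns True, bearing in mind that the provider may take new writes between the two reads.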
Strictly speaking, you could use one of the contextCSN checks as a custom load balancer health check. This might be a bit dangerous, though, since syncrepl only guarantees eventual convergence. It's theoretically possible that all your slaves would fail out during a particularly large refresh. You'll have to decide for yourself if it's more dangerous to be serving stale data or to be serving no data. We don't do this, because we'd rather be serving stale data.
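To make the stale-versus-nothing trade-off concrete, here is a purely illustrative sketch of how such a check could decide which nodes stay in rotation. It is not something we run. The function name, the prefer_stale switch, and the node names are all hypothetical, and a real load balancer usually evaluates each node separately rather than the pool as a whole.

```python
def nodes_to_serve(in_sync, prefer_stale=True):
    """Pick the nodes a load balancer should keep in rotation.

    in_sync maps a node name to True when its contextCSN check passes.
    With prefer_stale=True, a pool where every node is behind keeps
    serving (stale) data instead of failing out completely.
    """
    fresh = [node for node, ok in in_sync.items() if ok]
    if fresh or not prefer_stale:
        return fresh
    # Every node is behind (e.g. during a large refresh): keep them all.
    return list(in_sync)


if __name__ == "__main__":
    # Hypothetical pool in the middle of a large refresh.
    pool = {"ldap1": False, "ldap2": False}
    print(nodes_to_serve(pool))                      # stale data, but a service
    print(nodes_to_serve(pool, prefer_stale=False))  # no nodes left
```

With prefer_stale=False the check behaves like the strict version described above and can empty the pool.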