Brett @Google wrote:
Hello,
I was wondering if anybody is using syncrepl in the context of a hardware content switch or redundant environment.
That is, having a "hidden" syncrepl master and 1-N syncrepl clients which receive updates from the master. Only those client nodes are visible via a content switch, for the purposes of load sharing and redundancy (predominantly the latter).
I am considering the edge case where a connection is redirected to a client, and:
a) the client has no current data (a new node was just introduced), or
b) the client decides it needs to do a full refresh - perhaps it was down and missed a large number of updates.
The problem is that while a replica is (b) significantly incomplete or (a) has no data at all, it should not be given any LDAP requests by the content switch.
A standard content switch either blindly sends connections round-robin to nodes 1-N, or first checks that a server is "listening" (say, by sending a SYN probe) before it sends through the LDAP request. Few content switches are smart enough to examine LDAP result codes: most just operate on TCP streams and don't do content inspection at all, so LDAP-aware inspection is even less likely.
So this means that while a replica is initializing, LDAP requests will incorrectly get "no results" where the answer should have been "not applicable"; the content switch or LDAP client should instead have tried again and reached another (already initialized) server.
Ideally (in a content switch environment at least), the LDAP server should not listen for requests while it is re-synchronising,
We've discussed adding this feature several times in the past. One of the reasons for not doing it implicitly is that a single slapd process may be responsible for many different namingContexts, and each of them may have wildly different master/slave statuses. The option of not listening for requests until the consumer is caught up is only usable when the slapd config has only one database (see the sketch below for the multi-database case).
but in the case of syncrepl push replication, replication can happen over the same port as LDAP client requests.
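A hypothetical slapd.conf sketch of that multi-database case (suffixes, paths, and credentials are all invented):

    # One slapd process, two databases with different replication roles.
    database    hdb
    suffix      "dc=sales,dc=example,dc=com"
    directory   /var/lib/ldap/sales
    # This database is a syncrepl consumer, pulling from a master.
    syncrepl    rid=001
                provider=ldap://master.example.com
                type=refreshAndPersist
                searchbase="dc=sales,dc=example,dc=com"
                bindmethod=simple
                binddn="cn=replicator,dc=sales,dc=example,dc=com"
                credentials=secret

    database    hdb
    suffix      "dc=eng,dc=example,dc=com"
    directory   /var/lib/ldap/eng
    # This database is itself a master, serving other consumers.
    overlay     syncprov

A blanket "refuse all connections until synced" would take the second suffix offline too, even though it is always authoritative.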
One answer would be if syncrepl could happen over its own port; there could then be an option of not accepting (not listening?) or refusing connections on the client port whilst syncrepl is (re)building over the syncrepl port.
That still requires pull-based replication, and the whole scenario you're worried about in the previous paragraph is solely about push-based replication.
We talked about creating an LDAP "Turn" exop to reverse the direction of an LDAP session, but the current solution works and a Turn exop does nothing to help the cases where the current solution won't work.
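For reference, slapd can already bind multiple listener URLs through its -h flag, which is the mechanics a dedicated replication port would build on; what doesn't exist, per the above, is a way to keep the client listener closed while the consumer rebuilds. A sketch, with invented addresses:

    # Hypothetical: clients on the public address, replication traffic
    # pointed at a second, private listener.
    slapd -h "ldap://192.0.2.10:389/ ldap://10.0.0.5:3890/"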
Alternatively, there could be a "health" port which only accepted a connection and maybe returned "OK" if the replica was "healthy". This port could be specified as a "probe" port on the content switch, to determine the health of a syncrepl client.
Again, only useful if you treat slapd as one database per slapd instance.
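Something close to the health-port idea can be approximated outside slapd: a probe that compares the replica's contextCSN against the provider's and reports healthy only when they match, run once per suffix to respect the multi-database caveat above. A minimal sketch, assuming an invented suffix and hostnames, and anonymous read access to the suffix entry:

    #!/bin/sh
    # Hypothetical probe: exit 0 (healthy) only when the replica's
    # contextCSN matches the provider's.
    BASE="dc=example,dc=com"
    MASTER="ldap://master.example.com"
    REPLICA="ldap://localhost"

    csn() {
        ldapsearch -LLL -x -H "$1" -s base -b "$BASE" contextCSN \
            | sed -n 's/^contextCSN: //p'
    }

    r=$(csn "$REPLICA")
    [ -n "$r" ] && [ "$r" = "$(csn "$MASTER")" ]

A content switch that can only probe a TCP port could reach this through a trivial wrapper (e.g. under inetd) listening on the probe port.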
I was just wondering how other people are dealing with this issue and their content switches.
Seems to me that such a switch really isn't useful here. Also, if you're running an LDAP service where the network fabric can actually sustain more traffic than your LDAP servers can serve, you've done something very strange. Considering that a dual-socket quad-core server running OpenLDAP can saturate a Gigabit Ethernet link, I don't see how you can load-balance beyond that: the content switch will become the bottleneck.
If you're bringing up a brand new replica, just use a separate (virtual, if necessary) network interface while it's bootstrapping, and don't enable the main interface until it's caught up.
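That approach can be as simple as two slapd invocations, using the same -h listener mechanics sketched above (addresses and paths invented; the pidfile location is an assumption):

    # Hypothetical bootstrap: listen only on a private address until
    # the replica has caught up.
    slapd -h "ldap://10.0.0.5:389/" -f /etc/openldap/slapd.conf

    # ...wait until contextCSN matches the master's (see the probe
    # sketch above), then restart with the public listener enabled:
    kill "$(cat /var/run/slapd.pid)"
    slapd -h "ldap://10.0.0.5:389/ ldap://192.0.2.10:389/" -f /etc/openldap/slapd.conf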