Hello,
I was wondering if anybody is using syncrepl in the context of a hardware content switch or redundant environment.
That is, having a "hidden" syncrepl master and 1-N syncrepl clients which receive updates from the master; only those client nodes are visible via a content switch, for the purposes of load sharing and redundancy (predominantly the latter).
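For concreteness, and assuming ordinary pull-based replication, each client node would carry a consumer stanza along these lines (slapd.conf syntax; the hostname, suffix, bind DN and credentials below are just placeholders):

  syncrepl rid=001
           provider=ldap://hidden-master.example.com
           type=refreshAndPersist
           searchbase="dc=example,dc=com"
           bindmethod=simple
           binddn="cn=replicator,dc=example,dc=com"
           credentials=secret
           retry="30 +"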
I am considering the edge cases where a connection is redirected to a client and:
a) the client has no current data (new node introduced), or
b) the client decides it needs to do a full refresh - perhaps it was down and missed a large number of updates
The problem is that while a replica is (b) significantly incomplete or (a) has no data at all, it should not be given any LDAP requests by the content switch.
A standard content switch either blindly sends connections round-robin to nodes 1-N, or first determines that a server is "listening" (say, by sending a SYN probe) before it sends through the LDAP request. Few content switches are smart enough to examine the LDAP result code, as most just operate on TCP streams and don't do content inspection, so LDAP content inspection is even less likely.
So this means that while a replica is initializing, LDAP requests will incorrectly get "no results" where the answer should be "not applicable", and the content switch or LDAP client should have tried again, getting another (already initialized) server.
Ideally (in a content switch environment at least), the LDAP server should not listen for requests while it is re-synchronising, but in the case of syncrepl push replication, replication can happen over the same port as LDAP client requests.
One answer would be if syncrepl could happen over its own port, as there could then be the option of not accepting (not listening?) or refusing connections on the client port, whilst syncrepl is (re)building on the syncrepl port.
Alternatively, there could be a "health" port, which only accepted a connection and maybe returned "OK" if the replica was "healthy"; this port could be specified as a "probe" port on the content switch, to determine the health of a syncrepl client.
I was just wondering how other people are dealing with this issue and their content switches.
Cheers,
Brett
Brett @Google wrote:
Hello,
I was wondering if anybody is using syncrepl in the context of a hardware content switch or redundant environment.
That is, having a "hidden" syncrepl master and 1-N syncrepl clients which receive updates from the master; only those client nodes are visible via a content switch, for the purposes of load sharing and redundancy (predominantly the latter).
I am considering the edge cases where a connection is redirected to a client and:
a) the client has no current data (new node introduced), or
b) the client decides it needs to do a full refresh - perhaps it was down and missed a large number of updates
The problem is that while a replica is (b) significantly incomplete or (a) has no data at all, it should not be given any LDAP requests by the content switch.
A standard content switch either blindly sends connections round-robin to nodes 1-N, or first determines that a server is "listening" (say, by sending a SYN probe) before it sends through the LDAP request. Few content switches are smart enough to examine the LDAP result code, as most just operate on TCP streams and don't do content inspection, so LDAP content inspection is even less likely.
So this means that while a replica is initializing, LDAP requests will incorrectly get "no results" where the answer should be "not applicable", and the content switch or LDAP client should have tried again, getting another (already initialized) server.
Ideally (in a content switch environment at least), the LDAP server should not listen for requests while it is re-synchronising,
We've discussed adding this feature several times in the past. One of the reasons for not doing it implicitly is that a single slapd process may be responsible for many different namingContexts, and each of them may have wildly different master/slave statuses. The option of not listening for requests until the consumer is caught up is only usable when the slapd config has only one database.
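As a hypothetical sketch of why that is, nothing stops a single slapd.conf from looking roughly like the following (global section, moduleloads, indexes and ACLs omitted; suffixes and hostnames are placeholders), where the first database is a consumer of a remote master and the second is mastered locally - a global "don't listen until synced" switch has no sensible meaning for such a server:

database   hdb
suffix     "dc=example,dc=com"
# consumer of a remote master
syncrepl   rid=001 provider=ldap://master.example.com type=refreshAndPersist
           searchbase="dc=example,dc=com" bindmethod=simple
           binddn="cn=replicator,dc=example,dc=com" credentials=secret
           retry="30 +"

database   hdb
suffix     "dc=other,dc=org"
# mastered here and served to its own consumers
overlay    syncprov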
but in the case of syncrepl push replication, replication can happen over the same port as LDAP client requests.
One answer would be if syncrepl could happen over its own port, as there could then be the option of not accepting (not listening?) or refusing connections on the client port, whilst syncrepl is (re)building on the syncrepl port.
That still requires pull-based replication, and the whole scenario you're worried about in the previous paragraph is solely about push-based replication.
We talked about creating an LDAP "Turn" exop to reverse the direction of an LDAP session, but the current solution works and a Turn exop does nothing to help the cases where the current solution won't work.
Alternatively, there could be a "health" port, which only accepted a connection and maybe returned "OK" if the replica was "healthy"; this port could be specified as a "probe" port on the content switch, to determine the health of a syncrepl client.
Again, only useful if you treat slapd as one database per slapd instance.
I was just wondering how other people are dealing with this issue and their content switches.
Seems to me that such a switch really isn't useful here. Also, if you're running an LDAP service where the network fabric can actually sustain more traffic than your LDAP servers, you've done something very strange. Considering that a dual-socket quad-core server running OpenLDAP can saturate a gigabit ethernet, I don't see how you can load-balance beyond that. The content switch will become the bottleneck.
If you're bringing up a brand new replica, just use a separate (virtual, if necessary) network interface while it's bootstrapping, and don't enable the main interface until it's caught up.
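E.g. (address and config path are placeholders) the bootstrapping consumer could be started with its listeners limited to a spare address and the local socket, and only restarted on the service address once it has caught up:

  slapd -h "ldap://192.0.2.99/ ldapi:///" -f /etc/openldap/slapd.conf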
On Thu, 25 Sep 2008, Howard Chu wrote:
Brett @Google wrote:
I was wondering if anybody is using syncrepl in the context of a hardware content switch or redundant environment.
Yes.
I am considering the edge cases where a connection is redirected to a client and:
a) the client has no current data (new node introduced), or
b) the client decides it needs to do a full refresh - perhaps it was down and missed a large number of updates
Yes, you need to keep all servers identical (as much as practical).
Seems to me that such a switch really isn't useful here. Also, if you're running an LDAP service where the network fabric can actually sustain more traffic than your LDAP servers, you've done something very strange. Considering that a dual-socket quad-core server running OpenLDAP can saturate a gigabit ethernet, I don't see how you can load-balance beyond that. The content switch will become the bottleneck.
It's not so much about saturating the wire (although our current switches do 2Gbps each, and I'm sure the next ones will be on the order of 6-8Gbps each, and we use more than one). It's about service availability -- taking down a slave and having everything else converge onto the remaining slaves in well under a second. A load balancer handles this much faster than the vast majority of clients configured with multiple servers, and there are no client delays as they vainly attempt down servers. You also don't have to worry about any software that only allows you to configure a single server.
If you're bringing up a brand new replica, just use a separate (virtual, if necessary) network interface while it's bootstrapping, and don't enable the main interface until it's caught up.
This is essentially what we do. We start with slapadd -q from a recent LDIF. Then, to catch "late breaking changes," we run slapd -h ldapi:///. During both of these procedures, there's nothing listening on the network, so the load balancer marks the node as failed. Once the contextCSNs appear in sync (discussed at length in the archives), restart slapd with its normal listeners.
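In outline that's roughly the following (paths, suffix and hostname are placeholders, and reading contextCSN may require suitable access rather than the anonymous search shown):

slapadd -q -f /etc/openldap/slapd.conf -l /backups/recent.ldif

# nothing listens on the network while we catch up over the local socket
slapd -h ldapi:/// -f /etc/openldap/slapd.conf

# compare the suffix entry's contextCSN on the master and on this node
ldapsearch -LLL -x -H ldap://master.example.com -s base -b "dc=example,dc=com" contextCSN
ldapsearch -LLL -x -H ldapi:/// -s base -b "dc=example,dc=com" contextCSN

# once they match, stop the ldapi-only instance and restart with the real listeners
slapd -h "ldap:/// ldapi:///" -f /etc/openldap/slapd.conf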
Strictly speaking, you could consider one of the contextCSN checks as a custom load balancer check. This might be a bit dangerous, though, since syncrepl only guarantees eventual convergence. It's theoretically possible that all your slaves would fail out during a particularly large refresh. You'll have to decide for yourself if it's more dangerous to be serving stale data or to be serving no data. We don't do this, because we'd rather be serving stale.
Aaron Richton wrote:
availability -- taking down a slave and having everything else converge onto the remaining slaves in well under a second. A load balancer handles this much faster than the vast majority of clients configured with multiple servers, and there are no client delays as they vainly attempt down servers.
In OpenLDAP 2.4, the client library allows clients to register a callback that "shuffles" the URIs configured via ldap_initialize(). This is used in the proxy backends to take note of a failing server, moving it to the end of the URI list. A "smart" client that repeatedly needs to establish connections to a list of URIs could do the same in order to avoid having to run through the URI list until one responds. Dumb clients could exploit this by actually contacting a back-ldap/meta instance configured this way. Of course, this would move the single point of failure to the proxy...
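For the "dumb client" case, the proxy side of that is just a back-ldap database pointed at the whole replica list, e.g. (hostnames and suffix are placeholders):

database  ldap
suffix    "dc=example,dc=com"
uri       "ldap://replica1.example.com/ ldap://replica2.example.com/ ldap://replica3.example.com/"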
p.
Ing. Pierangelo Masarati
OpenLDAP Core Team
SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
Office: +39 02 23998309  Mobile: +39 333 4963172  Fax: +39 0382 476497
Email: ando@sys-net.it
On Thu, 25 Sep 2008, Howard Chu wrote:
Seems to me that such a switch really isn't useful here. Also, if you're running an LDAP service where the network fabric can actually sustain more traffic than your LDAP servers, you've done something very strange. Considering that a dual-socket quad-core server running OpenLDAP can saturate a gigabit ethernet, I don't see how you can load-balance beyond that. The content switch will become the bottleneck.
As per what Aaron said, it's more about availability than raw performance, which is generally good anyway (indexes permitting).
We have a dual-site / redundant setup, where two content switches (one at each site) replicate config between themselves. Each site has its own distinct cluster of worker nodes, and the content switch(es) own a virtual IP behind which work is spread across the worker nodes.
Presently we are replicating via the old slurpd method, but whilst upgrading to 2.4 we will need to move to syncrepl replication. On this, there is probably a case for a slurpd-to-syncrepl migration guide. The section I've seen on syncrepl is OK, but it switches between using .conf and cn=config in parts, which makes them hard to compare.
It should probably use one config method or the other for clarity, or show examples of both, IMHO.
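For what it's worth, the mapping is fairly mechanical: a slapd.conf syncrepl directive becomes an olcSyncrepl value on the corresponding olcDatabase entry, along these lines (backend, database index, hostname and DNs here are placeholders):

dn: olcDatabase={1}hdb,cn=config
changetype: modify
add: olcSyncrepl
olcSyncrepl: rid=001 provider=ldap://hidden-master.example.com
  type=refreshAndPersist searchbase="dc=example,dc=com"
  bindmethod=simple binddn="cn=replicator,dc=example,dc=com"
  credentials=secret retry="30 +"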
Silly question, but in that vein, does anybody know where the HEAD OpenLDAP guide source is located?
Thanks,
Brett
On Thu, Sep 25, 2008 at 9:54 PM, Brett @Google <brett.maxfield@gmail.com> wrote:
I was just wondering how other people are dealing with this issue and their content switches.
At $workplace, we use a Cisco CSM. It allows the network guy to administratively take a node (real IP, or "RIP") out of service even if the CSM would normally send traffic to it. This allows me either to bring up a new node and let it sync, or to debug a problem with a specific node while slapd is still running and accepting connections.
Whenever I need to do maintenance on one of the nodes, the network guy is able to take the RIP out of service and clear the connections to it, so that all clients immediately reconnect to the VIP and land on one of the other nodes.
So far, this approach works for us. We've been using it since we deployed OpenLDAP about 2 years ago.
On Friday 26 September 2008 03:54:10 Brett @Google wrote:
Hello,
I was wondering if anybody is using syncrepl in the context of a hardware content switch or redundant environment.
That is, having a "hidden" syncrepl master and 1-N syncrepl clients which receive updates from the master; only those client nodes are visible via a content switch, for the purposes of load sharing and redundancy (predominantly the latter).
I am considering the edge cases where a connection is redirected to a client and:
a) the client has no current data (new node introduced), or
b) the client decides it needs to do a full refresh - perhaps it was down and missed a large number of updates
Both these cases can exist to some degree with slurpd, and in the end the requirements (IMHO) are the same: don't keep an out-of-sync slave in service. However, with syncrepl, at least you can much more easily monitor "in-syncness".
The problem is that while a replica is (b) significantly incomplete or (a) has no data at all, it should not be given any LDAP requests by the content switch.
A standard content switch either blindly sends connections round-robin to nodes 1-N, or first determines that a server is "listening" (say, by sending a SYN probe) before it sends through the LDAP request. Few content switches are smart enough to examine the LDAP result code, as most just operate on TCP streams and don't do content inspection, so LDAP content inspection is even less likely.
I don't see how the LDAP result code would help in any case, as there is no result code for "Not here, but should be".
So this means that while a replica is initializing, LDAP requests will incorrectly get "no results" where the answer should be "not applicable", and the content switch or LDAP client should have tried again, getting another (already initialized) server.
Ideally (in a content switch environment at least), the LDAP server should not listen for requests while it is re-synchronising,
An option to start the slapd and only have it answer requests once it is in sync has been discussed before ...
but in the case of syncrepl push replication, replication can happen over the same port as LDAP client requests.
One answer would be if syncrepl could happen over its own port, as there could then be the option of not accepting (not listening?) or refusing connections on the client port, whilst syncrepl is (re)building on the syncrepl port.
Alternatively, there could be a "health" port, which only accepted a connection and maybe returned "OK" if the replica was "healthy"; this port could be specified as a "probe" port on the content switch, to determine the health of a syncrepl client.
I was just wondering how other people are dealing with this issue and their content switches.
We monitor the replication status of our slaves with network monitoring software, which alarms if a slave is more than an hour out of sync. If a slave goes out of sync and doesn't recover, we take it out of service.
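A check along those lines could be as simple as comparing the timestamp portion of the newest contextCSN on the master and on the slave, something like the sketch below (hostnames and suffix are placeholders; it assumes a single-master contextCSN, anonymous read access to that attribute, and GNU date):

#!/bin/bash
SUFFIX="dc=example,dc=com"

get_csn() {
    # newest contextCSN value on the suffix entry of the given server
    ldapsearch -LLL -x -H "$1" -s base -b "$SUFFIX" contextCSN \
        | awk '/^contextCSN:/ {print $2}' | sort | tail -1
}

master_csn=$(get_csn ldap://master.example.com)
slave_csn=$(get_csn ldap://slave.example.com)

[ -z "$slave_csn" ] && exit 1                  # no data at all
[ "$slave_csn" = "$master_csn" ] && exit 0     # fully in sync

# CSNs begin with a UTC timestamp (YYYYMMDDHHMMSS...); only alarm if the
# slave's newest change is more than an hour behind the master's newest change
to_epoch() {
    date -u -d "${1:0:4}-${1:4:2}-${1:6:2} ${1:8:2}:${1:10:2}:${1:12:2}" +%s
}
lag=$(( $(to_epoch "$master_csn") - $(to_epoch "$slave_csn") ))
[ "$lag" -le 3600 ]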
However, we do see circumstances (e.g. where some application pushes 50 000 deletes) where slaves (syncrepl, not delta-syncrepl) take more than an hour to catch up. If the load balancer took servers out of service automatically based on replication state, that would be an unnecessary outage.
In my opinion, leave application monitoring to application/network monitoring software, and only have the load balancer do basic "is this service usable" monitoring (IOW, at most, do I see the right banner on SMTP/POP3/IMAP). Ensure your processes are able to connect those two dots.
I have also seen outages caused by complex probes (e.g. ones which do POP3 authentication) combined with removal/suspension of the account that was used in the probe.
Regards, Buchan