Yes, we have been measuring latency under the F5 vs. RR DNS. When we switched to RR DNS it DID drop quite a bit, from around 100 ms to about 20 ms. We do NOT yet have the VIP set to Performance Layer 4, however; it was at "Standard". F5 has since suggested Performance Layer 4, but we have not implemented it yet, only because the "connection deferred: binding" messages cause severe annoyances and lots of CS calls from users of the system (auth failures, misc issues), so mgmt is wary of trying anything else until they have proof beforehand that whatever we do WILL DEFINITELY WORK. (Yes, cart before the horse, I know, but they sign the checks as well...)
When we're behind the F5, all connections in the LDAP server logs appear to come from the F5's IP. So, when pumping a hundred servers' connections through that one IP, there are going to be many, many binds/unbinds going on constantly, all coming from the same IP (the F5). Why doesn't it throw "connection deferred: binding" constantly, since the connection load is certainly very, very high? It only throws them occasionally (every few seconds), but that's enough to cause a major impact in terms of failed queries.

Are you saying the F5 is dropping part of the session after binding on a port and retrying to bind (i.e. trying to reuse an already open port that hasn't been closed cleanly)? Could this be due to an idle timeout difference on slapd vs. the F5? Where is the idle timeout defined on the F5, specific to the LDAP virtual server/pool? (slapd.conf has it set relatively low, 20 seconds.)
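(For reference, the slapd side of that is just the idletimeout directive in slapd.conf; the F5 side is whatever idle timeout applies to the LDAP virtual server, which is the part I'm asking about:)

  # slapd.conf -- close connections that have been idle for more than 20 seconds
  idletimeout 20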
-- David J. Andruczyk
----- Original Message ----
From: Philip Guenther guenther+ldapsoft@sendmail.com
To: David J. Andruczyk djandruczyk@yahoo.com
Cc: openldap-software@openldap.org
Sent: Wednesday, July 22, 2009 12:54:53 AM
Subject: Re: performance issue behind a a load balancer 2.3.32
On Tue, Jul 21, 2009 at 01:54:25PM -0700, Quanah Gibson-Mount wrote:
--On Tuesday, July 21, 2009 4:51 PM -0400 "Clowser, Jeff" jeff_clowser@fanniemae.com wrote:
Do you have any facts/numbers to back this up? I've never seen F5's slow things down noticeably.
We've had F5's be the root of the problem with several clients who load balanced their LDAP servers, and pointed postfix at the F5 for delivery. They added just a few milliseconds of time to each LDAP query, but that was enough to completely back up their mail delivery system. <...>
Given the reported log message, this (latency) is very likely to be the cause of the problem. "connection deferred: binding" means that the server received a request on a connection that was in the middle of processing a bind. This means that the client sends a bind and then additional request(s) without waiting for the bind result. That's a violation by the client of the LDAP protocol specification, RFC 4511, section 4.2.1, paragraph 2:
After sending a BindRequest, clients MUST NOT send further LDAP PDUs until receiving the BindResponse. Similarly, servers SHOULD NOT process or respond to requests received while processing a BindRequest.
The log message is slapd saying "I'm obeying that SHOULD NOT for this connection, loser". It should be obvious now why the conn_max_pending* options have no effect.
Understanding _why_ clients are violating the spec by sending further requests while a bind is outstanding may help you understand how the F5 or the clients should be tuned (or beaten with sticks, etc).
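To make the ordering concrete, here is a minimal sketch using python-ldap (the hostname, DNs, and filter are placeholders, not anything from your setup) of a client that waits for the BindResponse versus one that provokes the deferral:

  # Sketch only; server, bind DN, and search parameters are hypothetical.
  import ldap

  conn = ldap.initialize("ldap://ldap-vip.example.com")

  # Compliant with RFC 4511 4.2.1: send the BindRequest, then wait for the
  # BindResponse before issuing any further operation on this connection.
  msgid = conn.simple_bind("cn=app,dc=example,dc=com", "secret")
  conn.result(msgid)   # blocks until the BindResponse has been read
  conn.search_s("dc=example,dc=com", ldap.SCOPE_SUBTREE, "(uid=jdoe)")

  # The pattern slapd is objecting to: the SearchRequest goes out before the
  # BindResponse has been read, so it can arrive while the bind is still in
  # progress and slapd defers it, logging "connection deferred: binding".
  msgid = conn.simple_bind("cn=app,dc=example,dc=com", "secret")
  conn.search("dc=example,dc=com", ldap.SCOPE_SUBTREE, "(uid=jdoe)")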
You presumably don't notice this under normal circumstances or with RR DNS because the server completes the BIND before the next request is received. My understanding (perhaps suspect) is that the F5 will increase the 'bunching' of packets on individual connections (because the first packet after a pause will see a higher latency than the succeeding packets).
So, are you measuring latency through the F5? I would *strongly* suggest doing so *before* tuning the F5 in any way, such as by the VIP type mentioned by John Morrissey, so that you can wave that in front of management (and under the nose of the F5 salesman when negotiating your next support renewal...)
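Something as simple as the following would do for a first cut -- a sketch only, with python-ldap and placeholder hostnames/DNs, timing the same bind+search through the VIP and directly against one back-end server:

  # Rough latency comparison; all names below are placeholders.
  import time
  import ldap

  def mean_query_ms(uri, tries=100):
      total = 0.0
      for _ in range(tries):
          conn = ldap.initialize(uri)
          start = time.time()
          conn.simple_bind_s("cn=app,dc=example,dc=com", "secret")
          conn.search_s("dc=example,dc=com", ldap.SCOPE_SUBTREE, "(uid=jdoe)")
          total += time.time() - start
          conn.unbind_s()
      return total / tries * 1000.0

  for uri in ("ldap://ldap-vip.example.com", "ldap://ldap1.example.com"):
      print("%s: %.1f ms per bind+search" % (uri, mean_query_ms(uri)))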
Philip Guenther