All LDAP traffic is currently using RR DNS.
The network is essentially "flat": the LDAP servers and the systems requiring
LDAP are on the same subnet, which is why, when using the F5s for LDAP
balancing, all traffic will appear to come from the F5; otherwise you'd have an
asymmetric routing issue. The F5 has VIPs on both the "inside" and the outside.
(The outside addresses are in the DMZ behind the perimeter firewalls, and are
for balancing traffic to other server clusters, i.e. web, etc.)
Mgmt is of the mindset of "if it works (even if it doesn't provide proper
redundancy right now), then leave it be," which is OK if servers never, ever crash.
I'm of the opinion that we should find out WHY the LDAP servers log
"connection deferred: binding" when behind the F5s, and ONLY when past a
certain arbitrary load threshold (i.e. for an hour or two around the busiest
time of day it throws those warnings every few seconds/minutes, but below that
point all is well); hence my focus on conn_max_pending and
conn_max_pending_auth, though I haven't heard a concrete response yet saying,
"Yes, in your case, where all the traffic will appear to come from the F5 due
to the network layout, those parameters are too low and likely to throttle
connections at some arbitrary level."
I think the first test will be to try Performance (Layer 4) on the F5, and if
there still happens to be an issue, to try doubling the values of
conn_max_pending and conn_max_pending_auth.
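For reference, a hypothetical slapd.conf fragment along those lines -- the
doubled values are illustrative only, not a recommendation (per slapd.conf(5)
the defaults are 100 and 1000):

```
# Hypothetical tuning sketch -- values are illustrative, not a recommendation.
# conn_max_pending caps operations queued on an anonymous connection while
# earlier operations (e.g. a bind) are still being processed; default 100.
conn_max_pending      200
# conn_max_pending_auth is the same cap for authenticated connections;
# default 1000.
conn_max_pending_auth 2000
```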
-- David J. Andruczyk
----- Original Message ----
From: John Morrissey <jwm(a)horde.net>
To: David J. Andruczyk <djandruczyk(a)yahoo.com>
Cc: openldap-software(a)openldap.org; Philip Guenther
Sent: Wednesday, July 29, 2009 1:20:44 PM
Subject: Re: performance issue behind a load balancer 2.3.32
On Wed, Jul 22, 2009 at 05:37:30AM -0700, David J. Andruczyk wrote:
> yes, we have been measuring latency when under the F5 vs RR. When we
> switched to RR DNS it DID drop quite a bit, from around 100ms to about 20ms.
FWIW and IIRC, after switching to Performance (Layer 4), the observed
latency for LDAP operations to the VIP and to the nodes themselves was
essentially the same. I can't say what the latency difference was, since I
wasn't the one who was troubleshooting the BigIPs and don't have the numbers.
> We do NOT yet have the VIP set to Performance (Layer 4), however. It is
> currently set to "standard". F5 has since suggested Performance (Layer 4),
> but we have not implemented it yet, only because the "connection deferred:
> binding" messages cause severe annoyances and lots of CS calls from users
> of the system (auth failures, misc issues); mgmt is wary of trying
> anything else until they have proof that whatever we do WILL DEFINITELY
> WORK beforehand. (Yes, cart before the horse, I know, but they sign the
> checks as well...)
That seems short-sighted, unless you're implying that you've moved all LDAP
traffic off your BigIPs until you have a solution in hand that you *know*
will solve the problem.
They may sign the checks, but that doesn't mean that informed argument
shouldn't carry weight.
> When behind the F5, in the LDAP server logs all connections appear to come
> from the F5's IP. So, when pumping a hundred servers' connections through
> that one IP there are going to be many, many binds/unbinds going on
> constantly, all coming from the same IP (the F5). So why doesn't it throw
> "connection deferred: binding" constantly, since the connection load is
> certainly very, very high? It only throws them occasionally (every few
> seconds), but it's enough to cause a major impact in terms of failed
> queries. Are you saying the F5 is dropping part of the session after
> binding on a port and retrying to bind?
+1 on what Philip mentioned:
On Tue, 21 Jul 2009 21:54:53 -0700, Philip Guenther wrote:
> Given the reported log message, this (latency) is very likely to be the
> cause of the problem. "connection deferred: binding" means that the
> server received a request on a connection that was in the middle of
> processing a bind. This means that the client sends a bind and then
> additional request(s) without waiting for the bind result. That's a
> violation by the client of the LDAP protocol specification, RFC 4511,
> section 4.2.1, paragraph 2:
>
> Understanding _why_ clients are violating the spec by sending further
> requests while a bind is outstanding may help you understand how the F5 or
> the clients should be tuned (or beaten with sticks, etc).
>
> You presumably don't notice this under normal circumstances or with RR DNS
> because the server completes the BIND before the next request is received.
> My understanding (perhaps suspect) is that the F5 will increase the
> 'bunching' of packets on individual connections (because the first packet
> after a pause will see a higher latency than the succeeding packets).
>
> So, are you measuring latency through the F5? I would *strongly* suggest
> doing so *before* tuning the F5 in any way, such as by the VIP type change
> mentioned by John Morrissey, so that you can wave that in front of
> management (and under the nose of the F5 salesman when negotiating your
> next support renewal...)
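Philip's suggestion doesn't need the BigIP dashboards; here is a rough
stdlib-only sketch that samples TCP connect latency. In real use you would
point it at the VIP and at each node's address (both would be your own
hostnames, not anything shown here) and compare the numbers; a throwaway
local listener stands in so the sketch runs anywhere:

```python
# Crude TCP connect-latency sampler -- a stand-in for measuring latency
# through the F5 VIP vs. direct to a node. The local listener below is
# only there to make the example self-contained.
import socket
import statistics
import threading
import time

def sample_connect_ms(host, port, n=20):
    """Time n TCP connect/close round trips; return (median_ms, max_ms)."""
    times = []
    for _ in range(n):
        t0 = time.perf_counter()
        s = socket.create_connection((host, port), timeout=5)
        times.append((time.perf_counter() - t0) * 1000.0)
        s.close()
    return statistics.median(times), max(times)

if __name__ == "__main__":
    # Throwaway listener on an ephemeral port, standing in for the VIP.
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(50)
    port = srv.getsockname()[1]
    threading.Thread(
        target=lambda: [srv.accept()[0].close() for _ in range(20)],
        daemon=True).start()
    med, worst = sample_connect_ms("127.0.0.1", port)
    print("median %.2fms, worst %.2fms" % (med, worst))
```

For full LDAP-operation latency rather than just TCP setup, timing
`ldapsearch -x -H ldap://<vip>/ -b '' -s base` against the VIP and against
each node directly gives a comparable end-to-end number.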
What I'm parsing from:
(only accessible with an F5 support contract, unfortunately), is that with
the "Standard" VIP type, the BigIP will wait for a three-way TCP handshake
before establishing a connection with the load-balanced node. The BigIP
becomes a "man in the middle" and establishes two independent connections:
one facing the client, another facing the load balanced node.
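To make the distinction concrete, a rough bigip.conf-style sketch of the two
VIP types -- the virtual server names, addresses, and pool name are invented,
and exact syntax varies by TMOS version:

```
# Hypothetical fragments -- names and addresses invented.

# Standard VIP: full proxy. The BigIP completes the client's three-way
# handshake, then opens a second, independent TCP connection to the
# load-balanced node.
ltm virtual ldap_vs_standard {
    destination 10.0.0.10:389
    ip-protocol tcp
    profiles { tcp { } }
    pool ldap_pool
}

# Performance (Layer 4): FastL4 profile. Packets are forwarded between
# client and node as received, rather than proxied through two connections.
ltm virtual ldap_vs_fastl4 {
    destination 10.0.0.10:389
    ip-protocol tcp
    profiles { fastL4 { } }
    pool ldap_pool
}
```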
With "Performance (Layer 4)", the BigIP forwards packets between clients and
load-balanced nodes as they're received. As Philip says, the packet
"bunching" due to the MITM nature of the Standard VIP type is probably
teaming up with your LDAP client misbehavior. Under heavy load, the
likelihood of bunching increases and you "win" this race condition.
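The race Philip describes can be modeled in a few lines. This is only a toy
state machine mirroring the logged behavior, not actual slapd code:

```python
# Toy model of the behavior behind "connection deferred: binding":
# a request that arrives while a bind on the same connection is still
# being processed is deferred rather than executed.
class Connection:
    def __init__(self):
        self.binding = False
        self.deferred = 0

    def receive(self, op):
        if op == "bind":
            self.binding = True      # bind now in flight
        elif self.binding:
            self.deferred += 1       # slapd logs "connection deferred: binding"
        # else: request processed normally

    def bind_complete(self):
        self.binding = False

# Well-behaved client: waits for the bind result before searching.
good = Connection()
good.receive("bind"); good.bind_complete(); good.receive("search")

# Pipelining client: the search arrives while the bind is outstanding --
# the RFC 4511 section 4.2.1 violation. Packet bunching under load makes
# this interleaving more likely.
bad = Connection()
bad.receive("bind"); bad.receive("search"); bad.bind_complete()

print(good.deferred, bad.deferred)  # -> 0 1
```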
Out of curiosity, what LDAP client SDK is involved here?
John Morrissey _o /\ ---- __o
jwm(a)horde.net _-< \_ / \ ---- < \,
__(_)/_(_)________/ \_______(_) /_(_)__