On Tue, Jul 21, 2009 at 01:54:25PM -0700, Quanah Gibson-Mount wrote:
--On Tuesday, July 21, 2009 4:51 PM -0400 "Clowser, Jeff" jeff_clowser@fanniemae.com wrote:
Do you have any facts/numbers to back this up? I've never seen F5s slow things down noticeably.
We've had F5s be the root of the problem with several clients who load balanced their LDAP servers and pointed Postfix at the F5 for delivery. They added just a few milliseconds to each LDAP query, but that was enough to completely back up their mail delivery system. Removing the F5 from the picture allowed mail to flow smoothly again, with no further problems.
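To put rough numbers on why a few milliseconds can matter, here's a back-of-the-envelope sketch; the lookup counts and latencies below are made up for illustration, not measurements from any of these deployments:

    5 LDAP lookups/msg x  6 ms each = 30 ms of LDAP wait per message -> ~33 msg/s per delivery process
    5 LDAP lookups/msg x 10 ms each = 50 ms of LDAP wait per message -> ~20 msg/s per delivery process

With a fixed number of Postfix processes, once the arrival rate passes that lower ceiling the queue just keeps growing, even though nothing is "broken" anywhere.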
I can't speak for any other clients that Quanah may be referencing, but we experienced this with our Zimbra deployment. However, I emphatically disagree with his stance against running LDAP services behind a hardware load balancer.
We have F5 BigIPs in front of nearly every service we provide, for the reasons cited by others. In the past, we've had load balancers from Cisco (CSS) and Alteon (ACEdirector, IIRC, later acquired by Nortel), and our BigIPs have been the most transparent and have worked the best.
That said, we did encounter throughput problems with Zimbra's Postfix MTAs due to BigIP configuration. When incoming mail volume started to ramp up for the day, Postfix's queue size would slowly build. We ruled out (host) CPU consumption, disk I/O load, syslogging bottlenecks, and a host of other usual and unusual suspects on the hosts themselves.
I'm not sure if Quanah heard the final resolution, which was to change the LDAP VIP type from Standard to "Performance (Layer 4)." This solved the problem immediately. I didn't see the final response from F5, but my impression was that Performance (Layer 4) bypasses many of the hooks that let you manipulate packets and connections.

Interestingly, CPU consumption on our BigIPs was low, so it never prompted us to troubleshoot from that angle. This was the first time we've seen this behavior; our non-Zimbra OpenLDAP nodes handle a higher operation rate (~12k operations/sec aggregate) and had been servicing a similar mail infrastructure before we started moving to Zimbra's software.
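For anyone else who runs into this: "Performance (Layer 4)" is essentially a virtual server with the FastL4 profile attached in place of the standard TCP proxy profile. A rough tmsh-style sketch of what such a VIP looks like (addresses and names are placeholders; exact syntax varies by BIG-IP version, and older releases use bigpipe or the GUI instead):

    ltm virtual ldap_vip {
        destination 192.0.2.10:389      # placeholder VIP address and LDAP port
        ip-protocol tcp
        pool ldap_pool                  # placeholder pool of LDAP nodes
        profiles {
            fastL4 { }                  # FastL4 profile = "Performance (Layer 4)" processing
        }
    }

The trade-off is that most of the L7 inspection and connection-manipulation hooks aren't available on a FastL4 virtual, which is consistent with the "bypasses a lot of the hooks" impression above.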
On Tue, Jul 21, 2009 at 05:56:48AM -0700, David J. Andruczyk wrote:
We had tried experimenting with a higher number of threads previously, but that didn't seem to have a positive effect. Can any OpenLDAP gurus suggest some things to set or look for, e.g., a higher number of threads, or higher values for conn_max_pending and conn_max_pending_auth?
Any ideas on what a theoretical performance limit should be for a machine of this caliber, i.e., how many reqs/sec it can handle, how far it will scale, etc.?
It sounds like you're doing NAT on inbound connections (so connections offered to your LDAP nodes are sourced from the BigIP), and I'm not sure whether this alternate VIP type would preclude that. If you have OneConnect enabled, you might try disabling it, too; I generally see it used with HTTP, but perhaps it's usable with other protocols?
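If it helps, the two knobs I mean look roughly like this in tmsh terms on newer BIG-IP releases (the virtual server name is a placeholder, and older releases spell these differently in bigpipe):

    # keep the SNAT behavior you have now (LDAP nodes see connections sourced from the BigIP)
    modify ltm virtual ldap_vip source-address-translation { type automap }

    # remove the OneConnect profile from the VIP, if one is attached
    modify ltm virtual ldap_vip profiles delete { oneconnect }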
AFAICT, increasing conn_max_pending_auth shouldn't help unless your application(s) issue a lot of asynchronous operations, i.e., submit many LDAP operations at once and leave them pending simultaneously. If they primarily submit an operation and wait for the response, lather, rinse, repeat, I don't see how a connection could accumulate pending operations.
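For reference, the relevant slapd.conf directives and their stock defaults (the values below are just the defaults, not a recommendation):

    # slapd.conf excerpt -- defaults shown; raise only if you have evidence they're the bottleneck
    threads                16      # size of slapd's worker thread pool
    conn_max_pending       100     # max pending ops per anonymous connection
    conn_max_pending_auth  1000    # max pending ops per authenticated connection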
As far as scalability, I see no reason OpenLDAP shouldn't scale reasonably to the limits of your hardware (CPU consumption and disk I/O). It bodes well for your OpenLDAP build, tuning, etc. that it can handle your current workload when using round-robin DNS. What kind of LDAP ops/sec are these machines taking?
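If you want hard numbers, the monitor backend will give them to you. Something like the following, run twice some seconds apart, lets you divide the change in monitorOpCompleted by the interval to get ops/sec per operation type (this assumes "database monitor" is enabled in slapd.conf, and the host and bind DN are placeholders to adjust for your environment):

    ldapsearch -x -H ldap://ldap1.example.com \
        -D "cn=manager,dc=example,dc=com" -W \
        -b "cn=Operations,cn=Monitor" -s one \
        monitorOpInitiated monitorOpCompleted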
john