This is a large production environment (several hundred servers, thousands of requests per minute), and the F5 LB is used to balance the load and to take a node out of service when it needs maintenance for any reason. If a server is slow (for whatever reason: backups, etc.), the F5 notices that and adjusts the connection distribution as needed; RR DNS can't do that. As far as indexes go, the environment had been performing extremely well until recently, when a few hundred thousand more users were added along with significantly higher activity, at which point we began seeing issues behind the load balancers during peak times of day. The LB vendor says the issue is with OpenLDAP, and of its settings, conn_max_pending/conn_max_pending_auth were the only ones that seemed to stick out, though the documentation on those is rather ambiguous.
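If those limits are in fact the culprit, am I right that raising them is just a matter of two global directives in slapd.conf, along these lines (the values below are a guess on my part, not something we've tested)?

    # global section of slapd.conf -- defaults are 100 / 1000
    # guessed values, raised on the theory that all traffic funnels
    # through a handful of F5 source addresses
    conn_max_pending      2000
    conn_max_pending_auth 4000

If someone can confirm whether those are even the right knobs here, that would help a lot.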
-- David J. Andruczyk
----- Original Message -----
From: Sean O'Malley <omalleys@msu.edu>
To: David J. Andruczyk <djandruczyk@yahoo.com>
Cc: openldap-software@openldap.org
Sent: Tuesday, July 21, 2009 2:24:58 PM
Subject: Re: performance issue behind a load balancer 2.3.32
Why bother with the load balancer? I am curious; I am sure there is a reason, but it isn't making a lot of sense to me. You can either do round robin DNS, or just pass out the 3 read servers' addresses to the clients for failover (and change the order for real poor man's load balancing).
conn_max_pending is what I had to adjust to up the connections, but I suspect you may have indexing issues, returning too many responses, etc.
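For example, make sure whatever the clients actually filter on is indexed. Something along these lines in the database section of slapd.conf (the attributes here are just examples, substitute your own):

    # database section of slapd.conf
    index objectClass        eq
    index uid,cn,mail        eq,sub

And remember to stop slapd and run slapindex after adding index lines, or the new indexes won't cover the existing data.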
On Tue, 21 Jul 2009, David J. Andruczyk wrote:
I work at a place with a fairly large OpenLDAP setup (2.3.32). We have 3 large read servers: Dell R900, 32 GB RAM, 2x quad-core, hardware RAID1 disks for the LDAP volume. The entire database takes about 13 GB of physical disk space (the BDB files) and has a few million entries. DB_CONFIG is configured to hold the entire DB in memory (for speed), and slapd.conf cachesize is set to a million entries, to make the most effective use of the 32 GB of RAM this box has. We have them behind an F5 BigIP hardware load balancer (6400 series), and find that during peak times of the day we get "connection deferred: binding" in our slapd logs (loglevel set to "none", which is a misnomer), and a client request (or a series of them) fails. If we use round robin DNS instead, we rarely see those errors. CPU usage is low, even during peak times, hovering at 20-50% of 1 core (the other 7 are idle).
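For the curious, "entire DB in memory" amounts to a DB_CONFIG roughly like the following; the sizes here are illustrative rather than our literal file:

    # DB_CONFIG -- BDB cache sized to cover the ~13GB of BDB files,
    # split across several regions (numbers illustrative)
    set_cachesize 14 0 4
    # larger log buffer to cut down on log I/O
    set_lg_bsize 2097152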
The interesting thing is that it (the "connection deferred: binding") seems to happen only after a certain load threshold is reached (the busiest time of day), and only when behind the F5s. I suspect it might be the "conn_max_pending" or "conn_max_pending_auth" defaults (100 and 1000 respectively), as when behind the F5 all the connections appear to come from the F5's addresses, vs. RR DNS where they come from a wide range of sources (each of the client machines, well over 100).
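The concentration is easy to see on one of the read servers; a quick count of established LDAP connections per source address (assuming Linux netstat output) shows nearly everything arriving from the F5's addresses:

    netstat -tn | awk '$4 ~ /:389$/ {split($5, a, ":"); print a[1]}' \
        | sort | uniq -c | sort -rn | head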
We had tried experimenting with a higher number of threads previously, but that didn't seem to have a positive effect. Can any OpenLDAP gurus suggest some things to set or look for, e.g. a higher number of threads, or higher values for conn_max_pending and conn_max_pending_auth?
Any ideas on what the theoretical performance limit of a machine of this caliber should be? i.e., how many reqs/sec, how far will it scale, etc.?
We have plans to upgrade to 2.4, but it's a "down the road item", and mgmt is demanding answers to "how far can this design scale as it is"...
Thanks!
-- David J. Andruczyk
--------------------------------------
Sean O'Malley, Information Technologist
Michigan State University
--------------------------------------