I work at a site with a fairly large OpenLDAP setup (2.3.32). We have 3 large read servers: Dell R900, 32 GB RAM, 2x quad-core, hardware RAID1 disks for the LDAP volume. The entire database takes about 13 GB of physical disk space (the BDB files) and has a few million entries. DB_CONFIG is configured to hold the entire DB in memory (for speed), and the slapd.conf cachesize is set to a million entries, to make the most effective use of the 32 GB of RAM these boxes have.

We have them behind an F5 BigIP hardware load balancer (6400 series), and find that during peak times of day we get "connection deferred: binding" in our slapd logs (loglevel is set to "none", which is a misnomer), and a client request (or a series of them) fails. If we use round-robin DNS instead, we rarely see those errors. CPU usage is low even during peak times, hovering at 20-50% of one core (the other 7 are idle).
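For reference, by "the entire DB in memory" above I mean a cache configuration roughly along the lines of the sketch below (the numbers here are illustrative placeholders, not our exact production values):

    # DB_CONFIG in the BDB database directory (illustrative values)
    # ~14 GB BDB cache split into 4 regions, enough to hold the ~13 GB database
    set_cachesize 14 0 4
    # larger log region and log buffer to reduce transaction log overhead
    set_lg_regionmax 262144
    set_lg_bsize 2097152

    # slapd.conf, bdb database section (illustrative values)
    # entry cache of one million entries
    cachesize 1000000
    # IDL cache, commonly sized around 3x the entry cache for search-heavy loads
    idlcachesize 3000000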
The interesting thing is that it only seems to happen (the "connection deferred: binding") after a certain load threshold is reached (the busiest time of day), and only when behind the F5s. I suspect it might be the conn_max_pending or conn_max_pending_auth defaults (100 and 1000 respectively), since behind the F5 all connections appear to come from the F5 addresses, whereas with round-robin DNS they come from a wide range of sources (each of the client servers, well over 100 of them).
We had tried experimenting with a higher number of threads previously, but that didn't seem to have a positive effect. Can any OpenLDAP gurus suggest things to set or look for, e.g. a higher number of threads, or higher values for conn_max_pending / conn_max_pending_auth?
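To be concrete, the global slapd.conf knobs I have in mind would look something like this (defaults noted in the comments; the raised values are guesses I have not validated, not recommendations):

    # slapd.conf, global section (experimental guesses)
    # default is 100 pending requests per anonymous connection
    conn_max_pending 512
    # default is 1000 pending requests per authenticated connection
    conn_max_pending_auth 4000
    # default is 16 worker threads; raising this earlier did not seem to help
    threads 16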
Any ideas on what the theoretical performance limit of a machine of this caliber should be? i.e. how many requests/sec, how far it will scale, etc.
We have plans to upgrade to 2.4, but it's a "down the road" item, and management is demanding answers to "how far can this design scale as it is"...
Thanks!
-- David J. Andruczyk