Heya,

I have to assume it's another manifestation of the artificial bottleneck that was being introduced by the bandwidth limitation.

Clients were connected and just sat there twiddling their digital thumbs awaiting some bandwidth with which to return results.

I'm still trying to establish a baseline for what 'normal' should look like - is a high volume of waiters always indicative of a problem, or can it happen on a functionally healthy platform?

The documentation on the subject is accurate yet brief... :)

20.4.13. Waiters

It contains the number of current read waiters.
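
For reference, the counter in question can be pulled straight out of the monitor backend - here's a minimal ldap3 sketch of the sort of check I mean (host, bind DN and password are placeholders, and the monitor database obviously has to be enabled and readable by that identity):

    # Rough sketch: read the current read-waiter count from cn=Monitor.
    # Placeholder host/credentials - adjust for your deployment.
    from ldap3 import Connection, Server

    conn = Connection(Server('ldap://localhost:389'),
                      'cn=admin,dc=example,dc=com', 'secret', auto_bind=True)
    conn.search('cn=Read,cn=Waiters,cn=Monitor', '(objectClass=*)',
                search_scope='BASE', attributes=['monitorCounter'])
    print('read waiters:', conn.entries[0].monitorCounter.value)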

On Wed, Sep 6, 2017 at 8:03 AM, Sean Burford <unix.gurus@gmail.com> wrote:
Hi,

Forgive the dumb question - it's been a while since I did OpenLDAP performance tuning - but why is read waiters pegged at 450?


On Sep 4, 2017 9:15 PM, "Tim" <tim@yetanother.net> wrote:
Cheers guys, 

It's reassuring that I'm roughly on the right track - but that leads me to other questions about what I'm currently seeing while trying to load test the platform.

I'm currently using LocustIO, with a swarm of ~70 instances spread across ~25 hosts, to try to scale up the test traffic.

The problem I'm seeing (and the reason I was questioning my initial test approach) is that the traffic seems to be artificially capping out, and I can't for the life of me find the bottleneck.

I'm recording/graphing everything in cn=monitor, all the resources covered by vmstat, and bandwidth - nothing appears to be topping out.
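
To give an idea of the kind of sampling I mean - a rough ldap3 loop that reads the completed-search counter out of cn=Monitor and turns it into a searches/s figure (bind details are placeholders, and the 5-second interval is arbitrary):

    # Rough sketch: sample slapd's completed-search counter and derive a rate.
    # Placeholder host/credentials; assumes the monitor backend is enabled.
    import time
    from ldap3 import Connection, Server

    conn = Connection(Server('ldap://localhost'),
                      'cn=admin,dc=example,dc=com', 'secret', auto_bind=True)

    def completed_searches():
        conn.search('cn=Search,cn=Operations,cn=Monitor', '(objectClass=*)',
                    search_scope='BASE', attributes=['monitorOpCompleted'])
        return int(conn.entries[0].monitorOpCompleted.value)

    prev = completed_searches()
    while True:
        time.sleep(5)
        cur = completed_searches()
        print('searches/s: %.0f' % ((cur - prev) / 5.0))
        prev = cur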

If I perform searches in isolation, it quickly ramps up to 20k/s and then just tabletops, while all system resources seem reasonably happy.

This happens no matter what distribution of clients I deploy (e.g. 5000 clients over 70 hosts, or 100 clients over 10 hosts) - so I'm fairly confident the test environment is capable of generating more traffic.


(.. this was thrown together in a very rough-and-ready fashion - it's quite possible that my units are off on some of the y-axes!)

I've performed some minor optimisations to try to resolve it (the number of available file handles was my initial hope for an easy fix..) but so far nothing's helped - I still see this capping of throughput before the key system resources even get slightly warm.

I had hoped it was going to be as simple as increasing a concurrency variable within the config - but the one that does exist seems not to be valid for anything outside of legacy Solaris deployments?
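
The only other obvious knob I can see is the worker thread pool - e.g. something like the rough sketch below, bumping olcThreads (the 'threads' directive, default 16) via cn=config - though I haven't confirmed whether that's even the right lever for this plateau:

    # Rough, untested sketch: raise slapd's worker thread pool via cn=config.
    # The cn=config rootdn/password are placeholders for whatever your
    # deployment uses, and 32 is just an example value.
    from ldap3 import MODIFY_REPLACE, Connection, Server

    conn = Connection(Server('ldap://localhost'),
                      'cn=admin,cn=config', 'config-secret', auto_bind=True)
    conn.modify('cn=config', {'olcThreads': [(MODIFY_REPLACE, ['32'])]})
    print(conn.result['description'])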

If anyone has any suggestions as to where I could look for a potential bottleneck (either on the system or within my OpenLDAP configuration), it would be very much appreciated.


Thanks in advance



On Mon, Sep 4, 2017 at 7:47 AM, Michael Ströder <michael@stroeder.com> wrote:
Tim wrote:
> I've, so far, been making use of home grown python-ldap3 scripts to
> simulate the various kinds of interactions using many parallel synchronous
> requests - but as I scale this up, I'm increasingly aware that it is a very
> different ask to simulate simple synchronous interactions compared to a
> fully optimised multithreaded client with dedicated async/sync channels and
> associated strategies.

Most clients will just send those synchronous requests. So IMHO this is the right test
pattern and you should simply make your test client multi-threaded.

> I'm currently working with a dataset of in the region of 2,500,000 objects
> and looking to test throughput up to somewhere in the region of 15k/s
> searches alongside 1k/s modification/addition events - which is beyond what
> the current basic scripts are able to achieve.

Note that the ldap3 module for Python is written in pure Python - including the ASN.1
encoding/decoding. In contrast, the old Python 2.x https://python-ldap.org module is a C
wrapper around the OpenLDAP libs, so you might get better client performance from it.
Nevertheless, you should spread your test clients over several machines to really achieve
the needed performance.
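
Something like this rough, untested python-ldap sketch is what I mean - one connection
per thread, plain synchronous searches in a loop, and a throughput figure at the end
(URI, bind DN, password and filter are placeholders):

    # Rough sketch of a multi-threaded synchronous test client using python-ldap.
    # Each thread opens its own connection (an LDAPObject should not be shared).
    import ldap
    import threading
    import time

    URI = 'ldap://ldap.example.com'
    BASE = 'dc=example,dc=com'
    THREADS = 16
    DURATION = 30  # seconds
    counts = [0] * THREADS

    def worker(idx):
        conn = ldap.initialize(URI)
        conn.simple_bind_s('cn=loadtest,dc=example,dc=com', 'secret')
        deadline = time.time() + DURATION
        while time.time() < deadline:
            conn.search_s(BASE, ldap.SCOPE_SUBTREE, '(uid=user0001)', ['cn'])
            counts[idx] += 1
        conn.unbind_s()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print('%.0f searches/s overall' % (sum(counts) / float(DURATION)))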

Ciao, Michael.

--
Tim
tim@yetanother.net