Heya,
I'm currently working on the design for a new OpenLDAP deployment, and I'm now at the stage where I'd like to thoroughly stress test the platform to understand its capacity and the potential bottlenecks we may hit.
So far I've been using home-grown python-ldap3 scripts to simulate the various kinds of interactions with many parallel synchronous requests - but as I scale this up, I'm increasingly aware that simulating simple synchronous interactions is a very different ask from a fully optimised multithreaded client with dedicated async/sync channels and associated strategies.
I'm currently working with a dataset of roughly 2,500,000 objects and looking to test throughput of up to around 15k searches/s alongside 1k modifications/additions per second - which is beyond what the current basic scripts can achieve.
I'm starting to resign myself to investing significant time in building a more representative and capable test suite - but surely this is a problem most new platforms have faced?
So I was wondering what strategies other people have used to stress test platforms elsewhere?
Thanks in advance for any suggestions
Tim wrote:
So far I've been using home-grown python-ldap3 scripts to simulate the various kinds of interactions with many parallel synchronous requests - but as I scale this up, I'm increasingly aware that simulating simple synchronous interactions is a very different ask from a fully optimised multithreaded client with dedicated async/sync channels and associated strategies.
Most clients will just send those synchronous requests. So IMHO this is the right test pattern and you should simply make your test client multi-threaded.
I'm currently working with a dataset of roughly 2,500,000 objects and looking to test throughput of up to around 15k searches/s alongside 1k modifications/additions per second - which is beyond what the current basic scripts can achieve.
Note that the ldap3 module for Python is written in pure Python - including the ASN.1 encoding/decoding. In contrast, the older Python 2.x https://python-ldap.org module is a C wrapper around the OpenLDAP libs, so you might get better client performance from it. Nevertheless, you should spread your test clients over several machines to really achieve the needed throughput.
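A minimal sketch of that kind of multi-threaded synchronous client, using python-ldap with one connection per thread (the URI, base DN and filter below are placeholders, not anything from your setup):

    # multi-threaded synchronous search load, one LDAP connection per thread
    import threading
    import time
    import ldap

    LDAP_URI = "ldap://ldap.example.com"      # placeholder
    BASE_DN = "ou=people,dc=example,dc=com"   # placeholder
    N_THREADS = 32
    SEARCHES_PER_THREAD = 10000

    def worker(counts, idx):
        # one connection per thread; libldap handles aren't meant to be shared
        conn = ldap.initialize(LDAP_URI)
        conn.simple_bind_s("", "")            # anonymous bind for the test
        for i in range(SEARCHES_PER_THREAD):
            # plain synchronous searches, just like most real clients send
            conn.search_s(BASE_DN, ldap.SCOPE_SUBTREE,
                          "(uid=user%07d)" % i, ["cn"])   # placeholder filter
            counts[idx] += 1
        conn.unbind_s()

    counts = [0] * N_THREADS
    threads = [threading.Thread(target=worker, args=(counts, i))
               for i in range(N_THREADS)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("%.0f searches/s" % (sum(counts) / (time.time() - start)))

python-ldap releases the GIL around the underlying libldap calls, so plain threads should scale reasonably well here; per-host throughput still tops out eventually, hence the advice to spread clients over several machines.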
Ciao, Michael.
Cheers guys,
It's reassuring that I'm roughly on the right track - but that leads me to some other questions about what I'm currently experiencing while trying to load test the platform.
I'm currently using LocustIO, with a swarm of ~70 instances spread across ~25 hosts, to try to scale up the test traffic.
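For anyone heading down the same path, the general shape of an LDAP "user" in Locust is roughly the sketch below (a sketch only - the URI, base DN and filter are placeholders, and it assumes a Locust version with the unified environment.events.request hook; older releases used the request_success/request_failure events instead):

    # a bare-bones Locust user that fires LDAP searches and reports timings
    import time
    from locust import User, task, constant
    from ldap3 import Server, Connection

    class LdapUser(User):
        wait_time = constant(0)               # no think time - hammer as fast as possible

        def on_start(self):
            # one synchronous ldap3 connection per simulated user
            self.conn = Connection(Server("ldap://ldap.example.com"),   # placeholder
                                   auto_bind=True)

        @task
        def search(self):
            start = time.time()
            exc = None
            try:
                self.conn.search("dc=example,dc=com",                   # placeholder
                                 "(uid=someuser)", attributes=["cn"])
            except Exception as e:
                exc = e
            self.environment.events.request.fire(
                request_type="LDAP", name="search",
                response_time=(time.time() - start) * 1000,
                response_length=0, exception=exc, context={})

Whether each simulated user holds its own connection or shares a pool is worth deciding up front, since it changes what the server actually sees.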
The problem I'm seeing (and the reason I was questioning my initial test approach) is that the traffic seems to be artificially capping out, and I can't for the life of me find the bottleneck.
I'm recording/graphing everything in cn=monitor, all the resources covered by vmstat, and bandwidth - nothing appears to be topping out.
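In case it's useful to anyone doing the same, a minimal sketch of sampling a few cn=Monitor counters with ldap3 (placeholder URI; assumes back-monitor is loaded and readable by the bind identity):

    # sample slapd's back-monitor counters and print per-interval rates
    import time
    from ldap3 import Server, Connection

    conn = Connection(Server("ldap://ldap.example.com"), auto_bind=True)   # placeholder

    def read_counter(dn, attr):
        conn.search(dn, "(objectClass=*)", search_scope="BASE", attributes=[attr])
        return int(conn.entries[0][attr].value)

    def sample():
        return {
            "searches":  read_counter("cn=Search,cn=Operations,cn=Monitor", "monitorOpCompleted"),
            "modifies":  read_counter("cn=Modify,cn=Operations,cn=Monitor", "monitorOpCompleted"),
            "waiters_r": read_counter("cn=Read,cn=Waiters,cn=Monitor", "monitorCounter"),
        }

    prev = sample()
    while True:
        time.sleep(10)
        cur = sample()
        # operation counters are cumulative, so report the delta; waiters is a gauge
        print("searches/s: %.0f  modifies/s: %.0f  read waiters: %d" % (
            (cur["searches"] - prev["searches"]) / 10.0,
            (cur["modifies"] - prev["modifies"]) / 10.0,
            cur["waiters_r"]))
        prev = cur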
If I perform searches in isolation, throughput quickly ramps up to 20k/s and then just plateaus, while all system resources seem reasonably happy.
This happens no matter what distribution of clients I deploy (e.g. 5000 clients over 70 hosts or 100 clients over 10 hosts) - so I'm fairly confident that the test environment is more than capable of generating further traffic.
https://s3.eu-west-2.amazonaws.com/uninspired/mystery_bottleneck.png
(... this was thrown together in a very rough-and-ready fashion - it's quite possible that my units are off on some of the y-axes!)
I've performed some minor optimisations to try to resolve it (the number of available file handles was my initial hope for an easy fix...), but so far nothing has helped - I still see throughput capping out before the key system resources even get slightly warm.
I had hoped it would be as simple as increasing a concurrency variable within the config - but the one that does exist seems not to be valid for anything outside of legacy Solaris deployments?
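For what it's worth, the concurrency directive is documented as only a hint to certain threading implementations; the setting that normally matters here is threads (olcThreads in cn=config), which sizes slapd's worker thread pool and defaults to 16. A quick way to sanity-check the current values, assuming read access to cn=config (the URI and credentials below are placeholders):

    # read the global thread-related settings from cn=config
    from ldap3 import Server, Connection

    conn = Connection(Server("ldap://ldap.example.com"),             # placeholder
                      user="cn=admin,cn=config", password="secret",  # placeholder
                      auto_bind=True)
    conn.search("cn=config", "(objectClass=olcGlobal)", search_scope="BASE",
                attributes=["olcThreads", "olcConcurrency", "olcToolThreads"])
    print(conn.entries[0])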
If anyone has any suggestions as to where I could look for a potential bottleneck (either on the system or within my OpenLDAP configuration), it would be very much appreciated.
Thanks in advance
As always... just as you hit send on an email to an open mailing list...
It's the bandwidth, isn't it...
I'm so used to everything being 1000 Mbit that I didn't spot the 100 Mbit limit being hit.
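The back-of-envelope arithmetic fits, too (taking the ~20k searches/s plateau from the graph, and assuming the link really is the limit):

    # rough wire budget per search at the observed plateau
    link_bytes_per_s = 100e6 / 8               # 100 Mbit/s -> ~12.5 MB/s
    searches_per_s = 20000                     # observed plateau
    print(link_bytes_per_s / searches_per_s)   # ~625 bytes per search, incl. LDAP/TCP overhead

So even modest entries plus protocol overhead will saturate the link well before CPU or disk shows any strain.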
Will continue investigations with that additional bit of info..! :)
Thanks!
Hi,
Forgive the dumb question - it's been a while since I did OpenLDAP performance tuning - but why are read waiters pegged at 450?
Heya,
I have to assume it's another manifestation of the artificial bottleneck introduced by the bandwidth limitation.
Clients were connected and just sat there twiddling their digital thumbs, waiting for some bandwidth with which to return results.
I'm still trying to establish a baseline for what 'normal' should look like - is a high volume of waiters always indicative of a problem, or can it happen on a functionally healthy platform?
The documentation on the subject is accurate yet brief... :)
20.4.13. Waiters
It contains the number of current read waiters.
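To put the figure in context, back-monitor also exposes a write waiter gauge and the current connection count, so it's easy to see what fraction of open connections the server is simply waiting to read the next request from. A small sketch (placeholder URI):

    # compare the waiter gauges with the current connection count
    from ldap3 import Server, Connection

    conn = Connection(Server("ldap://ldap.example.com"), auto_bind=True)   # placeholder

    def gauge(dn):
        conn.search(dn, "(objectClass=*)", search_scope="BASE",
                    attributes=["monitorCounter"])
        return int(conn.entries[0]["monitorCounter"].value)

    print("read waiters :", gauge("cn=Read,cn=Waiters,cn=Monitor"))
    print("write waiters:", gauge("cn=Write,cn=Waiters,cn=Monitor"))
    print("connections  :", gauge("cn=Current,cn=Connections,cn=Monitor"))

As far as I can tell, connections that are parked between requests show up as read waiters, which fits the thumb-twiddling theory - a high value on its own isn't necessarily a problem.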