As always... just as you hit send on an email to an open mailing list..
It's the bandwidth, isn't it..
I'm so used to everything being 1000Mbit that I didn't spot the 100Mbit limit being hit.
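For anyone who hits the same wall later, the back-of-envelope maths is roughly as below - note the per-result size is just an assumed figure, so measure real entry sizes to refine it:

# Rough sanity check: wire bandwidth needed for the observed search rate.
# bytes_per_result is an ASSUMPTION - measure your own entry sizes.
searches_per_sec = 20000      # where the throughput plateaued
bytes_per_result = 600        # assumed average encoded entry + protocol overhead

mbit_per_sec = searches_per_sec * bytes_per_result * 8 / 1e6
print("~%.0f Mbit/s" % mbit_per_sec)   # ~96 Mbit/s - right up against a 100Mbit link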
Will continue investigations with that additional bit of info..! :)
Thanks!
On Mon, Sep 4, 2017 at 9:59 AM, Tim tim@yetanother.net wrote:
Cheers guys,
Reassuring that I'm roughly on the right track - but that leads me into other questions relating to what I'm currently experiencing while trying to load test the platform.
I'm currently using LocustIO, with a swarm of ~70 instances spread across ~25 hosts, to try to scale up the test traffic.
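For context, the Locust side is nothing clever - just a custom client wrapping ldap3 searches, roughly the shape sketched below (written against the 0.x-era Locust API, which has changed in later releases; the host, base DN and filter are placeholders rather than the real deployment):

# Rough shape of a custom Locust LDAP client (0.x-era API; the events
# interface differs in newer Locust releases). Host, base DN and filter
# are placeholders.
import time
from locust import Locust, TaskSet, task, events
from ldap3 import Server, Connection, SUBTREE

class LdapTasks(TaskSet):
    def on_start(self):
        self.conn = Connection(Server('ldap://ldap.example.net'), auto_bind=True)

    @task
    def search(self):
        start = time.time()
        try:
            self.conn.search('dc=example,dc=net', '(uid=someuser)',
                             search_scope=SUBTREE, attributes=['cn'])
        except Exception as e:
            events.request_failure.fire(request_type='ldap', name='search',
                                        response_time=int((time.time() - start) * 1000),
                                        exception=e)
        else:
            events.request_success.fire(request_type='ldap', name='search',
                                        response_time=int((time.time() - start) * 1000),
                                        response_length=0)

class LdapLocust(Locust):
    task_set = LdapTasks
    min_wait = 0
    max_wait = 0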
The problem I'm seeing (and hence the reason why I was questioning my initial test approach), is that the traffic seems to be artificially capping out and I can't for the life of me find the bottleneck.
I'm recording/graphing all of cn=monitor, all resources covered by vmstat and bandwidth - nothing appears to be topping out.
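(For reference, the cn=monitor scraping is just an ordinary subtree search with operational attributes requested explicitly - something along the lines of the sketch below; the hostname is a placeholder and you need a bind identity that's allowed to read cn=Monitor.)

# Sketch of a cn=monitor poll with ldap3 (hostname is a placeholder).
# back-monitor exposes its counters as operational attributes, hence the '+'.
from ldap3 import Server, Connection, SUBTREE

conn = Connection(Server('ldap://ldap.example.net'), auto_bind=True)  # bind as needed
conn.search('cn=Monitor', '(objectClass=*)',
            search_scope=SUBTREE, attributes=['+'])
for entry in conn.response:
    print(entry['dn'], entry['attributes'])
conn.unbind()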
If I perform searches in isolation, it quickly ramps up to 20k/s and then just tabletops, while all system resources seem reasonably happy.
This happens no matter what distribution of clients I deploy (e.g. 5000 clients over 70 hosts or 100 clients over 10 hosts) - so I'm fairly confident that the test environment is more than capable of generating further traffic.
https://s3.eu-west-2.amazonaws.com/uninspired/mystery_bottleneck.png
(.. this was thrown together in a very rough and ready fashion - it's quite possible that my units are off on some of the y-axes!)
I've performed some minor optimisations to try and resolve it (the number of available file handles was my initial hope for an easy fix..) but so far, nothing's helped - I still see this capping of throughput before the key system resources even get slightly hot.
I had hoped that it was going to be as simple as increasing a concurrency variable within the config - but the one that does exist seems not to be valid for anything outside of legacy Solaris deployments?
If anyone has any suggestions as to where I could investigate for a potential bottleneck (either on the system or within my OpenLDAP configuration), it would be very much appreciated.
Thanks in advance
On Mon, Sep 4, 2017 at 7:47 AM, Michael Ströder michael@stroeder.com wrote:
Tim wrote:
I've, so far, been making use of home grown python-ldap3 scripts to simulate the various kinds of interactions using many parallel synchronous requests - but as I scale this up, I'm increasingly aware that it is a very different ask to simulate simple synchronous interactions compared to a fully optimised multithreaded client with dedicated async/sync channels and associated strategies.
Most clients will just send those synchronous requests. So IMHO this is the right test pattern and you should simply make your test client multi-threaded.
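A bare-bones version of that pattern would be something like the sketch below (host, base DN and filter are placeholders; error handling omitted):

# Minimal multi-threaded synchronous search client (placeholders throughout).
import threading
from ldap3 import Server, Connection, SUBTREE

def worker(n_requests):
    conn = Connection(Server('ldap://ldap.example.net'), auto_bind=True)
    for _ in range(n_requests):
        conn.search('dc=example,dc=net', '(uid=someuser)',
                    search_scope=SUBTREE, attributes=['cn'])
    conn.unbind()

threads = [threading.Thread(target=worker, args=(10000,)) for _ in range(32)]
for t in threads:
    t.start()
for t in threads:
    t.join()

Bear in mind that with a pure-Python client the threads contend for the GIL, so running several such processes per machine usually scales better than one big thread pool.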
I'm currently working with a dataset of in the region of 2,500,000 objects and looking to test throughput up to somewhere in the region of 15k/s searches alongside 1k/s modification/addition events - which is beyond what the current basic scripts are able to achieve.
Note that the ldap3 module for Python is written in pure Python - including the ASN.1 encoding/decoding. In contrast to that, the old Python 2.x https://python-ldap.org module is a C wrapper around the OpenLDAP libs, and therefore you might get better client performance with it. Nevertheless you should spread your test clients over several machines to really achieve the needed performance.
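For comparison, the equivalent synchronous loop with python-ldap looks roughly like this (host and base DN are again placeholders):

# Same search loop via python-ldap, a C wrapper around the OpenLDAP client libs.
import ldap

conn = ldap.initialize('ldap://ldap.example.net')
conn.simple_bind_s()                       # anonymous bind; pass DN/password if needed
for _ in range(10000):
    conn.search_s('dc=example,dc=net', ldap.SCOPE_SUBTREE,
                  '(uid=someuser)', ['cn'])
conn.unbind_s()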
Ciao, Michael.
-- Tim tim@yetanother.net