> Use ldclt from 389 project.

Thank you! This seems to be giving the best results so far (and thanks for the example as the man page isn't that accessible to someone coming at it cold!).

My Python isn't great, and my attempt to implement an async test client with python-ldap was evidently introducing artificial bottlenecks.

Initial attempts using python-ldap were capping out at 200 binds per second. Using ldclt I'm getting over 1000/s, but I'm now seeing the threads dying fairly early on as they lose connectivity to the LDAP service.

$ cat ldaptest

ldclt -h ldapserver.lab -p 389 \
  -e "bindeach,bindonly" \
  -a 1000 \
  -n 10 \
  -D "cn=testXXXXXXX,dc=test,dc=lab" \
  -w foobar \
  -e "randombinddn,randombinddnlow=0000001,randombinddnhigh=0015000"

$ ./ldaptest

ldclt version 4.23

ldclt[9895]: Starting at Fri Nov 3 11:49:24 2017

ldclt[9895]: Average rate: 1225.00/thr (1225.00/sec), total: 12250

ldclt[9895]: Average rate: 1322.80/thr (1322.80/sec), total: 13228

ldclt[9895]: T000: Cannot ldap_simple_bind_s (cn=test0004533,dc=test,dc=lab, foobar), error=-1 (Can't contact LDAP server)

ldclt[9895]: T003: Cannot ldap_simple_bind_s (cn=test0004880,dc=test,dc=lab, foobar), error=-1 (Can't contact LDAP server)

ldclt[9895]: T002: Cannot ldap_simple_bind_s (cn=test0000180,dc=test,dc=lab, foobar), error=-1 (Can't contact LDAP server)

ldclt[9895]: T002: thread is dead.

ldclt[9895]: T008: Cannot ldap_simple_bind_s (cn=test0004142,dc=test,dc=lab, foobar), error=-1 (Can't contact LDAP server)

ldclt[9895]: T000: thread is dead.

ldclt[9895]: T005: Cannot ldap_simple_bind_s (cn=test0000056,dc=test,dc=lab, foobar), error=-1 (Can't contact LDAP server)

ldclt[9895]: T005: thread is dead.

ldclt[9895]: T008: thread is dead.

ldclt[9895]: T003: thread is dead.

ldclt[9895]: T001: Cannot ldap_simple_bind_s (cn=test0003193,dc=test,dc=lab, foobar), error=-1 (Can't contact LDAP server)

ldclt[9895]: T001: thread is dead.

ldclt[9895]: T009: Cannot ldap_simple_bind_s (cn=test0003687,dc=test,dc=lab, foobar), error=-1 (Can't contact LDAP server)

ldclt[9895]: T009: thread is dead.

ldclt[9895]: T006: Cannot ldap_simple_bind_s (cn=test0001082,dc=test,dc=lab, foobar), error=-1 (Can't contact LDAP server)

ldclt[9895]: T006: thread is dead.

ldclt[9895]: T004: Cannot ldap_simple_bind_s (cn=test0004994,dc=test,dc=lab, foobar), error=-1 (Can't contact LDAP server)

ldclt[9895]: T004: thread is dead.

ldclt[9895]: T007: Cannot ldap_simple_bind_s (cn=test0003764,dc=test,dc=lab, foobar), error=-1 (Can't contact LDAP server)

ldclt[9895]: T007: thread is dead.

ldclt[9895]: Average rate: 275.40/thr ( 275.40/sec), total: 2754

ldclt[9895]: Average rate: 0.00/thr ( 0.00/sec), total: 0

ldclt[9895]: All threads are dead - exit.

ldclt[9895]: T000: pendingNb=0

ldclt[9895]: T001: pendingNb=0

ldclt[9895]: T002: pendingNb=0

ldclt[9895]: T003: pendingNb=0

ldclt[9895]: T004: pendingNb=0

ldclt[9895]: T005: pendingNb=0

ldclt[9895]: T006: pendingNb=0

ldclt[9895]: T007: pendingNb=0

ldclt[9895]: T008: pendingNb=0

ldclt[9895]: T009: pendingNb=0

ldclt[9895]: Global total pending operations: 0

ldclt[9895]: Global average rate: 2823.20/thr (705.80/sec), total: 28232

ldclt[9895]: Global number times "no activity" reports: never

ldclt[9895]: Global number of dead threads: 10

ldclt[9895]: Global error -1 (Can't contact LDAP server) occurs 10 times

ldclt[9895]: Ending at Fri Nov 3 11:50:04 2017

ldclt[9895]: Exit status 4 - Cannot bind.
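As a sanity check on the summary lines, here is a minimal Python sketch that recomputes the overall rate from the start and end timestamps and the reported total. The function name `summarise` and the trimmed `LDCLT_SAMPLE` text are my own for illustration; only the log lines themselves come from the run above.

```python
import re
from datetime import datetime

# A handful of lines copied from the run above, not the full log.
LDCLT_SAMPLE = """\
ldclt[9895]: Starting at Fri Nov 3 11:49:24 2017
ldclt[9895]: T002: thread is dead.
ldclt[9895]: T000: thread is dead.
ldclt[9895]: Global average rate: 2823.20/thr (705.80/sec), total: 28232
ldclt[9895]: Ending at Fri Nov 3 11:50:04 2017
"""

STAMP = "%a %b %d %H:%M:%S %Y"  # e.g. "Fri Nov 3 11:49:24 2017"


def summarise(text):
    """Return run length, total operations, overall rate and dead threads."""
    start = end = total = None
    dead = set()
    for line in text.splitlines():
        m = re.search(r"Starting at (.+)$", line)
        if m:
            start = datetime.strptime(m.group(1), STAMP)
        m = re.search(r"Ending at (.+)$", line)
        if m:
            end = datetime.strptime(m.group(1), STAMP)
        m = re.search(r"Global average rate: .*total:\s*(\d+)", line)
        if m:
            total = int(m.group(1))
        m = re.search(r"(T\d+): thread is dead", line)
        if m:
            dead.add(m.group(1))
    seconds = int((end - start).total_seconds())
    return {
        "seconds": seconds,
        "total": total,
        "rate_per_sec": total / seconds,
        "dead_threads": sorted(dead),
    }


if __name__ == "__main__":
    print(summarise(LDCLT_SAMPLE))
```

So the 705.80/sec figure is just 28232 binds spread over the full 40 seconds, including the time after the threads had died, and the 2823.20/thr figure is the same total divided by the ten threads.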

I'm sure a better client would simply handle the reconnection, but I'm still a bit concerned as to why this behaviour is happening.
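For what it's worth, this is roughly what I mean by a client that handles reconnection: a minimal sketch, not something ldclt does. `ServerDown`, `connect` and `bind` are hypothetical stand-ins; with python-ldap they could presumably wrap `ldap.initialize()`, `simple_bind_s()` and the `ldap.SERVER_DOWN` exception, but I haven't wired that up here.

```python
import time


class ServerDown(Exception):
    """Hypothetical stand-in for a "Can't contact LDAP server" failure."""


def bind_with_retry(connect, bind, attempts=3, delay=0.1, sleep=time.sleep):
    """Open a connection and bind; on ServerDown, reconnect and try again.

    connect() returns a new connection object, bind(conn) performs the bind.
    The wait doubles after each failure (delay, 2*delay, 4*delay, ...).
    """
    last_error = None
    for attempt in range(attempts):
        try:
            conn = connect()  # always start from a fresh connection
            bind(conn)
            return conn
        except ServerDown as exc:
            last_error = exc
            if attempt < attempts - 1:
                sleep(delay * (2 ** attempt))
    raise last_error
```

A thread built this way would survive a dropped connection instead of dying, though it would also hide the underlying problem, which is what I actually want to understand.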

If I run the same test on a second test host while the server is dropping the connections, it has the same problem establishing new sessions, so I assume it's a buffer or limit that's being exceeded on the server?

Within the server logs I'm seeing:

local4[22124]: daemon: read active on 12

local4[22124]: connection_read(12): input error=-2 id=156683, closing.

local4[22124]: connection_closing: readying conn=156683 sd=12 for close

local4[22124]: connection_close: deferring conn=156683 sd=12

local4[22124]: connection_resched: attempting closing conn=156683 sd=12

local4[22124]: daemon: removing 12

Old mailing list posts seem to suggest these messages are caused by the client not closing the connection properly.

But in this case I have to assume the client isn't trying to close the connection, and that something else is interrupting it?
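To get a feel for how often this happens, the `connection_read` lines can be pulled out of the server log and tallied by error code. This is only a sketch against the message format in the excerpt above; the function names and `SLAPD_SAMPLE` are mine, and other slapd versions or log levels may word the message differently.

```python
import re
from collections import Counter

# Lines copied from the server log excerpt above.
SLAPD_SAMPLE = """\
local4[22124]: daemon: read active on 12
local4[22124]: connection_read(12): input error=-2 id=156683, closing.
local4[22124]: connection_closing: readying conn=156683 sd=12 for close
"""

READ_ERROR = re.compile(
    r"connection_read\((\d+)\): input error=(-?\d+) id=(\d+), closing"
)


def read_errors(text):
    """Return (conn_id, socket_descriptor, error_code) for each failed read."""
    return [
        (int(m.group(3)), int(m.group(1)), int(m.group(2)))
        for m in READ_ERROR.finditer(text)
    ]


def count_by_error(text):
    """Count failed reads per error code."""
    return Counter(error for _conn, _sd, error in read_errors(text))


if __name__ == "__main__":
    print(read_errors(SLAPD_SAMPLE))
    print(count_by_error(SLAPD_SAMPLE))
```

Comparing the timestamps of these entries with the moment the ldclt threads start dying should show whether the server is closing connections before or after the client reports the failure.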

Does anyone have any suggestions where to look/tweak?

In the spirit of randomly changing things to see if it makes a difference, so far I've tweaked the following:

olcThreads: 512

olcTimeLimit: unlimited

olcSizeLimit: unlimited

olcSockbufMaxIncoming: 262143

olcSockbufMaxIncomingAuth: 16777215

... and bumped up some of slapd's limits:

# cat /proc/$(pidof slapd)/limits

Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             256977               256977               processes
Max open files            40960                40960                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       256977               256977               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
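In case it helps anyone compare limits between hosts, here is a small Python sketch that parses this output into a dictionary. It assumes the columns are separated by two or more spaces, as in the listing above; `parse_limits` and `read_limits` are names I made up, and `read_limits` only works on Linux.

```python
import re

# Three rows from the listing above, plus the header.
LIMITS_SAMPLE = """\
Limit                     Soft Limit           Hard Limit           Units
Max stack size            8388608              unlimited            bytes
Max open files            40960                40960                files
Max nice priority         0                    0
"""


def parse_limits(text):
    """Map each limit name to a (soft, hard, units) tuple of strings."""
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) < 3:
            continue  # blank or unexpected line
        name, soft, hard = fields[:3]
        units = fields[3] if len(fields) > 3 else ""  # some rows have no units
        limits[name] = (soft, hard, units)
    return limits


def read_limits(pid):
    """Read and parse /proc/<pid>/limits for a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as handle:
        return parse_limits(handle.read())


if __name__ == "__main__":
    print(parse_limits(LIMITS_SAMPLE)["Max open files"])
```

The row I care most about here is "Max open files", since every connection to slapd uses a file descriptor.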

Cheers

Tim
tim@yetanother.net