openldap 2.4.57 on 16-core OracleLinux VMs with NVMe disks. 8 nodes in an n-way multi-master configuration, MDB backend, 50k unique DNs. We see about 10,000 auths per minute per node.
Under heavy client load, the log shows many "deferring operation: binding" messages in the same second. slapd is using only 400% CPU (of a possible 1600%).
[2021-04-13 19:15:58] connection_input: conn=150474 deferring operation: binding
When I write LDIFs to one node, such as deleting a user or removing a user from a group, we see spikes in authentication latency metrics (what is normally a 0.2-0.5 second response time goes up to 15-30 seconds) across all nodes in the cluster at the same time.
What knobs can be adjusted to allow for more concurrency? It seems like writes are impacting reads.
*slapd.conf: threads* The default is 32; I tried 64 and 128 with little improvement.
*slapd.conf: syncprov* Should I increase the sessionlog size? Should I increase the checkpoint ops? How do I determine optimum values?
syncprov-checkpoint 100 5
syncprov-sessionlog 100
syncprov-reloadhint TRUE
*mdb* maxsize 17179869184 (16 GiB)
*Indices*
index objectClass eq,pres
index cn,uid,mail,mobile eq,pres,sub
index o,ou,dc,preferredLanguage eq,pres
index member,memberUid eq,pres
index uidNumber,gidNumber eq,pres
index memberOf eq
index entryUUID eq
index entryCSN eq
index uniqueMember eq
index sAMAccountName eq
*ulimit*
bash-4.2$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 482391
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1048576
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

*n-way config*
serverID 1 ldap://XXXX:12389
syncrepl rid=1
  provider=ldap://XXXXXX:12389
  bindmethod=simple
  starttls=yes
  tls_cert=/opt/slapd/conf/cert.pem
  tls_cacert=/etc/pki/tls/cert.pem
  tls_key=/opt/slapd/conf/key.pem
  binddn="cn=replication_manager,dc=service-accounts,o=Root"
  credentials="YYYYYY"
  tls_reqcert=never
  searchbase=""
  schemachecking=on
  type=refreshAndPersist
  retry="60 +"
(and 7 more syncrepl stanzas like this)
mirrormode on
Any ideas? Thanks -Zetan
Zetan Drableg wrote:
openldap 2.4.57 on 16-core OracleLinux VMs with NVMe disks. 8 nodes in an n-way multi-master configuration, MDB backend, 50k unique DNs. We see about 10,000 auths per minute per node.
Under heavy client load, the log shows many "deferring operation: binding" messages in the same second. slapd is using only 400% CPU (of a possible 1600%).
Probably you could increase the number of listeners. In a pure Bind-only workload, slapd ought to be able to utilize 100% of all cores.
[2021-04-13 19:15:58] connection_input: conn=150474 deferring operation: binding
When I write LDIFs to one node, such as deleting a user or removing a user from a group, we see spikes in authentication latency metrics (what is normally a 0.2-0.5 second response time goes up to 15-30 seconds) across all nodes in the cluster at the same time.
What knobs can be adjusted to allow for more concurrency? It seems like writes are impacting reads.
You need more information, such as I/O wait % and network utilization %, to identify the cause of these latency spikes.
Nobody can suggest what to tune without knowing why the bottleneck occurs.
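A sketch of how that information could be gathered during a spike, using standard Linux tools (the host/port in the cn=monitor query are placeholders, and it only works if the monitor backend is configured):

vmstat 1             # run queue, CPU breakdown (user/sys/idle/wait), sampled every second
iostat -x 1          # per-device I/O latency, queue depth, and utilization
sar -n DEV 1         # per-interface network throughput
ldapsearch -x -H ldap://localhost:389 -b cn=monitor -s sub '(objectClass=*)' '+'   # slapd's own counters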
Under heavy client load, the log shows many "deferring operation: binding" messages in the same second. slapd is using only 400% CPU (of a possible 1600%).
Probably you could increase the number of listeners. In a pure Bind-only workload, slapd ought to be able to utilize 100% of all cores.
Do you mean the TCP port listeners on the slapd process? Do you think I'm hitting a socket accept-queue backlog limit, or something else?
slapd -h ldap://:389 ldaps://:636
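A quick way to check whether the accept queue is actually overflowing on Linux (a sketch; adjust the ports to match the listeners above):

ss -lnt '( sport = :389 or sport = :636 )'   # for LISTEN sockets, Recv-Q is the current accept-queue depth, Send-Q the backlog limit
netstat -s | grep -i listen                  # cumulative "listen queue overflowed" / "SYNs dropped" counters
sysctl net.core.somaxconn                    # kernel cap on the accept backlog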
--On Tuesday, April 13, 2021 3:15 PM -0700 Zetan Drableg zetan.drableg@gmail.com wrote:
Under heavy client load, the log shows many "deferring operation: binding" messages in the same second. slapd is using only 400% CPU (of a possible 1600%).
Probably you could increase the number of listeners. In a pure Bind-only workload, slapd ought to be able to utilize 100% of all cores.
Do you mean the TCP port listeners on the slapd process? Do you think I'm hitting a socket accept-queue backlog limit, or something else?
slapd -h ldap://:389 ldaps://:636
Read the slapd.conf(5) or slapd-config(5) man page section on listener threads.
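That is, the listener-threads directive in slapd.conf (olcListenerThreads under cn=config); a sketch of what it might look like, with an illustrative value rather than a recommendation:

# slapd.conf
threads 32            # worker threads servicing operations
listener-threads 4    # connection-manager threads; the default is 1, and the man page says to use a power of 2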
--Quanah
--
Quanah Gibson-Mount
Product Architect
Symas Corporation
Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
http://www.symas.com
When I write LDIFs to one node, such as deleting a user or removing a user from a group, we see spikes in authentication latency metrics (what is normally a 0.2-0.5 second response time goes up to 15-30 seconds) across all nodes in the cluster at the same time.
I ran mdb_copy -c to compact the LDAP databases. The size went from 2.9 GB to 140 MB, and the latency problem during inserts went away. I've noticed the LDAP data.mdb is growing by about 25 MB per day. What accounts for the growth of free pages?
Thank you
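One way to keep an eye on that growth between compactions is mdb_stat from the LMDB tools (the path here is just an example; point it at the directory holding data.mdb):

mdb_stat -ef /opt/slapd/data     # -e: environment/map info, -f: freelist (free page) status
mdb_stat -efa /opt/slapd/data    # -a: adds per-database page statistics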
--On Friday, April 16, 2021 11:45 AM -0700 Zetan Drableg zetan.drableg@gmail.com wrote:
When I write LDIFs to one node, such as deleting a user or removing a user from a group, we see spikes in authentication latency metrics (what is normally a 0.2-0.5 second response time goes up to 15-30 seconds) across all nodes in the cluster at the same time.
I ran mdb_copy -c to compact the LDAP databases. The size went from 2.9 GB to 140 MB, and the latency problem during inserts went away. I've noticed the LDAP data.mdb is growing by about 25 MB per day. What accounts for the growth of free pages?
Do you have a lot of large groups that you frequently update?
--Quanah
--
Quanah Gibson-Mount
Product Architect
Symas Corporation
Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
http://www.symas.com