Full_Name: Jeff Wheeler
Version: 2.4.20
OS: RHEL4.8
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (70.103.136.179)

Under an ldapsearch/modify load of approximately 160 simultaneous sessions directed at the first master only, replication falls behind after several hours, and searches and modifies that previously completed in under 1 second now take 10 seconds or more.
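
One way to put a number on "replication falls behind" is to compare the contextCSN values of the suffix entry on both masters, for example as read with a base-scope ldapsearch requesting the contextCSN attribute on each server. The sketch below is only an illustration: the function names are hypothetical, the sample CSN values in the usage comment are made up, and it assumes the two values being compared carry the same server ID.

```python
from datetime import datetime, timezone


def csn_time(csn: str) -> datetime:
    """Return the timestamp part of a CSN as an aware UTC datetime.

    Assumes the 2.4-style layout: YYYYmmddHHMMSS.uuuuuuZ#count#sid#mod
    """
    stamp = csn.split("#", 1)[0]  # e.g. "20100311032057.000000Z"
    parsed = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def lag_seconds(provider_csn: str, consumer_csn: str) -> float:
    """Seconds by which the consumer's CSN trails the provider's CSN."""
    return (csn_time(provider_csn) - csn_time(consumer_csn)).total_seconds()


# Usage with made-up values (not taken from the affected servers):
# lag_seconds("20100311032057.000000Z#000000#001#000000",
#             "20100311012057.000000Z#000000#001#000000")
```

A lag that keeps growing under load would support the observation that the second master is not keeping up, rather than being disconnected.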

We also see these messages continually, every 5 seconds or so:

Mar 10 04:02:05 auvhen2be01 slapd[29178]: => bdb_idl_insert_key: c_put id failed: DB_LOCK_DEADLOCK: Locker killed to resolve a deadlock (-30994)
Mar 10 04:02:05 auvhen2be01 slapd[29178]: conn=2249 op=13995: attribute "entryCSN" index add failure
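
To check whether the deadlock rate really is steady at roughly one every 5 seconds, or whether it rises as replication falls behind, the messages can be tallied per minute. This is a minimal sketch that assumes the syslog line layout shown above; the function name and the log path in the usage comment are hypothetical.

```python
import re
from collections import Counter

# Capture "Mon DD HH:MM" from the start of a syslog line that reports
# a DB_LOCK_DEADLOCK anywhere later in the line.
DEADLOCK = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s.*DB_LOCK_DEADLOCK")


def deadlocks_per_minute(lines):
    """Count DB_LOCK_DEADLOCK messages, keyed by "Mon DD HH:MM"."""
    counts = Counter()
    for line in lines:
        match = DEADLOCK.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Usage (the path is an assumption; use wherever slapd logs on the host):
# with open("/var/log/messages") as log:
#     for minute, n in sorted(deadlocks_per_minute(log).items()):
#         print(minute, n)
```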

LDAP is configured in 2-way multi-master mirrormode on 2 quad-core HP blades with 72GB RAM each. Cache size is set to 10GB; id2entry is 18GB. The replication agreements are as follows:

olcSyncrepl: {0}rid=2 provider=ldap://auvhen4be05-traffic.vm.vodafone.net.au bindmethod=simple timeout=0 network-timeout=0 binddn="cn=Directory Manager,o=h3gau" credentials="admin123" starttls=no filter="(objectclass=*)" searchbase="o=h3gau" scope=sub schemachecking=off type=refreshAndPersist retry="60 +"
..
olcOverlay: {0}syncprov
olcSpCheckpoint: 100 600
..

Eventually the syncrepl connection goes to the TIME_WAIT state and replication stops. Restarting slapd does not fix this, and we have to reload from backup.

We tuned the following, and the slapd replication connection no longer goes to TIME_WAIT; however, replication still falls hours behind on updates and the server performs as slowly as before:

Cache size set to 20GB
olcDbIDLcacheSize: 20000000000
olcThreads: 32

Restarting slapd results in the replication connection binding and searching, but then unbinding immediately:

Mar 11 03:20:57 auvhen4be05 slapd[25786]: conn=2040 fd=19 ACCEPT from IP=10.176.77.23:50798 (IP=10.176.77.47:389)
Mar 11 03:20:57 auvhen4be05 slapd[25786]: conn=2040 op=0 BIND dn="cn=directory manager,o=h3gau" method=128
Mar 11 03:20:57 auvhen4be05 slapd[25786]: conn=2040 op=0 BIND dn="cn=directory manager,o=h3gau" mech=SIMPLE ssf=0
Mar 11 03:20:57 auvhen4be05 slapd[25786]: conn=2040 op=0 RESULT tag=97 err=0 text=
Mar 11 03:20:57 auvhen4be05 slapd[25786]: conn=2040 op=1 SRCH base="o=h3gau" scope=2 deref=0 filter="(objectClass=*)"
Mar 11 03:20:57 auvhen4be05 slapd[25786]: conn=2040 op=1 SRCH attr=* +
Mar 11 03:20:57 auvhen4be05 slapd[25786]: conn=2040 op=1 SEARCH RESULT tag=101 err=0 nentries=0 text=
Mar 11 03:20:57 auvhen4be05 slapd[25786]: conn=2040 op=2 UNBIND
Mar 11 03:20:57 auvhen4be05 slapd[25786]: conn=2040 fd=19 closed