I'm building a new setup with the latest OpenLDAP built from source (very recent sources), using mdb and MMR delta-syncrepl over TLS.
I have two hosts and I'm finding that once the secondary host has synchronised with the first (this takes about 10 minutes for around 40000 entries), slapd on each of the peers remains at close to 100% CPU. Replication is working, though.
The sync log at this point on the first system in the set (where the original data was slapadded) is showing the following entry endlessly:
554fbe2c do_syncrep2: rid=002 CSN too old, ignoring 20131221210532.737643Z#000000#001#000000 (reqStart=20150509214300.000163Z,cn=log)
contextCSN on both systems looks good. ldap1 is serverID 1, rid 1; ldap2 is serverID 2, rid 2. I guess the SID 0 comes from the original data that was imported into ldap1.
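For reference, a CSN breaks down as timestamp#count#SID#modifier:

  20131221210532.737643Z # 000000 # 001 # 000000   (spaces added for clarity)
        timestamp          count    SID    mod

so the third field of each value is the server ID that originated the change.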
# ldap1search -s base contextCSN
dn: dc=example,dc=com
contextCSN: 20150511090001.208713Z#000000#000#000000
contextCSN: 20150511091334.137305Z#000000#001#000000

# ldap2search -s base contextCSN
dn: dc=example,dc=com
contextCSN: 20150511090001.208713Z#000000#000#000000
contextCSN: 20150511091334.137305Z#000000#001#000000
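(ldap1search and ldap2search are local wrapper scripts; assume each expands to roughly the following, with bind options as appropriate, and -H ldap://ldap2 for the second:)

ldapsearch -H ldap://ldap1 -x -b "dc=example,dc=com" -s base contextCSN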
On ldap2 the stats log shows very many corresponding searches of the log DB:
5550763e conn=1000 op=11653 SRCH base="dc=example,dc=com" scope=2 deref=0 filter="(objectClass=*)"
5550763e conn=1000 op=11653 SRCH attr=* +
5550763e conn=1000 op=11653 SEARCH RESULT tag=101 err=0 nentries=0 text=
5550763e conn=1000 op=11654 SRCH base="cn=log" scope=2 deref=0 filter="(&(objectClass=auditWriteObject)(reqResult=0))"
5550763e conn=1000 op=11654 SRCH attr=reqDN reqType reqMod reqNewRDN reqDeleteOldRDN reqNewSuperior entryCSN
5550763e conn=1000 op=11655 ABANDON msg=11655
Both systems have the host name specified in the -h option to slapd. Clocks are synchronised, DNS is working, and so on.
I can't get to the bottom of this at all. No doubt I've made an error in my MMR config. Does anyone have a clue as to why this could be happening? I'd be very grateful for any ideas.
Here's (most of) the slapd.conf file, which is identical on both. I must admit I'm not sure whether the serverID settings are global or per-database. Moving them into the mdb section doesn't change the behaviour though.
# Server IDs for replication
serverID 1 ldap://ldap1
serverID 2 ldap://ldap2
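# (Note: with multiple serverID directives, slapd selects the one whose URL
# matches a listener it was started with, hence the host name in -h above;
# e.g., hypothetically: slapd -h "ldap://ldap1/" -f /path/to/slapd.conf)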
#############################################################
#
# Access log database configuration
#
# This is also used for delta-syncrepl replication
#
# See slapd-accesslog(5) for details
#
#############################################################
database mdb
maxsize 209715200
suffix cn=log
directory /db/ldap/accesslog
rootdn cn=log
rootpw secret
index entryCSN eq
index objectClass eq
index reqEnd eq
index reqResult eq
index reqStart eq
overlay syncprov
syncprov-nopresent TRUE
syncprov-reloadhint TRUE
limits dn.exact="cn=replication,ou=special users,dc=example,dc=com"
        time.soft=unlimited time.hard=unlimited
        size.soft=unlimited size.hard=unlimited
# Replication user can read (not write) everything
access to *
        by dn.exact="cn=replication,ou=special users,dc=example,dc=com" read
        by * none break
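# ("break" per slapd.access(5): evaluation does not stop at the "none" above;
# it continues with any later access directives for non-replication users)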
#############################################################
#
# Database configuration
#
# see slapd-mdb(5) for details
#
#############################################################
database mdb
monitoring on
suffix dc=example,dc=com
directory /db/ldap/example
rootdn "cn=administrator,ou=special users,dc=example,dc=com"
maxsize 209715200
# Default password hashing scheme
password-hash {SSHA}
# memberOf overlay provides reverse-lookups of group membership
overlay memberof
# sssvlv overlay provides server-side sorting
# Used mainly to allow easy sorting of uidNumber/gidNumber values
overlay sssvlv
sssvlv-max 4
sssvlv-maxkeys 5
sssvlv-maxperconn 4
# unique overlay provides attribute uniqueness
# We use this to enforce unique uidNumber/gidNumber values
overlay unique
unique_uri ldap:///ou=people,dc=example,dc=com?uidNumber?one?
unique_uri ldap:///ou=group,dc=example,dc=com?gidNumber?one?
### CONSUMER configuration
syncrepl rid=1
        provider=ldap://ldap1
        type=refreshAndPersist
        bindmethod=simple
        binddn="cn=replication,ou=special users,dc=example,dc=com"
        credentials=password
        syncdata=accesslog
        interval=00:00:00:10
        retry="20 10 60 10 120 +"
        timeout=1
        logbase="cn=log"
        searchbase="dc=example,dc=com"
        logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
        sizelimit=unlimited
        timelimit=unlimited
        schemachecking=on
        starttls=yes
syncrepl rid=2
        provider=ldap://ldap2
        type=refreshAndPersist
        bindmethod=simple
        binddn="cn=replication,ou=special users,dc=example,dc=com"
        credentials=password
        syncdata=accesslog
        interval=00:00:00:10
        retry="20 10 60 10 120 +"
        timeout=2
        logbase="cn=log"
        searchbase="dc=example,dc=com"
        logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
        sizelimit=unlimited
        timelimit=unlimited
        schemachecking=on
        starttls=yes
### PROVIDER configuration
overlay syncprov
syncprov-checkpoint 5 5
syncprov-sessionlog 50
mirrormode on
# Access log - used for delta-syncrepl too
overlay accesslog
logdb cn=log
logops writes
logold (objectClass=*)
logsuccess TRUE
logpurge 28+00:00 1+00:00
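# (logpurge <age> <interval>: with the values above, accesslog entries older
# than 28 days are purged by a purge task that runs once a day)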
# Allow unlimited access for replication user
limits dn.exact="cn=replication,ou=special users,dc=example,dc=com" size=unlimited time=unlimited
--On Monday, May 11, 2015 11:34 AM +0100 Liam Gretton <liam.gretton@leicester.ac.uk> wrote:
> I'm building a new setup with the latest OpenLDAP built from source (very recent sources), using mdb and MMR delta-syncrepl over TLS.
>
> I have two hosts and I'm finding that once the secondary host has synchronised with the first (this takes about 10 minutes for around 40000 entries), slapd on each of the peers remains at close to 100% CPU. Replication is working, though.
>
> The sync log at this point on the first system in the set (where the original data was slapadded) is showing the following entry endlessly:
See ITS#8100: http://www.openldap.org/its/index.cgi/?findid=8100
You need to do at least one modification on the other MMR server.
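Any write should do. A hypothetical example against ldap2 (the target DN and attribute are purely illustrative):

ldapmodify -H ldap://ldap2 -x -D "cn=administrator,ou=special users,dc=example,dc=com" -W <<EOF
dn: dc=example,dc=com
changetype: modify
replace: description
description: seed a CSN for serverID 2
EOF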
--Quanah
--
Quanah Gibson-Mount
Platform Architect
Zimbra, Inc.
--------------------
Zimbra :: the leader in open source messaging and collaboration
On 11/05/2015 16:09, Quanah Gibson-Mount wrote:
>> I have two hosts and I'm finding that once the secondary host has synchronised with the first (this takes about 10 minutes for around 40000 entries), slapd on each of the peers remains at close to 100% CPU. Replication is working, though.
>
> See ITS#8100: http://www.openldap.org/its/index.cgi/?findid=8100
>
> You need to do at least one modification on the other MMR server.
Doesn't make any difference unfortunately. The contextCSNs duly get updated, but CPU use remains flat-out at both ends.
--On Monday, May 11, 2015 8:56 PM +0100 Liam Gretton <liam.gretton@leicester.ac.uk> wrote:
> On 11/05/2015 16:09, Quanah Gibson-Mount wrote:
>>> I have two hosts and I'm finding that once the secondary host has synchronised with the first (this takes about 10 minutes for around 40000 entries), slapd on each of the peers remains at close to 100% CPU. Replication is working, though.
>>
>> See ITS#8100: http://www.openldap.org/its/index.cgi/?findid=8100
>>
>> You need to do at least one modification on the other MMR server.
>
> Doesn't make any difference unfortunately. The contextCSNs duly get updated, but CPU use remains flat-out at both ends.
Did you restart both servers after the update?
--Quanah
--
Quanah Gibson-Mount
Platform Architect
Zimbra, Inc.
--------------------
Zimbra :: the leader in open source messaging and collaboration
On 11/05/2015 20:48, Quanah Gibson-Mount wrote:
>>> See ITS#8100: http://www.openldap.org/its/index.cgi/?findid=8100
>>>
>>> You need to do at least one modification on the other MMR server.
>>
>> Doesn't make any difference unfortunately. The contextCSNs duly get updated, but CPU use remains flat-out at both ends.
>
> Did you restart both servers after the update?
I tried again this morning with a fresh set of servers, and modifying a record sorted them out without a restart being required.
This is good enough for me; should it not work out, then given the size of our database, initialising all peers from the same LDIF will do too.
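For the record, initialising all peers from the same LDIF would look something like this sketch (file names and config paths are illustrative):

# On ldap1: export the database
slapcat -f slapd.conf -b "dc=example,dc=com" -l example.ldif

# On ldap2: import into an empty database before starting slapd
# (-w writes updated contextCSN information based on the entries added)
slapadd -f slapd.conf -b "dc=example,dc=com" -w -l example.ldif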
Thanks for your help, this has been driving me nuts over the last week or so.