I am setting up a new OpenLDAP (2.4.47) cluster that we will migrate to from another LDAP product.
I am having problems with delta-syncrepl multi-master mirror mode replication. It seems fine for a while, and then something happens that causes some of the consumers to start logging the dreaded “CSN too old” errors. My understanding from the docs is that this ought to be resolved automatically by resyncing the whole object…but that never happens. In my testing environment, there are two producers and two read-only consumers.
In a recent example, there were two changes to the same object within one minute (both changes were made against the same producer). From cn=accesslog (slightly redacted):
dn: reqStart=20190520190341.000001Z,cn=accesslog
objectclass: auditModify
reqdn: uid=xyz,dc=umd,dc=edu
reqend: 20190520190341.000002Z
reqmod: entryCSN:= 20190520190341.087987Z#000000#001#000000
reqmod: modifyTimestamp:= 20190520190341Z
reqresult: 0
reqsession: 7112
reqstart: 20190520190341.000001Z
reqtype: modify
dn: reqStart=20190520190418.000012Z,cn=accesslog
objectclass: auditModify
reqdn: uid=xyz,dc=umd,dc=edu
reqend: 20190520190418.000013Z
reqmod: entryCSN:= 20190520190418.183747Z#000000#001#000000
reqmod: modifyTimestamp:= 20190520190418Z
reqresult: 0
reqsession: 7112
reqstart: 20190520190418.000012Z
reqtype: modify
If I search for the entryCSN, both producers have the correct value (they have processed both changes)…
dn: uid=xyz,dc=umd,dc=edu
entrycsn: 20190520190418.183747Z#000000#001#000000
…but the consumers only processed the first change…
dn: uid=xyz,dc=umd,dc=edu
entrycsn: 20190520190341.087987Z#000000#001#000000
…and are failing on the second change (from slapd.log):
do_syncrep2: rid=001 cookie=rid=001,sid=001,csn=20190520190418.183747Z#000000#001#000000
do_syncrep2: rid=001 CSN too old, ignoring 20190520190418.183747Z#000000#001#000000 (uid=xyz,dc=umd,dc=edu)
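The consumer's decision in the log above comes down to comparing CSNs. Below is a minimal Python sketch, assuming the CSN layout `YYYYmmddHHMMSS.uuuuuuZ#count#sid#mod` (counters in hex), and assuming the consumer skips a change whose CSN is not strictly newer than the cookie CSN it already stores for the same server ID. `parse_csn` and `is_too_old` are hypothetical helpers, not slapd functions. The consumer's stored cookie value does not appear in the log, so the usage lines simply assume it had already reached the incoming CSN.

```python
import re
from datetime import datetime
from typing import Dict, NamedTuple

# Assumed CSN layout: YYYYmmddHHMMSS.uuuuuuZ#count#sid#mod (count/sid/mod in hex)
CSN_RE = re.compile(
    r"^(\d{14})\.(\d{6})Z#([0-9a-fA-F]{6})#([0-9a-fA-F]{3})#([0-9a-fA-F]{6})$"
)


class CSN(NamedTuple):
    when: datetime     # timestamp with microseconds (UTC)
    change_count: int  # counter for changes within the same microsecond
    sid: int           # server ID of the producer that made the change
    mod_num: int       # modification counter


def parse_csn(text: str) -> CSN:
    """Split a CSN string into its components."""
    m = CSN_RE.match(text)
    if m is None:
        raise ValueError(f"not a CSN: {text!r}")
    stamp, usec, count, sid, mod = m.groups()
    when = datetime.strptime(f"{stamp}.{usec}", "%Y%m%d%H%M%S.%f")
    return CSN(when, int(count, 16), int(sid, 16), int(mod, 16))


def is_too_old(incoming: str, cookie: Dict[int, str]) -> bool:
    """Approximate consumer-side check (an assumption, not slapd's code):
    skip the change if the stored cookie already holds a CSN for the same
    server ID that is greater than or equal to the incoming one."""
    current = cookie.get(parse_csn(incoming).sid)
    if current is None:
        return False  # no stored CSN for this SID, nothing to compare against
    # CSNs are fixed width, so plain string comparison orders them correctly.
    return incoming <= current


if __name__ == "__main__":
    first = "20190520190341.087987Z#000000#001#000000"
    second = "20190520190418.183747Z#000000#001#000000"
    print(parse_csn(second).when - parse_csn(first).when)  # 0:00:37.095760
    print(is_too_old(second, {1: second}))                 # True
```

Under these assumptions, a consumer whose cookie for SID 001 was already at (or past) the second change's CSN would report it as too old, even though its copy of the entry still carries the first change's entryCSN.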
I am beginning to despair of figuring out why this keeps happening and how to fix it when it does (short of reloading from a dump).
And, yes, the clocks are being kept in sync.
I will be the first to admit that I have got myself all turned around trying to get the configs right…and probably have something horribly wrong.
Below are the relevant bits.
Any advice is greatly appreciated.
— CONFIGS —
#
# these lines are only on the producers
#
olcServerID: 1 ldap://producer1.umd.edu
olcServerID: 2 ldap://producer2.umd.edu
dn: olcDatabase={2}mdb,cn=config
objectClass: olcDatabaseConfig
objectClass: olcMdbConfig
olcDatabase: {2}mdb
olcSuffix: dc=umd,dc=edu
#
# not on producer1
#
olcSyncrepl: rid=1
  provider=ldaps://producer1.umd.edu
  type=refreshAndPersist
  scope=sub
  searchbase="dc=umd,dc=edu"
  bindmethod=simple
  binddn="SYNC_DN"
  credentials="PW"
  schemachecking=off
  retry="5 5 300 +"
  interval=00:00:00:30
  timeout=10
  keepalive="240:10:30"
  tls_cert=/etc/openldap/certs/slapd-cert.pem
  tls_key=/etc/openldap/certs/slapd-key.pem
  tls_cacert=/etc/openldap/certs/slapd-cacert.pem
  tls_reqcert=never
  logbase="cn=accesslog"
  syncdata=accesslog
  logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
#
# not on producer2
#
olcSyncrepl: rid=2
  provider=ldaps://producer2.umd.edu
  type=refreshAndPersist
  scope=sub
  searchbase="dc=umd,dc=edu"
  bindmethod=simple
  binddn="SYNC_DN"
  credentials="PW"
  schemachecking=off
  retry="5 5 300 +"
  interval=00:00:00:30
  timeout=10
  keepalive="240:10:30"
  tls_cert=/etc/openldap/certs/slapd-cert.pem
  tls_key=/etc/openldap/certs/slapd-key.pem
  tls_cacert=/etc/openldap/certs/slapd-cacert.pem
  tls_reqcert=never
  logbase="cn=accesslog"
  syncdata=accesslog
  logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
olcUpdateRef: ldaps://master.main.eng.directory.it-eng-aaa.aws.umd.edu
#
# on producers, this is set to TRUE
#
olcMirrorMode: FALSE
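The `retry="5 5 300 +"` value in both olcSyncrepl stanzas is a list of interval/count pairs; per slapd.conf(5), `+` as a count means "retry indefinitely". Below is a minimal Python sketch of how that schedule expands. `parse_retry` and `retry_delays` are hypothetical names, and the sketch only illustrates the documented semantics, not slapd's implementation.

```python
from itertools import islice
from typing import Iterator, List, Optional, Tuple


def parse_retry(spec: str) -> List[Tuple[int, Optional[int]]]:
    """Turn "5 5 300 +" into [(5, 5), (300, None)]; None stands for "+"."""
    tokens = spec.split()
    if not tokens or len(tokens) % 2:
        raise ValueError(f"retry needs interval/count pairs: {spec!r}")
    pairs: List[Tuple[int, Optional[int]]] = []
    for interval, count in zip(tokens[::2], tokens[1::2]):
        pairs.append((int(interval), None if count == "+" else int(count)))
    return pairs


def retry_delays(spec: str) -> Iterator[int]:
    """Yield the wait (in seconds) before each successive reconnect attempt."""
    for interval, count in parse_retry(spec):
        if count is None:
            while True:  # "+" repeats this interval forever
                yield interval
        for _ in range(count):
            yield interval


if __name__ == "__main__":
    print(list(islice(retry_delays("5 5 300 +"), 8)))
    # [5, 5, 5, 5, 5, 300, 300, 300]
```

Read this way, the setting retries every 5 seconds for the first 5 attempts and then every 300 seconds without limit.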
dn: olcOverlay={0}syncprov,olcDatabase={2}mdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: {0}syncprov
olcSpCheckPoint: 100 10
olcSpSessionlog: 250000
olcSpNoPresent: TRUE
olcSpReloadHint: TRUE
dn: olcOverlay={1}accesslog,olcDatabase={2}mdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcAccessLogConfig
olcOverlay: {1}accesslog
olcAccessLogDB: cn=accesslog
olcAccessLogOps: writes
olcAccessLogSuccess: TRUE
olcAccessLogPurge: 5+00:00 1+00:00
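slapo-accesslog(5) documents the purge setting as a maximum age followed by a scan interval, each written as `[ddd+]hh:mm[:ss]`, so `5+00:00 1+00:00` keeps log entries for five days and looks for expired ones once a day. Below is a small Python sketch of that reading; `parse_span` and `parse_purge` are hypothetical helpers.

```python
import re
from datetime import timedelta
from typing import Tuple

# Assumed span format from slapo-accesslog(5): [ddd+]hh:mm[:ss]
SPAN_RE = re.compile(r"^(?:(\d+)\+)?(\d{1,2}):(\d{2})(?::(\d{2}))?$")


def parse_span(text: str) -> timedelta:
    """Convert a span such as "5+00:00" or "01:30:15" into a timedelta."""
    m = SPAN_RE.match(text)
    if m is None:
        raise ValueError(f"bad time span: {text!r}")
    days, hours, minutes, seconds = m.groups()
    return timedelta(
        days=int(days or 0),
        hours=int(hours),
        minutes=int(minutes),
        seconds=int(seconds or 0),
    )


def parse_purge(value: str) -> Tuple[timedelta, timedelta]:
    """Split an olcAccessLogPurge value into (max age, scan interval)."""
    age, interval = value.split()
    return parse_span(age), parse_span(interval)


if __name__ == "__main__":
    age, every = parse_purge("5+00:00 1+00:00")
    print(age, every)  # 5 days, 0:00:00 1 day, 0:00:00
```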
dn: olcDatabase={3}mdb,cn=config
objectClass: olcDatabaseConfig
objectClass: olcMdbConfig
olcDatabase: {3}mdb
olcSuffix: cn=accesslog
olcDbDirectory: /var/lib/ldap/accesslog
olcDbMaxReaders: 126
olcDbSearchStack: 16
olcDbMaxSize: 10000000000
dn: olcOverlay={0}syncprov,olcDatabase={3}mdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcSpCheckPoint: 100 1
olcSpNoPresent: TRUE
olcSpReloadHint: TRUE
olcSpSessionlog: 250000
//
John Pfeifer
Division of Information Technology
University of Maryland, College Park