I am setting up a new OpenLDAP (2.4.47) cluster as the target of a migration from another LDAP product.
I am having problems with delta-syncrepl multi-master mirror mode replication. It seems to be fine for a while and then something happens where some of the consumers start having the dreaded “CSN too old” errors. My understanding from the docs is that this ought to get resolved automatically by syncing the whole object…but that never happens. In my testing environment, there are two producers and two read-only consumers.
In a recent example, there were two changes to the same object within one minute (both changes were made against the same producer). From cn=accesslog (slightly redacted):
dn: reqStart=20190520190341.000001Z,cn=accesslog
objectclass: auditModify
reqdn: uid=xyz,dc=umd,dc=edu
reqend: 20190520190341.000002Z
reqmod: entryCSN:= 20190520190341.087987Z#000000#001#000000
reqmod: modifyTimestamp:= 20190520190341Z
reqresult: 0
reqsession: 7112
reqstart: 20190520190341.000001Z
reqtype: modify

dn: reqStart=20190520190418.000012Z,cn=accesslog
objectclass: auditModify
reqdn: uid=xyz,dc=umd,dc=edu
reqend: 20190520190418.000013Z
reqmod: entryCSN:= 20190520190418.183747Z#000000#001#000000
reqmod: modifyTimestamp:= 20190520190418Z
reqresult: 0
reqsession: 7112
reqstart: 20190520190418.000012Z
reqtype: modify
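For anyone who wants to pull the CSNs out of accesslog entries like these, here is a minimal sketch. It assumes the reqMod value layout "attribute:<op> value" (op being one of + - = #) as described in slapo-accesslog(5); the function name parse_reqmod is made up for illustration and is not part of any OpenLDAP tool.

```python
def parse_reqmod(value):
    """Split one reqMod value into (attribute, op, value).

    Assumes the layout 'attr:<op> value', where op is one of + - = #.
    """
    attr, sep, rest = value.partition(":")
    if not sep or not rest or rest[0] not in "+-=#":
        raise ValueError(f"unexpected reqMod value: {value!r}")
    op = rest[0]
    # A single space separates the op from the value, when a value is present.
    val = rest[2:] if rest[1:2] == " " else rest[1:]
    return attr, op, val


# reqmod values copied from the second accesslog entry in this post
reqmods = [
    "entryCSN:= 20190520190418.183747Z#000000#001#000000",
    "modifyTimestamp:= 20190520190418Z",
]

# Pick out the entryCSN that this logged modify wrote.
entry_csn = next(
    val
    for attr, op, val in map(parse_reqmod, reqmods)
    if attr.lower() == "entrycsn"
)
print(entry_csn)
```

The extracted value can then be compared with what each server reports for the entry.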
If I do a search for the entryCSN, both producers have the correct value (having processed both changes)...
dn: uid=xyz,dc=umd,dc=edu
entrycsn: 20190520190418.183747Z#000000#001#000000
…but the consumers only processed the first change…
dn: uid=xyx,dc=umd,dc=edu
entrycsn: 20190520190341.087987Z#000000#001#000000
…and are failing the second change (from slapd.log):
do_syncrep2: rid=001 cookie=rid=001,sid=001,csn=20190520190418.183747Z#000000#001#000000
do_syncrep2: rid=001 CSN too old, ignoring 20190520190418.183747Z#000000#001#000000 (uid=xyz,,dc=umd,dc=edu)
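To double-check that the rejected CSN really is newer than the one stored on the consumers, the two values can be compared field by field. This is a minimal sketch, assuming the usual CSN layout timestamp#count#sid#mod with the last three fields in hexadecimal. The CSN class and parse_csn are illustrative names, and this is not slapd's own comparison code.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True, order=True)
class CSN:
    """One change sequence number, ordered by (timestamp, count, sid, mod)."""
    timestamp: datetime
    count: int
    sid: int
    mod: int


def parse_csn(text):
    """Parse 'YYYYmmddHHMMSS.ffffffZ#count#sid#mod' into a CSN."""
    stamp, count, sid, mod = text.strip().split("#")
    ts = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ").replace(tzinfo=timezone.utc)
    return CSN(ts, int(count, 16), int(sid, 16), int(mod, 16))


# Both values are copied from this post.
older = parse_csn("20190520190341.087987Z#000000#001#000000")  # entryCSN on the consumers
newer = parse_csn("20190520190418.183747Z#000000#001#000000")  # CSN reported as "too old"

print("same SID:", older.sid == newer.sid)
print("rejected CSN is newer than the consumer's entryCSN:", newer > older)
```

By this comparison the second CSN is newer than the entryCSN held by the consumers, so the "too old" verdict is presumably being made against some other stored value, such as the consumer's contextCSN for SID 001 (an unverified assumption).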
I am beginning to despair as to why this keeps happening and how to fix it when it does (short of reloading from a dump).
And, yes, the clocks are being kept in sync.
I will be the first to admit that I have got myself all turned around trying to get the configs right…and probably have something horribly wrong. Below are the relevant bits.
Any advice is greatly appreciated.
— CONFIGS —
#
# these lines are only on the producers
#
olcServerID: 1 ldap://producer1.umd.edu
olcServerID: 2 ldap://producer2.umd.edu

dn: olcDatabase={2}mdb,cn=config
objectClass: olcDatabaseConfig
objectClass: olcMdbConfig
olcDatabase: {2}mdb
olcSuffix: dc=umd,dc=edu
#
# not on producer1
#
olcSyncrepl: rid=1
  provider=ldaps://producer1.umd.edu
  type=refreshAndPersist
  scope=sub
  searchbase="dc=umd,dc=edu"
  bindmethod=simple
  binddn="SYNC_DN"
  credentials="PW"
  schemachecking=off
  retry="5 5 300 +"
  interval=00:00:00:30
  timeout=10
  keepalive="240:10:30"
  tls_cert=/etc/openldap/certs/slapd-cert.pem
  tls_key=/etc/openldap/certs/slapd-key.pem
  tls_cacert=/etc/openldap/certs/slapd-cacert.pem
  tls_reqcert=never
  logbase="cn=accesslog"
  syncdata=accesslog
  logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
#
# not on producer2
#
olcSyncrepl: rid=2
  provider=ldaps://producer2.umd.edu
  type=refreshAndPersist
  scope=sub
  searchbase="dc=umd,dc=edu"
  bindmethod=simple
  binddn="SYNC_DN"
  credentials="PW"
  schemachecking=off
  retry="5 5 300 +"
  interval=00:00:00:30
  timeout=10
  keepalive="240:10:30"
  tls_cert=/etc/openldap/certs/slapd-cert.pem
  tls_key=/etc/openldap/certs/slapd-key.pem
  tls_cacert=/etc/openldap/certs/slapd-cacert.pem
  tls_reqcert=never
  logbase="cn=accesslog"
  syncdata=accesslog
  logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
olcUpdateRef: ldaps://master.main.eng.directory.it-eng-aaa.aws.umd.edu
#
# on producers, this is set to TRUE
#
olcMirrorMode: FALSE

dn: olcOverlay={0}syncprov,olcDatabase={2}mdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: {0}syncprov
olcSpCheckPoint: 100 10
olcSpSessionlog: 250000
olcSpNoPresent: TRUE
olcSpReloadHint: TRUE

dn: olcOverlay={1}accesslog,olcDatabase={2}mdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcAccessLogConfig
olcOverlay: {1}accesslog
olcAccessLogDB: cn=accesslog
olcAccessLogOps: writes
olcAccessLogSuccess: TRUE
olcAccessLogPurge: 5+00:00 1+00:00

dn: olcDatabase={3}mdb,cn=config
objectClass: olcDatabaseConfig
objectClass: olcMdbConfig
olcDatabase: {3}mdb
olcSuffix: cn=accesslog
olcDbDirectory: /var/lib/ldap/accesslog
olcDbMaxReaders: 126
olcDbSearchStack: 16
olcDbMaxSize: 10000000000

dn: olcOverlay={0}syncprov,olcDatabase={3}mdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcSpCheckPoint: 100 1
olcSpNoPresent: TRUE
olcSpReloadHint: TRUE
olcSpSessionlog: 250000
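Since the olcSyncrepl values are long and easy to mistype, it can help to split one into its keyword=value pairs and compare the result across servers. A minimal sketch, assuming shell-style quoting is close enough to what slapd accepts (slapd's own parser may differ in edge cases); parse_syncrepl is a hypothetical helper, and the sample string is a shortened version of the rid=1 value above.

```python
import shlex


def parse_syncrepl(value):
    """Split an olcSyncrepl value into a {keyword: value} dict."""
    params = {}
    for token in shlex.split(value):
        key, sep, val = token.partition("=")
        if not sep:
            raise ValueError(f"not a keyword=value token: {token!r}")
        params[key] = val
    return params


# Shortened version of the rid=1 olcSyncrepl value from the config above.
sample = (
    'rid=1 provider=ldaps://producer1.umd.edu type=refreshAndPersist '
    'retry="5 5 300 +" logbase="cn=accesslog" syncdata=accesslog '
    'logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"'
)

params = parse_syncrepl(sample)
for key in ("rid", "provider", "syncdata", "logbase", "logfilter"):
    print(f"{key:10} {params[key]}")
```

Running the same function over the value from each server and diffing the dictionaries would show any parameter that differs unintentionally.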
// John Pfeifer
Division of Information Technology
University of Maryland, College Park