--On Tuesday, May 21, 2019 5:53 PM -0400 "John C. Pfeifer" pfeifer@umd.edu wrote:
> I am setting up a new OpenLDAP (2.4.47) cluster that we will migrate to from another LDAP product.
> I am having problems with delta-syncrepl multi-master mirror mode replication. It seems to be fine for a while, and then something happens after which some of the consumers start logging the dreaded "CSN too old" errors. My understanding from the docs is that this ought to be resolved automatically by syncing the whole object…but that never happens. In my testing environment, there are two producers and two read-only consumers.
> In a recent example, there were two changes to the same object within one minute (both changes were made against the same producer). From cn=accesslog (slightly redacted):
Unfortunately, you have not provided the full configurations (with passwords etc. redacted), which would show whether any additional overlays that could affect replication are in use.
> …but the consumers only processed the first change…
> dn: uid=xyx,dc=umd,dc=edu
> entrycsn: 20190520190341.087987Z#000000#001#000000
> …are failing the second change (from slapd.log):
> do_syncrep2: rid=001 cookie=rid=001,sid=001,csn=20190520190418.183747Z#000000#001#000000
> do_syncrep2: rid=001 CSN too old, ignoring 20190520190418.183747Z#000000#001#000000 (uid=xyz,,dc=umd,dc=edu)
This would imply they processed another change in between. What is the contextCSN in the root entry on the replicas?
> olcSyncrepl: rid=1 provider=ldaps://producer1.umd.edu type=refreshAndPersist scope=sub searchbase="dc=umd,dc=edu" bindmethod=simple binddn="SYNC_DN" credentials="PW" schemachecking=off retry="5 5 300 +" interval=00:00:00:30 timeout=10 keepalive="240:10:30" tls_cert=/etc/openldap/certs/slapd-cert.pem tls_key=/etc/openldap/certs/slapd-key.pem
The tls_cert/tls_key settings are for doing client certificate authentication, yet you explicitly set bindmethod=simple. These settings should most likely be deleted.
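This is a minimal sketch of the cleaned-up value as an ldapmodify change. It assumes two things: that the syncrepl value lives on olcDatabase={2}mdb,cn=config (the database that carries the syncprov overlay in the quoted configuration), and that it is the only olcSyncrepl value there, because "replace" overwrites all existing values. Everything except the two TLS settings is copied unchanged, including the SYNC_DN and PW placeholders.

```ldif
# Sketch only: the target DN olcDatabase={2}mdb,cn=config is an assumption.
# "replace" removes every existing olcSyncrepl value on that entry.
dn: olcDatabase={2}mdb,cn=config
changetype: modify
replace: olcSyncrepl
olcSyncrepl: rid=1 provider=ldaps://producer1.umd.edu type=refreshAndPersist scope=sub searchbase="dc=umd,dc=edu" bindmethod=simple binddn="SYNC_DN" credentials="PW" schemachecking=off retry="5 5 300 +" interval=00:00:00:30 timeout=10 keepalive="240:10:30"
```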
> olcUpdateRef: ldaps://master.main.eng.directory.it-eng-aaa.aws.umd.edu
The updateref configuration is only valid on replicas and should not be present on a producer.
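If the olcUpdateRef value sits on the producer's main database, removing it could look like the sketch below. The DN olcDatabase={2}mdb,cn=config is an assumption, since the quoted configuration does not show which entry carries olcUpdateRef.

```ldif
# Sketch only: the DN carrying olcUpdateRef is assumed, not taken from the posted config.
dn: olcDatabase={2}mdb,cn=config
changetype: modify
delete: olcUpdateRef
```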
> dn: olcOverlay={0}syncprov,olcDatabase={2}mdb,cn=config
> objectClass: olcOverlayConfig
> objectClass: olcSyncProvConfig
> olcOverlay: {0}syncprov
> olcSpCheckPoint: 100 10
> olcSpSessionlog: 250000
> olcSpNoPresent: TRUE
> olcSpReloadHint: TRUE
The last two settings above (olcSpNoPresent and olcSpReloadHint) are only supposed to be set to TRUE on an accesslog database. On this database, they should be set to FALSE.
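As a sketch, the change can be applied with ldapmodify against the overlay entry quoted above. The DN is taken from the configuration as posted.

```ldif
# Main database syncprov overlay: turn off the accesslog-only options.
dn: olcOverlay={0}syncprov,olcDatabase={2}mdb,cn=config
changetype: modify
replace: olcSpNoPresent
olcSpNoPresent: FALSE
-
replace: olcSpReloadHint
olcSpReloadHint: FALSE
```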
> dn: olcDatabase={3}mdb,cn=config
> objectClass: olcDatabaseConfig
> objectClass: olcMdbConfig
> olcDatabase: {3}mdb
> olcSuffix: cn=accesslog
> olcDbDirectory: /var/lib/ldap/accesslog
> olcDbMaxReaders: 126
> olcDbSearchStack: 16
> olcDbMaxSize: 10000000000
You have not provided any indexing information for the accesslog DB, but it is generally mandatory to add equality (eq) indices on several of its attributes.
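For reference, the delta-syncrepl provider example in the OpenLDAP Administrator's Guide indexes the accesslog database with "index default eq" plus a short attribute list. The sketch below is the equivalent cn=config change. It assumes that olcDatabase={3}mdb,cn=config has no olcDbIndex values yet. Verify the attribute list against the guide for your version.

```ldif
# Accesslog DB indexing, modeled on the Admin Guide delta-syncrepl example.
# "default eq" makes eq the index type for the attribute list that follows.
dn: olcDatabase={3}mdb,cn=config
changetype: modify
add: olcDbIndex
olcDbIndex: default eq
olcDbIndex: entryCSN,objectClass,reqEnd,reqResult,reqStart
```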
> dn: olcOverlay={0}syncprov,olcDatabase={3}mdb,cn=config
> objectClass: olcOverlayConfig
> objectClass: olcSyncProvConfig
> olcSpCheckPoint: 100 1
> olcSpNoPresent: TRUE
> olcSpReloadHint: TRUE
> olcSpSessionlog: 250000
The checkpoint and sessionlog settings should not be set on an accesslog DB.
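This is a matching sketch for the accesslog overlay entry. It removes both settings and leaves olcSpNoPresent and olcSpReloadHint at TRUE, which is correct on this database. The DN is as quoted above.

```ldif
# Accesslog syncprov overlay: drop checkpoint and sessionlog.
dn: olcOverlay={0}syncprov,olcDatabase={3}mdb,cn=config
changetype: modify
delete: olcSpCheckPoint
-
delete: olcSpSessionlog
```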
Additionally, without the specific syncrepl configurations from the consumers (do they listen to both masters, or only to one?), there are additional levels of variability. If they listen to both masters, then it would not be uncommon for them to receive a change from master A and then ignore the same change when it arrives from master B.
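To make the question concrete, a consumer that listens to both masters would carry one olcSyncrepl value per producer, each with its own rid. The sketch below is hypothetical on several points:

- producer2.umd.edu, the rid values, the database DN, and the SYNC_DN/PW placeholders are all assumptions.
- The database is assumed to have no olcSyncrepl values yet.
- The delta-syncrepl parameters (logbase, logfilter, syncdata=accesslog) follow the OpenLDAP Administrator's Guide example rather than anything in the posted configuration.

```ldif
# Hypothetical consumer: one olcSyncrepl value per producer.
# producer2.umd.edu, rid=1/rid=2 and olcDatabase={2}mdb are assumptions.
dn: olcDatabase={2}mdb,cn=config
changetype: modify
add: olcSyncrepl
olcSyncrepl: rid=1 provider=ldaps://producer1.umd.edu type=refreshAndPersist searchbase="dc=umd,dc=edu" bindmethod=simple binddn="SYNC_DN" credentials="PW" logbase="cn=accesslog" logfilter="(&(objectClass=auditWriteObject)(reqResult=0))" syncdata=accesslog retry="5 5 300 +"
olcSyncrepl: rid=2 provider=ldaps://producer2.umd.edu type=refreshAndPersist searchbase="dc=umd,dc=edu" bindmethod=simple binddn="SYNC_DN" credentials="PW" logbase="cn=accesslog" logfilter="(&(objectClass=auditWriteObject)(reqResult=0))" syncdata=accesslog retry="5 5 300 +"
```

The rid only needs to be unique within the consumer's own configuration.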
With so much redacted from both the configuration and the change operations, it is fairly difficult to comment further. For example, you could be a victim of ITS#8990, but to tell, you would have to provide unredacted results of the two sets of changes as they appear in the accesslog DBs of both masters. (In ITS#8990, a change propagates correctly between two MMR servers but is written incorrectly into the accesslog DB of one of the masters.)
--Quanah
--
Quanah Gibson-Mount
Product Architect
Symas Corporation
Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
http://www.symas.com