I am setting up a new OpenLDAP (2.4.47) cluster to migrate to from another LDAP product.
I am having problems with delta-syncrepl multi-master mirror mode replication. It seems to be fine for a while and then something happens where some of the consumers start having the dreaded “CSN too old” errors. My understanding from the docs is that this ought to get resolved automatically by syncing the whole object…but that never happens. In my testing environment, there are two producers and two read-only consumers.
In a recent example, there were two changes to the same object within one minute (both changes were made against the same producer). From cn=accesslog (slightly redacted):
dn: reqStart=20190520190341.000001Z,cn=accesslog
objectclass: auditModify
reqdn: uid=xyz,dc=umd,dc=edu
reqend: 20190520190341.000002Z
reqmod: entryCSN:= 20190520190341.087987Z#000000#001#000000
reqmod: modifyTimestamp:= 20190520190341Z
reqresult: 0
reqsession: 7112
reqstart: 20190520190341.000001Z
reqtype: modify

dn: reqStart=20190520190418.000012Z,cn=accesslog
objectclass: auditModify
reqdn: uid=xyz,dc=umd,dc=edu
reqend: 20190520190418.000013Z
reqmod: entryCSN:= 20190520190418.183747Z#000000#001#000000
reqmod: modifyTimestamp:= 20190520190418Z
reqresult: 0
reqsession: 7112
reqstart: 20190520190418.000012Z
reqtype: modify
If I do a search for the entryCSN, both producers have the correct value (having processed both changes)...
dn: uid=xyz,dc=umd,dc=edu
entrycsn: 20190520190418.183747Z#000000#001#000000
…but the consumers only processed the first change…
dn: uid=xyx,dc=umd,dc=edu
entrycsn: 20190520190341.087987Z#000000#001#000000
…are failing the second change (from slapd.log):
do_syncrep2: rid=001 cookie=rid=001,sid=001,csn=20190520190418.183747Z#000000#001#000000
do_syncrep2: rid=001 CSN too old, ignoring 20190520190418.183747Z#000000#001#000000 (uid=xyz,,dc=umd,dc=edu)
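The comparison described above can be sketched in Python. This is only an illustration: it assumes the entryCSN values have already been collected from each server (for example with a base-scope search for entryCSN), and the host labels consumer1 and consumer2, like the function names, are hypothetical. A CSN has the form time#count#sid#mod, with the last three fields in hexadecimal.

```python
from typing import Dict, List, NamedTuple


class CSN(NamedTuple):
    timestamp: str  # e.g. "20190520190418.183747Z" (fixed width, so it sorts as text)
    count: int      # change counter within the same timestamp
    sid: int        # server ID of the provider that originated the change
    mod: int        # modification counter within one operation


def parse_csn(text: str) -> CSN:
    """Split 'time#count#sid#mod' into its four fields (the last three are hex)."""
    timestamp, count, sid, mod = text.strip().split("#")
    return CSN(timestamp, int(count, 16), int(sid, 16), int(mod, 16))


def lagging_servers(entry_csns: Dict[str, str]) -> List[str]:
    """Return the servers whose entryCSN is older than the newest one seen."""
    parsed = {host: parse_csn(csn) for host, csn in entry_csns.items()}
    newest = max(parsed.values())
    return sorted(host for host, csn in parsed.items() if csn < newest)


# entryCSN of uid=xyz as reported in this message; host labels are placeholders.
observed = {
    "producer1": "20190520190418.183747Z#000000#001#000000",
    "producer2": "20190520190418.183747Z#000000#001#000000",
    "consumer1": "20190520190341.087987Z#000000#001#000000",
    "consumer2": "20190520190341.087987Z#000000#001#000000",
}
print(lagging_servers(observed))
```

With the values quoted here, both consumers are reported as lagging, and the SID field shows that both changes originated on server ID 1.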
I am beginning to despair as to why this keeps happening and how to fix it when it does (short of reloading from a dump).
And, yes, the clocks are being kept in sync.
I will be the first to admit that I have got myself all turned around trying to get the configs right…and probably have something horribly wrong. Below are the relevant bits.
Any advice is greatly appreciated.
— CONFIGS —
#
# these lines are only on the producers
#
olcServerID: 1 ldap://producer1.umd.edu
olcServerID: 2 ldap://producer2.umd.edu

dn: olcDatabase={2}mdb,cn=config
objectClass: olcDatabaseConfig
objectClass: olcMdbConfig
olcDatabase: {2}mdb
olcSuffix: dc=umd,dc=edu
#
# not on producer1
#
olcSyncrepl: rid=1
  provider=ldaps://producer1.umd.edu
  type=refreshAndPersist
  scope=sub
  searchbase="dc=umd,dc=edu"
  bindmethod=simple
  binddn="SYNC_DN"
  credentials="PW"
  schemachecking=off
  retry="5 5 300 +"
  interval=00:00:00:30
  timeout=10
  keepalive="240:10:30"
  tls_cert=/etc/openldap/certs/slapd-cert.pem
  tls_key=/etc/openldap/certs/slapd-key.pem
  tls_cacert=/etc/openldap/certs/slapd-cacert.pem
  tls_reqcert=never
  logbase="cn=accesslog"
  syncdata=accesslog
  logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
#
# not on producer2
#
olcSyncrepl: rid=2
  provider=ldaps://producer2.umd.edu
  type=refreshAndPersist
  scope=sub
  searchbase="dc=umd,dc=edu"
  bindmethod=simple
  binddn="SYNC_DN"
  credentials="PW"
  schemachecking=off
  retry="5 5 300 +"
  interval=00:00:00:30
  timeout=10
  keepalive="240:10:30"
  tls_cert=/etc/openldap/certs/slapd-cert.pem
  tls_key=/etc/openldap/certs/slapd-key.pem
  tls_cacert=/etc/openldap/certs/slapd-cacert.pem
  tls_reqcert=never
  logbase="cn=accesslog"
  syncdata=accesslog
  logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
olcUpdateRef: ldaps://master.main.eng.directory.it-eng-aaa.aws.umd.edu
#
# on producers, this is set to TRUE
#
olcMirrorMode: FALSE

dn: olcOverlay={0}syncprov,olcDatabase={2}mdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: {0}syncprov
olcSpCheckPoint: 100 10
olcSpSessionlog: 250000
olcSpNoPresent: TRUE
olcSpReloadHint: TRUE

dn: olcOverlay={1}accesslog,olcDatabase={2}mdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcAccessLogConfig
olcOverlay: {1}accesslog
olcAccessLogDB: cn=accesslog
olcAccessLogOps: writes
olcAccessLogSuccess: TRUE
olcAccessLogPurge: 5+00:00 1+00:00

dn: olcDatabase={3}mdb,cn=config
objectClass: olcDatabaseConfig
objectClass: olcMdbConfig
olcDatabase: {3}mdb
olcSuffix: cn=accesslog
olcDbDirectory: /var/lib/ldap/accesslog
olcDbMaxReaders: 126
olcDbSearchStack: 16
olcDbMaxSize: 10000000000

dn: olcOverlay={0}syncprov,olcDatabase={3}mdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcSpCheckPoint: 100 1
olcSpNoPresent: TRUE
olcSpReloadHint: TRUE
olcSpSessionlog: 250000
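As a side note on the olcAccessLogPurge value in the config above: slapo-accesslog takes two time spans, an age and a scan interval, each written as [ddd+]hh:mm[:ss]. The following Python sketch decodes that notation; the helper names are made up for illustration and this is not how slapd itself parses the value.

```python
import re
from datetime import timedelta
from typing import Tuple

# [ddd+]hh:mm[:ss] as documented for the accesslog overlay's purge setting
_SPAN = re.compile(r"^(?:(\d+)\+)?(\d{1,2}):(\d{2})(?::(\d{2}))?$")


def parse_span(text: str) -> timedelta:
    """Convert one '[ddd+]hh:mm[:ss]' time span into a timedelta."""
    match = _SPAN.match(text)
    if match is None:
        raise ValueError(f"not a [ddd+]hh:mm[:ss] time span: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def parse_logpurge(value: str) -> Tuple[timedelta, timedelta]:
    """Return (age, interval) for a purge value such as '5+00:00 1+00:00'."""
    age, interval = value.split()
    return parse_span(age), parse_span(interval)


age, interval = parse_logpurge("5+00:00 1+00:00")
print("purge entries older than:", age)
print("scan for old entries every:", interval)
```

For the value used here, that means log entries older than five days are purged and the log is scanned once a day, so the accesslog should still hold the two changes from the example.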
// John Pfeifer Division of Information Technology University of Maryland, College Park
--On Tuesday, May 21, 2019 5:53 PM -0400 "John C. Pfeifer" pfeifer@umd.edu wrote:
I am setting up a new OpenLDAP (2.4.47) cluster to migrate to from another LDAP product.
I am having problems with delta-syncrepl multi-master mirror mode replication. It seems to be fine for a while and then something happens where some of the consumers start having the dreaded "CSN too old" errors. My understanding from the docs is that this ought to get resolved automatically by syncing the whole object…but that never happens. In my testing environment, there are two producers and two read-only consumers.
In a recent example, there were two changes to the same object within one minute (both changes were made against the same producer). From cn=accesslog (slightly redacted):
Unfortunately, you've not provided the full configurations (passwords etc redacted), which would show if additional overlays that could impact replication are in use.
…but the consumers only processed the first change…
dn: uid=xyx,dc=umd,dc=edu
entrycsn: 20190520190341.087987Z#000000#001#000000
…are failing the second change (from slapd.log):
do_syncrep2: rid=001 cookie=rid=001,sid=001,csn=20190520190418.183747Z#000000#001#000000
do_syncrep2: rid=001 CSN too old, ignoring 20190520190418.183747Z#000000#001#000000 (uid=xyz,,dc=umd,dc=edu)
This would imply they processed another change in between. What is the contextCSN in the root entry on the replicas?
olcSyncrepl: rid=1
  provider=ldaps://producer1.umd.edu
  type=refreshAndPersist
  scope=sub
  searchbase="dc=umd,dc=edu"
  bindmethod=simple
  binddn="SYNC_DN"
  credentials="PW"
  schemachecking=off
  retry="5 5 300 +"
  interval=00:00:00:30
  timeout=10
  keepalive="240:10:30"
  tls_cert=/etc/openldap/certs/slapd-cert.pem
  tls_key=/etc/openldap/certs/slapd-key.pem
The TLS cert/key lines are for doing client cert authentication, yet you explicitly set simple binds. These lines should likely be deleted.
olcUpdateRef: ldaps://master.main.eng.directory.it-eng-aaa.aws.umd.edu
The updateref configuration is only valid on replicas and should not be present on a producer.
dn: olcOverlay={0}syncprov,olcDatabase={2}mdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: {0}syncprov
olcSpCheckPoint: 100 10
olcSpSessionlog: 250000
olcSpNoPresent: TRUE
olcSpReloadHint: TRUE
The last two settings above (olcSpNoPresent and olcSpReloadHint) are only supposed to be set TRUE on an accesslog database. Here they should be set to FALSE.
dn: olcDatabase={3}mdb,cn=config
objectClass: olcDatabaseConfig
objectClass: olcMdbConfig
olcDatabase: {3}mdb
olcSuffix: cn=accesslog
olcDbDirectory: /var/lib/ldap/accesslog
olcDbMaxReaders: 126
olcDbSearchStack: 16
olcDbMaxSize: 10000000000
You have not provided any indexing information for the accesslog DB, but it is generally mandatory to maintain eq indices on several of its attributes.
dn: olcOverlay={0}syncprov,olcDatabase={3}mdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcSpCheckPoint: 100 1
olcSpNoPresent: TRUE
olcSpReloadHint: TRUE
olcSpSessionlog: 250000
The checkpoint and sessionlog settings should not be set on an accesslog DB.
Additionally, without the specific syncrepl configurations from the consumers (do they listen to both? do they only listen to one?) there are additional levels of variability. If they listen to both masters, then it wouldn't be uncommon for them to receive the change from master A and ignore the change from master B.
With so much redacted from both the config and the change operations, it is fairly difficult to comment further. For example, you could be a victim of ITS#8990, but you would have to provide unredacted results of the two sets of changes as they appear in the accesslog DBs from both masters (in ITS#8990, a change propagates correctly between two MMR servers, but is written incorrectly into the accesslog DB of one of the masters).
--Quanah
--
Quanah Gibson-Mount Product Architect Symas Corporation Packaged, certified, and supported LDAP solutions powered by OpenLDAP: http://www.symas.com
Quanah Gibson-Mount quanah@symas.com wrote on 22.05.2019 at 01:22 in message <7CB2BA3B4C501BA796399E09@[192.168.1.39]>:
[...]
The updateref configuration is only valid on replicas and should not be present on a producer.
[...]
Reading that, I read an older manual page; it says:

olcUpdateDN: <dn>
    This option is only applicable in a slave database. It specifies the DN permitted to update (subject to access controls) the replica. It is only needed in certain push-mode replication scenarios. Generally, this DN should not be the same as the rootdn used at the master.

olcUpdateRef: <url>
    Specify the referral to pass back when slapd(8) is asked to modify a replicated local database. If multiple values are specified, each url is provided.
So I guess some of the comments for olcUpdateDN also apply to olcUpdateRef, but it's not obvious for the user. In case the current manual page still isn't that explicit about that, I suggest changing it to match the intended semantics.
Regards, Ulrich
All four servers (master & replica) show the same contextCSN values:
dn: dc=umd,dc=edu
contextcsn: 20190522130320.956183Z#000000#001#000000
contextcsn: 20190521122959.509696Z#000000#002#000000
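To relate these contextCSN values to the "CSN too old" message, here is a minimal Python sketch. It rests on a simplifying assumption: that a consumer ignores an incoming CSN when it is not newer than the contextCSN it already holds for the same server ID. It is a model of that check, not slapd's actual code, and the function names are made up for illustration.

```python
from typing import Dict, Iterable


def sid_of(csn: str) -> str:
    """Return the server ID field of a 'time#count#sid#mod' CSN."""
    return csn.split("#")[2]


def newest_by_sid(context_csns: Iterable[str]) -> Dict[str, str]:
    """Keep the newest contextCSN per server ID.

    CSN fields are fixed width, so plain string comparison orders
    two CSNs from the same SID chronologically.
    """
    state: Dict[str, str] = {}
    for csn in context_csns:
        sid = sid_of(csn)
        if sid not in state or csn > state[sid]:
            state[sid] = csn
    return state


def is_too_old(incoming: str, context_csns: Iterable[str]) -> bool:
    """True if the incoming CSN is not newer than what is already recorded for its SID."""
    known = newest_by_sid(context_csns).get(sid_of(incoming))
    return known is not None and incoming <= known


# Values taken from this thread.
replica_context = [
    "20190522130320.956183Z#000000#001#000000",
    "20190521122959.509696Z#000000#002#000000",
]
rejected = "20190520190418.183747Z#000000#001#000000"
print(is_too_old(rejected, replica_context))
```

Under this model the rejected CSN is older than the contextCSN recorded for SID 001, which is consistent with the log message but does not by itself explain why the entry on the consumers was never updated.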
On May 21, 2019, at 7:22 PM, Quanah Gibson-Mount quanah@symas.com wrote:
…but the consumers only processed the first change…
dn: uid=xyx,dc=umd,dc=edu
entrycsn: 20190520190341.087987Z#000000#001#000000
…are failing the second change (from slapd.log):
do_syncrep2: rid=001 cookie=rid=001,sid=001,csn=20190520190418.183747Z#000000#001#000000
do_syncrep2: rid=001 CSN too old, ignoring 20190520190418.183747Z#000000#001#000000 (uid=xyz,,dc=umd,dc=edu)
This would imply they processed another change in between. What is the contextCSN in the root entry on the replicas?
// John Pfeifer Division of Information Technology University of Maryland, College Park
--On Wednesday, May 22, 2019 3:54 PM -0400 "John C. Pfeifer" pfeifer@umd.edu wrote:
All four servers (master & replica) show the same contextCSN values:
dn: dc=umd,dc=edu
contextcsn: 20190522130320.956183Z#000000#001#000000
20190520190418.183747Z#000000#001#000000 (uid=xyz,,dc=umd,dc=edu)
So it looks like they've been processing changes. What makes you think there is an error?
You didn't answer a number of my other questions which would have been helpful, such as this bit I noted before:
"Additionally, without the specific syncrepl configurations from the consumers (do they listen to both? do they only listen to one?) there's additional levels of variability. If they listen to both masters, then it wouldn't be uncommon for them to receive the change from master A and ignore the change from master B."
--Quanah
On May 21, 2019, at 7:22 PM, Quanah Gibson-Mount quanah@symas.com wrote:
Additionally, without the specific syncrepl configurations from the consumers (do they listen to both? do they only listen to one?) there's additional levels of variability. If they listen to both masters, then it wouldn't be uncommon for them to receive the change from master A and ignore the change from master B.
They are listening to both masters. The problem is not so much the “error” message in the log as the fact that they haven’t processed the change from either master and so the objects are out of sync.
With so much redacted from both the config and the change operations, it is fairly difficult to comment further. For example, you could be a victim of ITS#8990, but you would have to provide unredacted results of the two sets of changes as they appear in the accesslog DBs from both masters (in ITS#8990, a change propagates correctly between two MMR servers, but is written incorrectly into the accesslog DB of one of the masters).
I have just read ITS#8990. While that is a concern for us, that did not come into play in the particular example that I cited (values were being added to an attribute).
Having made the changes that you recommended, the config now looks like:
#
# See slapd-config(5) for details on configuration options.
# This file should NOT be world readable.
#
dn: cn=config
objectClass: olcGlobal
cn: config
#
# Set the serverId giving explicit producer URLs; these lines only on producers
#
olcServerID: 1 ldap://producer1.umd.edu
olcServerID: 2 ldap://producer2.umd.edu
#
#
# Set the args & pid files
#
olcArgsFile: /var/run/openldap/slapd.args
olcPidFile: /var/run/openldap/slapd.pid
#
# Set the idle timeout
#
olcIdleTimeout: 300
#
# Sample security restrictions
# Require integrity protection (prevent hijacking)
# Require 112-bit (3DES or better) encryption for updates
# Require 64-bit encryption for simple bind
#olcSecurity: ssf=1 update_ssf=112 simple_bind=64
#
olcSaslSecProps: noanonymous,passcred
#
# TLS configuration
olcTLSCACertificatePath: /local/ssl/certs/ca
olcTLSCACertificateFile: /etc/openldap/certs/slapd-cacert.pem
olcTLSCertificateFile: /etc/openldap/certs/slapd-cert.pem
olcTLSCertificateKeyFile: /etc/openldap/certs/slapd-key.pem
olcTLSVerifyClient: never
#
olcRootDSE: /etc/openldap/root_dse.ldif
#
#
# For ProxyAuthorization
olcAuthzPolicy: to
#
#
# Threading configuration
olcToolThreads: 2
#
olcLogLevel: stats sync

#
# Load dynamic backend modules:
#
dn: cn=module,cn=config
objectClass: olcModuleList
cn: module
olcModulepath: /usr/lib64/openldap
olcModuleload: accesslog.la
olcModuleload: auditlog.la
#olcModuleload: back_dnssrv.la
#olcModuleload: back_ldap.la
olcModuleload: back_mdb.la
#olcModuleload: back_meta.la
#olcModuleload: back_null.la
#olcModuleload: back_passwd.la
#olcModuleload: back_relay.la
#olcModuleload: back_shell.la
#olcModuleload: back_sock.la
#olcModuleload: collect.la
#olcModuleload: constraint.la
#olcModuleload: dds.la
#olcModuleload: deref.la
#olcModuleload: dyngroup.la
olcModuleload: dynlist.la
olcModuleload: memberof.la
#olcModuleload: pcache.la
olcModuleload: ppolicy.la
#olcModuleload: refint.la
#olcModuleload: retcode.la
#olcModuleload: rwm.la
#olcModuleload: seqmod.la
#olcModuleload: smbk5pwd.la
#olcModuleload: sssvlv.la
olcModuleload: syncprov.la
#olcModuleload: translucent.la
#olcModuleload: unique.la
#olcModuleload: valsort.la
#
olcModuleload: pw-sha2.la

dn: cn=schema,cn=config
objectClass: olcSchemaConfig
cn: schema

include: file:///etc/openldap/schema/core.ldif
include: file:///etc/openldap/schema/cosine.ldif
include: file:///etc/openldap/schema/inetorgperson.ldif
include: file:///etc/openldap/schema/nis.ldif
include: file:///etc/openldap/schema/dyngroup.ldif
include: file:///etc/openldap/schema/ppolicy.ldif
include: file:///etc/openldap/schema/other.ldif
include: file:///etc/openldap/schema/eduPerson-201602.ldif
include: file:///etc/openldap/schema/umPerson.ldif
include: file:///etc/openldap/schema/umGeneric.ldif
include: file:///etc/openldap/schema/attributeSet.ldif

# Frontend settings
#
dn: olcDatabase=frontend,cn=config
objectClass: olcDatabaseConfig
objectClass: olcFrontendConfig
olcDatabase: frontend
#
# Sample global access control policy:
# Root DSE: allow anyone to read it
# Subschema (sub)entry DSE: allow anyone to read it
# Other DSEs:
# Allow self write access
# Allow authenticated users read access
# Allow anonymous users to authenticate
#
#olcAccess: to dn.base="" by * read
#olcAccess: to dn.base="cn=Subschema" by * read
#olcAccess: to *
# by self write
# by users read
# by anonymous auth
#
# if no access controls are present, the default policy
# allows anyone and everyone to read anything but restricts
# updates to rootdn. (e.g., "access to * by * read")
#
# rootdn can always read and write EVERYTHING!
#
olcSizeLimit: 50
olcTimeLimit: 240
#
# INSERT-ACL frontend

# Config settings
#
dn: olcDatabase={0}config,cn=config
objectClass: olcDatabaseConfig
olcDatabase: {0}config
olcAccess: {0}to dn.sub=cn=config
  by group.exact=cn=directory-admin,cn=groups,ou=ldap,dc=umd,dc=edu read
  by * none
olcAddContentAcl: TRUE
olcLastMod: TRUE
olcMaxDerefDepth: 15
olcReadOnly: FALSE
olcSyncUseSubentry: FALSE
olcMonitoring: FALSE
olcRootDN: cn=admin,cn=config
olcRootPW: ADMIN_PW

dn: olcDatabase={1}monitor,cn=config
objectClass: olcDatabaseConfig
objectClass: olcMonitorConfig
olcDatabase: {1}monitor
olcAccess: {0}to dn.sub=cn=monitor,cn=config
  by dn="cn=admin,dc=umd,dc=edu" read
  by group.exact=cn=directory-admin,cn=groups,ou=ldap,dc=umd,dc=edu read
olcLimits: group/groupOfNames/member="cn=directory-admin,cn=groups,ou=ldap,dc=umd,dc=edu" size=unlimited time=unlimited
#######################################################################
# LMDB database definitions
#######################################################################
#
dn: olcDatabase={2}mdb,cn=config
objectClass: olcDatabaseConfig
objectClass: olcMdbConfig
olcDatabase: {2}mdb
olcSuffix: dc=umd,dc=edu
olcRootDN: cn=admin,dc=umd,dc=edu
# Cleartext passwords, especially for the rootdn, should
# be avoided. See slappasswd(8) and slapd-config(5) for details.
# Use of strong authentication encouraged.
olcRootPW: ADMIN_PW
# The database directory MUST exist prior to running slapd AND
# should only be accessible by the slapd and slap tools.
# Mode 700 recommended.
olcDbDirectory: /var/lib/ldap/umd-edu
olcDbMaxReaders: 126
olcDbSearchStack: 16
olcDbMaxSize: 10000000000
# Indices to maintain
olcDbIndex: cn eq,approx,sub
olcDbIndex: eduPersonPrincipalName eq
olcDbIndex: employeeNumber eq
olcDbIndex: givenName eq,sub
olcDbIndex: mail pres,eq
olcDbIndex: member eq
olcDbIndex: memberOf eq
olcDbIndex: objectClass pres,eq
olcDbIndex: ou pres,eq,sub
olcDbIndex: sn eq,approx,sub
olcDbIndex: uid eq,sub
olcDbIndex: umAccountType eq,sub
olcDbIndex: umAdminId eq
olcDbIndex: umAffiliate eq
olcDbIndex: umAlternateMail eq
olcDbIndex: umEmployee eq
olcDbIndex: umExpirationdate eq
olcDbIndex: umGroup pres,eq
olcDbIndex: umId eq
olcDbIndex: umInactiveDate pres,eq
olcDbIndex: umInstitution eq
olcDbIndex: umInstitutionActive eq
olcDbIndex: umLibraryBarcode eq
olcDbIndex: umMailAlias pres,eq
olcDbIndex: umMailFwd pres,eq
olcDbIndex: umNameComponent eq,sub
olcDbIndex: umNickname eq,sub
olcDbIndex: umOwnerId eq
olcDbIndex: umRegcourse eq
olcDbIndex: umRegStatus eq
olcDbIndex: umServices eq
olcDbIndex: umServiceStatus pres,eq
olcDbIndex: umStudentStatus pres,eq
# Indices for syncrepl
olcDbIndex: entryUUID eq
olcDbIndex: entryCSN eq
#
# Search Limits
#
olcLimits: group/groupOfNames/member="cn=directory-admin,cn=groups,ou=ldap,dc=umd,dc=edu" size=unlimited time=unlimited
olcLimits: group/groupOfNames/member="cn=replciation-auth,cn=groups,ou=ldap,dc=umd,dc=edu" size=unlimited time=unlimited
olcLimits: group/groupOfNames/member="cn=search-unlimited,cn=groups,ou=ldap,dc=umd,dc=edu" size=5000 size.pr=5000 size.prtotal=unlimited time=unlimited
olcLimits: group/groupOfNames/member="cn=search-limit-3000,cn=groups,ou=ldap,dc=umd,dc=edu" size=3000 time=300
olcLimits: group/groupOfNames/member="cn=search-limit-2000,cn=groups,ou=ldap,dc=umd,dc=edu" size=2000 time=300
olcLimits: group/groupOfNames/member="cn=search-limit-5,cn=groups,ou=ldap,dc=umd,dc=edu" size=5 time=300
olcLimits: dn.subtree="cn=auth,ou=ldap,dc=umd,dc=edu" size=1000 time=300
olcLimits: dn.subtree="ou=people,dc=umd,dc=edu" size=100 size.prtotal=disabled time.soft=30 time.hard=240
olcLimits: anonymous size=50 time.soft=30 size.prtotal=disabled time.hard=240
#
# INSERT-ACL umd
#
#
# Replication Configs
#
# The rid=1 block is not on producer1
# The rid=2 block is not on producer2
#
olcSyncrepl: rid=1
  provider=ldaps://producer1.umd.edu
  type=refreshAndPersist
  scope=sub
  searchbase="dc=umd,dc=edu"
  bindmethod=simple
  binddn="uid=sync.consumer,dc=umd,dc=edu"
  credentials="SYNC_PW"
  schemachecking=off
  retry="5 5 300 +"
  interval=00:00:00:30
  timeout=10
  keepalive="240:10:30"
  tls_cacert=/etc/openldap/certs/slapd-cacert.pem
  tls_reqcert=never
  logbase="cn=accesslog"
  syncdata=accesslog
  logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
olcSyncrepl: rid=2
  provider=ldaps://useast1c-openldap-eng-main1.it-eng-aaa.aws.umd.edu
  type=refreshAndPersist
  scope=sub
  searchbase="dc=umd,dc=edu"
  bindmethod=simple
  binddn="uid=sync.consumer,dc=umd,dc=edu"
  credentials="SYNC_PW"
  schemachecking=off
  retry="5 5 300 +"
  interval=00:00:00:30
  timeout=10
  keepalive="240:10:30"
  tls_cacert=/etc/openldap/certs/slapd-cacert.pem
  tls_reqcert=never
  logbase="cn=accesslog"
  syncdata=accesslog
  logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
#
# These lines only on consumers
#
olcUpdateRef: ldaps://master.directory.umd.edu
olcMirrorMode: FALSE
#
# This line only on producers
#
olcMirrorMode: TRUE

dn: olcOverlay={1}accesslog,olcDatabase={2}mdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcAccessLogConfig
olcOverlay: {1}accesslog
olcAccessLogDB: cn=accesslog
olcAccessLogOps: writes
olcAccessLogSuccess: TRUE
olcAccessLogPurge: 5+00:00 1+00:00

dn: olcOverlay={2}dynlist,olcDatabase={2}mdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcDynamicList
olcOverlay: {2}dynlist
olcDlAttrSet: {0}groupOfURLs memberURL member
olcDlAttrSet: {1}umGroupOfURLs memberURL member

dn: olcOverlay={3}ppolicy,olcDatabase={2}mdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcPPolicyConfig
olcOverlay: {3}ppolicy
olcPPolicyDefault: cn=default,ou=policies,ou=ldap,dc=umd,dc=edu
olcPPolicyHashCleartext: TRUE
olcPPolicyUseLockout: FALSE
olcPPolicyForwardUpdates: FALSE

dn: olcOverlay={5}memberof,olcDatabase={2}mdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcMemberOf
olcOverlay: {5}memberof
olcMemberOfDangling: ignore
olcMemberOfRefInt: FALSE
olcMemberOfGroupOC: groupOfNames
olcMemberOfMemberAD: member
olcMemberOfMemberOfAD: memberOf

dn: olcOverlay={6}memberof,olcDatabase={2}mdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcMemberOf
olcOverlay: {6}memberof
olcMemberOfDangling: ignore
olcMemberOfRefInt: FALSE
olcMemberOfGroupOC: umGroupOfNames
olcMemberOfMemberAD: member
olcMemberOfMemberOfAD: memberOf

dn: olcDatabase={3}mdb,cn=config
objectClass: olcDatabaseConfig
objectClass: olcMdbConfig
olcDatabase: {3}mdb
olcSuffix: cn=accesslog
olcDbDirectory: /var/lib/ldap/accesslog
olcDbMaxReaders: 126
olcDbSearchStack: 16
olcDbMaxSize: 10000000000
olcRootDN: cn=admin,cn=accesslog
olcRootPW: ADMIN_PW
#
# INSERT-ACL accesslog
#
olcLimits: * size=unlimited time=unlimited
olcDbIndex: objectClass eq
olcDbIndex: reqStart eq
olcDbIndex: reqType eq
olcDbIndex: reqAuthzId eq
olcDbIndex: reqDN eq
olcDbIndex: reqEnd eq
olcDbIndex: reqResult eq
# Indices for syncrepl
olcDbIndex: entryUUID eq
olcDbIndex: entryCSN eq
#### END ####
I have omitted the voluminous access rules (being at a university sometimes leads to really squirrelly policies), but I don’t believe that they are part of the problem, since replication works completely fine for a while before going sideways on some, but not necessarily all, of the consumers.
// John Pfeifer Division of Information Technology University of Maryland, College Park
--On Friday, May 24, 2019 9:23 AM -0400 "John C. Pfeifer" pfeifer@umd.edu wrote:
With so much redacted from both the config and the change operations, it is fairly difficult to comment further. For example, you could be a victim of ITS#8990, but you would have to provide unredacted results of the two sets of changes as they appear in the accesslog DBs from both masters (in ITS#8990, a change propagates correctly between two MMR servers, but is written incorrectly into the accesslog DB of one of the masters).
I have just read ITS#8990. While that is a concern for us, that did not come into play in the particular example that I cited (values were being added to an attribute).
Adding values to an attribute is exactly what ITS#8990 was dealing with, so I'm not sure why you think it's not a concern. The issue was with the way in which the attribute was modified.
Having made the changes that you recommended, the config now looks like:
No syncprov config?
--Quanah
--On Wednesday, May 22, 2019 10:14 AM +0200 Ulrich Windl Ulrich.Windl@rz.uni-regensburg.de wrote:
olcUpdateRef: <url>
    Specify the referral to pass back when slapd(8) is asked to modify a replicated local database. If multiple values are specified, each url is provided.
So I guess some of the comments for olcUpdateDN also apply to olcUpdateRef, but it's not obvious for the user. In case the current manual page still isn't that explicit about that, I suggest changing it to match the intended semantics.
It literally states that it only applies with a replicated local database already.
--Quanah
On May 24, 2019, at 2:15 PM, Quanah Gibson-Mount quanah@symas.com wrote:
--On Friday, May 24, 2019 9:23 AM -0400 "John C. Pfeifer" pfeifer@umd.edu wrote:
With so much redacted from both the config and the change operations, it is fairly difficult to comment further. For example, you could be a victim of ITS#8990, but you would have to provide unredacted results of the two sets of changes as they appear in the accesslog DBs from both masters (in ITS#8990, a change propagates correctly between two MMR servers, but is written incorrectly into the accesslog DB of one of the masters).
I have just read ITS#8990. While that is a concern for us, that did not come into play in the particular example that I cited (values were being added to an attribute).
Adding values to an attribute is exactly what ITS#8990 was dealing with, so I'm not sure why you think it's not a concern. The issue was with the way in which the attribute was modified.
Looking at http://www.openldap.org/its/index.cgi/Software%20Bugs?id=8990;selectid=8990, ITS#8990 seems to be about deleting values, not adding them. Is there somewhere else that I should be looking for more details?
Having made the changes that you recommended, the config now looks like:
No syncprov config?
Sorry, the config I copied was from a replica; on the masters only:
dn: olcOverlay={0}syncprov,olcDatabase={2}mdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcOverlay: {0}syncprov
olcSpCheckPoint: 100 10
olcSpSessionlog: 250000
olcSpNoPresent: FALSE
olcSpReloadHint: FALSE

dn: olcOverlay={0}syncprov,olcDatabase={3}mdb,cn=config
objectClass: olcOverlayConfig
objectClass: olcSyncProvConfig
olcSpNoPresent: TRUE
olcSpReloadHint: TRUE
--Quanah
// John Pfeifer Division of Information Technology University of Maryland, College Park
--On Friday, May 24, 2019 5:08 PM -0400 "John C. Pfeifer" pfeifer@umd.edu wrote:
Adding values to an attribute is exactly what ITS#8990 was dealing with, so I'm not sure why you think it's not a concern. The issue was with the way in which the attribute was modified.
Looking at http://www.openldap.org/its/index.cgi/Software%20Bugs?id=8990;selectid=8990, ITS#8990 seems to be about deleting values, not adding them. Is there somewhere else that I should be looking for more details?
The issue is how an attribute value is added, particularly when Python is being used to do the change. In any case, the underlying problem affects multiple operations. Again, the accesslog entry for the change from *both* masters must be compared. The issue in ITS#8990 goes something like this:
master A receives the change

master B successfully replicates the change, but records it incorrectly in the accesslog.
downstream replica pulls the change from master B, and it fails.
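The comparison of accesslog entries from both masters could be sketched as follows in Python. This is a minimal illustration, not a diagnostic tool: the parser handles only simple unfolded "name: value" lines (no line folding, no base64 values), the function names are made up, and the two mail reqMod values are invented purely to show what a mismatch would look like; they are not taken from ITS#8990 or from this thread.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def parse_entry(ldif: str) -> Dict[str, List[str]]:
    """Parse one simple LDIF entry into {lowercased attribute: [values]}."""
    attrs: Dict[str, List[str]] = defaultdict(list)
    for line in ldif.strip().splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # Split on the first colon only, so 'reqmod: mail:+ x' keeps 'mail:+ x'.
        name, _, value = line.partition(":")
        attrs[name.strip().lower()].append(value.strip())
    return dict(attrs)


def diff_reqmod(a: Dict[str, List[str]],
                b: Dict[str, List[str]]) -> Tuple[List[str], List[str]]:
    """Return (reqMod values only in a, reqMod values only in b)."""
    mods_a = set(a.get("reqmod", []))
    mods_b = set(b.get("reqmod", []))
    return sorted(mods_a - mods_b), sorted(mods_b - mods_a)


# Hypothetical accesslog entries for the same change as logged on each master.
FROM_MASTER_A = """
dn: reqStart=20190520190418.000012Z,cn=accesslog
reqdn: uid=xyz,dc=umd,dc=edu
reqmod: mail:+ user@example.org
reqmod: entryCSN:= 20190520190418.183747Z#000000#001#000000
reqtype: modify
"""
FROM_MASTER_B = """
dn: reqStart=20190520190418.000012Z,cn=accesslog
reqdn: uid=xyz,dc=umd,dc=edu
reqmod: mail:= user@example.org
reqmod: entryCSN:= 20190520190418.183747Z#000000#001#000000
reqtype: modify
"""

only_a, only_b = diff_reqmod(parse_entry(FROM_MASTER_A), parse_entry(FROM_MASTER_B))
print("only on master A:", only_a)
print("only on master B:", only_b)
```

If both lists come back empty, the two masters logged the same modifications for that change; any difference is worth including, unredacted, when reporting the problem.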
--Quanah
Just to close this thread…
After applying the ITS#8990 patch and making the recommended configuration changes, replication seems to be behaving itself even after a weekend of heavy updates to LDAP.
Thanks for all of the advice.
// John Pfeifer Division of Information Technology University of Maryland, College Park
--On Monday, June 03, 2019 5:06 PM -0400 "John C. Pfeifer" pfeifer@umd.edu wrote:
Just to close this thread…
After applying the ITS#8990 patch and making the recommended configuration changes, replication seems to be behaving itself even after a weekend of heavy updates to LDAP.
Thanks for all of the advice.
Hi John,
Glad to hear things are looking better now and good to know that seemed to resolve the issue.
Regards, Quanah
openldap-technical@openldap.org