Hi,
We currently run a multi-master setup where all 4 of our servers replicate from each other via delta-syncrepl but all our writes are directed at a selected "master" server.
I've noticed recently that we are suffering sync issues and this coincides with us upgrading from 2.4.46 with a patch for ITS8843 to 2.4.47.
On a "consumer" I'm seeing log entries like this:
Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: slap_graduate_commit _csn: removing 0xa1da880 20190605063503.192377Z#000000#003#000000 Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: syncrepl_message_to_ op: rid=034 be_modify cn=STUUG,ou=affiliations,ou=grouper2,dc=authorise,dc=ed,dc =ac,dc=uk (0) Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: slap_queue_csn: queu eing 0x1e963c40 20190605063503.289350Z#000000#003#000000 Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: syncrepl_message_to_ op: rid=035 tid b7ae2700 Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: do_syncrep2: rid=034 cookie=rid=034,sid=003,csn=20190605063503.289350Z#000000#003#000000 Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: syncrepl_null_callba ck : error code 0x14 Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: slap_graduate_commit_csn: removing 0x1e963c40 20190605063503.289350Z#000000#003#000000 Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: syncrepl_message_to_op: rid=035 be_modify cn=PRPHDOPMIH3F_3,ou=PRPHDOPMIH3F,ou=pos,ou=grouper2,dc=authorise,dc=ed,dc=ac,dc=uk (20) Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: do_syncrep2: rid=035 delta-sync lost sync on (reqStart=20190605063503.000020Z,cn=accesslog), switching to REFRESH Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: slap_queue_csn: queueing 0x1d0f8100 20190605063503.289350Z#000000#003#000000 Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: syncrepl_message_to_op: rid=034 tid bacef700 Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: syncrepl_null_callback : error code 0x14 Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: slap_graduate_commit_csn: removing 0x1d0f8100 20190605063503.289350Z#000000#003#000000 Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: syncrepl_message_to_op: rid=034 be_modify cn=PRPHDOPMIH3F_3,ou=PRPHDOPMIH3F,ou=pos,ou=grouper2,dc=authorise,dc=ed,dc=ac,dc=uk (20) Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: do_syncrep2: rid=034 delta-sync lost sync on (reqStart=20190605063503.000414Z,cn=accesslog), switching to REFRESH Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: slap_queue_csn: queueing 0x1b5c87c0 20190605063503.289350Z#000000#003#000000 Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: syncrepl_message_to_op: rid=036 tid b62df700 Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: syncrepl_null_callback : error code 0x14 Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: slap_graduate_commit_csn: removing 0x1b5c87c0 20190605063503.289350Z#000000#003#000000 Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: syncrepl_message_to_op: rid=036 be_modify cn=PRPHDOPMIH3F_3,ou=PRPHDOPMIH3F,ou=pos,ou=grouper2,dc=authorise,dc=ed,dc=ac,dc=uk (20) Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: do_syncrep2: rid=036 delta-sync lost sync on (reqStart=20190605063503.000361Z,cn=accesslog), switching to REFRESH
Then a number of SYNC_ID_SET messages:
Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: do_syncrep2: rid=035 LDAP_RES_INTERMEDIATE - SYNC_ID_SET Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: do_syncrep2: rid=034 LDAP_RES_INTERMEDIATE - SYNC_ID_SET Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: do_syncrep2: rid=036 LDAP_RES_INTERMEDIATE - SYNC_ID_SET Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: do_syncrep2: rid=035 LDAP_RES_INTERMEDIATE - SYNC_ID_SET Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: do_syncrep2: rid=034 LDAP_RES_INTERMEDIATE - SYNC_ID_SET Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: do_syncrep2: rid=036 LDAP_RES_INTERMEDIATE - SYNC_ID_SET Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: do_syncrep2: rid=035 LDAP_RES_INTERMEDIATE - SYNC_ID_SET
Before finally a whole bunch of:
Jun 5 07:35:20 cauth04.authorise.is.ed.ac.uk slapd[18267]: nonpresent_callback: rid=035 present UUID 8573adfa-71c7-4a9c-ab64-54d6c7f51e25, dn dc=authorise,dc=ed,dc=ac,dc=uk Jun 5 07:35:20 cauth04.authorise.is.ed.ac.uk slapd[18267]: nonpresent_callback: rid=035 present UUID 555bdc68-06ee-443c-9b33-6c8355c0a00d, dn ou=is,dc=authorise,dc=ed,dc=ac,dc=uk Jun 5 07:35:20 cauth04.authorise.is.ed.ac.uk slapd[18267]: nonpresent_callback: rid=035 present UUID 8749c8a0-0b06-4dc0-b1bd-ca85f3077603, dn ou=hss,dc=authorise,dc=ed,dc=ac,dc=uk Jun 5 07:35:20 cauth04.authorise.is.ed.ac.uk slapd[18267]: nonpresent_callback: rid=035 present UUID 5d59b0fd-68f1-44b7-a2af-f4aec00c7857, dn ou=mvm,dc=authorise,dc=ed,dc=ac,dc=uk Jun 5 07:35:20 cauth04.authorise.is.ed.ac.uk slapd[18267]: nonpresent_callback: rid=035 present UUID d75a41f2-a279-4cf2-a8e6-033c6821b6b2, dn ou=sce,dc=authorise,dc=ed,dc=ac,dc=uk
(At which point CPU usage on journald and rsyslog understandably go through the roof).
There is a (possibly self-inflicted) point of pain where we have an exattrs=memberOf in our syncrepl config to work around another replication issue so this means that any user objects which are REFRESH'ed end up losing all their group memberships.
I'm mitigating this at present by periodically dumping the data from the "master" and re-generating the databases on the "consumers" but this is more of a sticking plaster approach.
Are there any known bugs/ regressions with delta-syncrepl in 2.4.47? One idea I've had is to go to a true single-master config by setting the current consumers to be read-only and having a single olcSyncrepl clause for the master on these 3 servers.
My config from one of the consumers is listed below although the master config is identical bar the olcSyncrepl lines and olcSaslHost value. I've redacted the schema and olcAccess entries and the root password: If there's any obvious deficiencies in my config too that would be good to know.
Kind regards, Mark
dn: cn=config objectClass: olcGlobal cn: config olcConfigFile: slapd.conf olcConfigDir: slapd.d olcArgsFile: /opt/openldap/var/run/slapd.args olcAttributeOptions: lang- olcAuthzPolicy: none olcAuthzRegexp: {0}"^uid=([^,/@]+)/ldapauth(@(((TEST.)?EASE)|INF).ED.AC.UK)?,cn=(((TEST.)?EASE)|INF).ED.AC.UK,cn=GSSAPI,cn=auth$" "uid=$1,ou=people,ou=central,dc=authorise,dc=ed,dc=ac,dc=uk" olcAuthzRegexp: {1}"^uid=(host|ldap)/([^/]+).authorise.is.ed.ac.uk,cn=EASE.ED.AC.UK,cn=GSSAPI,cn=auth$" "uid=replicator.authorise.is.ed.ac.uk,ou=people,ou=central,dc=authorise,dc=ed,dc=ac,dc=uk" olcAuthzRegexp: {2}"^uid=([^,/@]+)(@(((TEST.)?EASE)|INF).ED.AC.UK)?,cn=(((TEST.)?EASE)|INF).ED.AC.UK,cn=GSSAPI,cn=auth$" "uid=$1,ou=people,ou=central,dc=authorise,dc=ed,dc=ac,dc=uk" olcConcurrency: 0 olcConnMaxPending: 100 olcConnMaxPendingAuth: 1000 olcGentleHUP: FALSE olcIdleTimeout: 0 olcIndexSubstrIfMaxLen: 4 olcIndexSubstrIfMinLen: 2 olcIndexSubstrAnyLen: 4 olcIndexSubstrAnyStep: 2 olcIndexIntLen: 4 olcLocalSSF: 71 olcLogFile: /opt/openldap/var/log/openldap-debug.log olcLogLevel: stats olcLogLevel: sync olcPidFile: /opt/openldap/var/run/slapd.pid olcReadOnly: FALSE olcSaslHost: cauth04.authorise.is.ed.ac.uk olcSaslRealm: EASE.ED.AC.UK olcSaslSecProps: noplain,noanonymous olcSecurity: ssf=1 olcSecurity: update_ssf=112 olcSecurity: simple_bind=64 olcServerID: 3 ldaps://cauth01.authorise.is.ed.ac.uk/ olcServerID: 4 ldaps://cauth02.authorise.is.ed.ac.uk/ olcServerID: 5 ldaps://cauth03.authorise.is.ed.ac.uk/ olcServerID: 6 ldaps://cauth04.authorise.is.ed.ac.uk/ olcSockbufMaxIncoming: 262143 olcSockbufMaxIncomingAuth: 16777215 olcThreads: 12 olcTLSCACertificateFile: /opt/openldap/certs/quovadis.bundle olcTLSCertificateFile: /opt/openldap/certs/authorise.is.ed.ac.uk-QUOVADIS.crt olcTLSCertificateKeyFile: /opt/openldap/certs/authorise.is.ed.ac.uk-QUOVADIS.key olcTLSCipherSuite: HIGH:MEDIUM:+SSLv3:+SSLv2:+TLSv1:+AES256-SHA:!EXPORT olcTLSCRLCheck: none olcTLSVerifyClient: never olcTLSProtocolMin: 3.1 olcToolThreads: 2 olcWriteTimeout: 900
dn: olcDatabase={-1}frontend,cn=config objectClass: olcDatabaseConfig objectClass: olcFrontendConfig olcDatabase: {-1}frontend olcAddContentAcl: FALSE olcLastMod: TRUE olcMaxDerefDepth: 2 olcReadOnly: FALSE olcSchemaDN: cn=Subschema olcSecurity: ssf=1 olcSecurity: update_ssf=112 olcSecurity: simple_bind=64 olcSyncUseSubentry: FALSE olcMonitoring: FALSE olcSortVals: member olcSortVals: eduniServiceCode
dn: olcDatabase={0}config,cn=config objectClass: olcDatabaseConfig olcDatabase: {0}config olcAccess: {0}to * by dn.regex="^uid=replicator.authorise.is.ed.ac.uk,ou=people,ou=central,dc=authorise,dc=ed,dc=ac,dc=uk$" read by * none olcAddContentAcl: TRUE olcLastMod: TRUE olcMaxDerefDepth: 15 olcReadOnly: FALSE olcRootDN: cn=Manager,cn=config olcRootPW:: ************************* olcSyncUseSubentry: FALSE olcMonitoring: FALSE
dn: olcDatabase={1}mdb,cn=config objectClass: olcDatabaseConfig objectClass: olcMdbConfig olcDatabase: {1}mdb olcDbDirectory: /opt/openldap/var/openldap-data/authorise olcSuffix: dc=authorise,dc=ed,dc=ac,dc=uk olcAccess: ************* olcAddContentAcl: FALSE olcLastMod: TRUE olcMaxDerefDepth: 1 olcReadOnly: FALSE olcRootDN: cn=Manager,dc=authorise,dc=ed,dc=ac,dc=uk olcRootPW:: ********************* olcSecurity: ssf=1 olcSecurity: update_ssf=112 olcSecurity: simple_bind=64 olcSizeLimit: unlimited olcSyncUseSubentry: FALSE olcTimeLimit: unlimited olcMirrorMode: TRUE olcMonitoring: TRUE olcDbCheckpoint: 1024 10 olcDbEnvFlags: writemap olcDbEnvFlags: nometasync olcDbNoSync: FALSE olcDbIndex: objectClass eq olcDbIndex: entryUUID eq olcDbIndex: entryCSN eq olcDbIndex: cn pres,eq,sub olcDbIndex: uid pres,eq,sub olcDbIndex: uidNumber pres,eq olcDbIndex: gidNumber pres,eq olcDbIndex: eduniType eq olcDbIndex: gecos pres,eq,sub olcDbIndex: eduniCategory eq olcDbIndex: mail pres,eq,sub olcDbIndex: eduniSchoolCode eq olcDbIndex: eduniIDStatus eq olcDbIndex: eduniCollegeCode eq olcDbIndex: eduniUnitCode eq olcDbIndex: eduniOrgCode eq olcDbIndex: eduniLibraryBarcode pres,eq olcDbIndex: eduniServiceCode pres,eq olcDbIndex: krbName pres,eq olcDbIndex: eduPersonAffiliation pres,eq olcDbIndex: eduPersonPrimaryAffiliation eq olcDbIndex: eduPersonPrincipalName eq olcDbIndex: eduPersonEntitlement pres,eq olcDbIndex: eduPersonTargetedID eq olcDbIndex: eduPersonScopedAffiliation pres,eq olcDbIndex: userPassword eq olcDbIndex: eduniIdmsId pres,eq olcDbIndex: sn pres,eq,sub olcDbIndex: memberUid eq olcDbIndex: uniqueMember pres,eq olcDbIndex: autoMountKey eq olcDbIndex: eduniOrganisation pres,eq,sub olcDbIndex: description pres,eq,sub olcDbIndex: eduniTitle pres,eq olcDbIndex: title pres,eq,sub olcDbIndex: eduniCardNumber pres,eq olcDbIndex: eduniRefNo pres,eq olcDbIndex: givenName pres,eq,sub olcDbIndex: member eq olcDbIndex: eduniYearOfStudy pres,eq olcDbIndex: yubiKeyId pres,eq olcDbIndex: nisMapName eq olcDbIndex: memberOf eq olcDbIndex: proxyAddresses pres,eq,sub olcDbMaxReaders: 96 olcDbMaxSize: 32212254720 olcDbMode: 0600 olcDbSearchStack: 16 olcSyncrepl: {0}rid=34 provider=ldaps://cauth01.authorise.is.ed.ac.uk/ realm=EASE.ED.AC.UK bindmethod=sasl saslmech=gssapi authcid="replicator.authorise.is.ed.ac.uk@EASE.ED.AC.UK" searchbase="dc=authorise,dc=ed,dc=ac,dc=uk" logbase="cn=accesslog" logfilter="(&(objectClass=auditWriteObject)(reqResult=0))" schemachecking=on type=refreshAndPersist retry="60 +" tls_reqcert=never timeout=1 keepalive=300:12:5 syncdata=accesslog exattrs="memberOf" olcSyncrepl: {1}rid=35 provider=ldaps://cauth02.authorise.is.ed.ac.uk/ realm=EASE.ED.AC.UK bindmethod=sasl saslmech=gssapi authcid="replicator.authorise.is.ed.ac.uk@EASE.ED.AC.UK" searchbase="dc=authorise,dc=ed,dc=ac,dc=uk" logbase="cn=accesslog" logfilter="(&(objectClass=auditWriteObject)(reqResult=0))" schemachecking=on type=refreshAndPersist retry="60 +" tls_reqcert=never timeout=1 keepalive=300:12:5 syncdata=accesslog exattrs="memberOf" olcSyncrepl: {2}rid=36 provider=ldaps://cauth03.authorise.is.ed.ac.uk/ realm=EASE.ED.AC.UK bindmethod=sasl saslmech=gssapi authcid="replicator.authorise.is.ed.ac.uk@EASE.ED.AC.UK" searchbase="dc=authorise,dc=ed,dc=ac,dc=uk" logbase="cn=accesslog" logfilter="(&(objectClass=auditWriteObject)(reqResult=0))" schemachecking=on type=refreshAndPersist retry="60 +" tls_reqcert=never timeout=1 keepalive=300:12:5 syncdata=accesslog exattrs="memberOf" entryCSN: 20190528133331.277387Z#000000#006#000000
dn: olcOverlay={0}dynlist,olcDatabase={1}mdb,cn=config objectClass: olcOverlayConfig objectClass: olcDynamicList olcOverlay: {0}dynlist olcDlAttrSet: {0}groupOfURLs memberURL
dn: olcOverlay={1}memberof,olcDatabase={1}mdb,cn=config objectClass: olcOverlayConfig objectClass: olcMemberOf olcOverlay: {1}memberof olcMemberOfDangling: ignore olcMemberOfRefInt: TRUE olcMemberOfGroupOC: groupOfNames olcMemberOfMemberAD: member olcMemberOfMemberOfAD: memberOf
dn: olcOverlay={2}syncprov,olcDatabase={1}mdb,cn=config objectClass: olcOverlayConfig objectClass: olcConfig objectClass: top objectClass: olcSyncProvConfig olcOverlay: {2}syncprov
dn: olcOverlay={3}accesslog,olcDatabase={1}mdb,cn=config objectClass: olcOverlayConfig objectClass: olcAccessLogConfig olcOverlay: {3}accesslog olcAccessLogDB: cn=accesslog olcAccessLogOps: writes olcAccessLogPurge: 02+00:00 00+04:00 olcAccessLogSuccess: TRUE
dn: olcDatabase={2}mdb,cn=config objectClass: olcDatabaseConfig objectClass: olcMdbConfig olcDatabase: {2}mdb olcDbDirectory: /opt/openldap/var/openldap-data/accesslog olcSuffix: cn=accesslog olcAccess: {0}to * by dn.exact="uid=replicator.authorise.is.ed.ac.uk,ou=people,ou=central,dc=authorise,dc=ed,dc=ac,dc=uk" write by dn="cn=Manager,dc=authorise,dc=ed,dc=ac,dc=uk" write olcLimits: {0}dn.exact="uid=replicator.authorise.is.ed.ac.uk,ou=people,ou=central,dc=authorise,dc=ed,dc=ac,dc=uk" time.soft=unlimited time.hard=unlimited size.soft=unlimited size.hard=unlimited olcLimits: {1}dn.exact="cn=Manager,dc=authorise,dc=ed,dc=ac,dc=uk" time.soft=unlimited time.hard=unlimited size.soft=unlimited size.hard=unlimited olcRootDN: cn=Manager,cn=accesslog olcRootPW:: ******************************** olcDbIndex: default eq olcDbIndex: entryCSN,objectClass,reqEnd,reqResult,reqStart,reqDN olcDbMaxReaders: 96 olcDbMaxSize: 32212254720 olcDbMode: 0600 olcDbSearchStack: 16
dn: olcOverlay={0}syncprov,olcDatabase={2}mdb,cn=config objectClass: olcOverlayConfig objectClass: olcConfig objectClass: top objectClass: olcSyncProvConfig olcOverlay: {0}syncprov olcSpNoPresent: TRUE olcSpReloadHint: TRUE
dn: olcDatabase={3}monitor,cn=config objectClass: olcMonitorConfig objectClass: olcDatabaseConfig objectClass: olcConfig objectClass: top olcDatabase: {3}monitor olcAccess: {0}to dn.subtree="cn=monitor" by dn.exact="cn=Manager,cn=config" write by * none