Hi,
We currently run a multi-master setup where all 4 of our servers replicate from each other via delta-syncrepl but all our writes are directed at a selected "master" server.
I've noticed recently that we are suffering sync issues and this coincides with us upgrading from 2.4.46 with a patch for ITS8843 to 2.4.47.
On a "consumer" I'm seeing log entries like this:
Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: slap_graduate_commit _csn: removing 0xa1da880 20190605063503.192377Z#000000#003#000000 Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: syncrepl_message_to_ op: rid=034 be_modify cn=STUUG,ou=affiliations,ou=grouper2,dc=authorise,dc=ed,dc =ac,dc=uk (0) Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: slap_queue_csn: queu eing 0x1e963c40 20190605063503.289350Z#000000#003#000000 Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: syncrepl_message_to_ op: rid=035 tid b7ae2700 Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: do_syncrep2: rid=034 cookie=rid=034,sid=003,csn=20190605063503.289350Z#000000#003#000000 Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: syncrepl_null_callba ck : error code 0x14 Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: slap_graduate_commit_csn: removing 0x1e963c40 20190605063503.289350Z#000000#003#000000 Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: syncrepl_message_to_op: rid=035 be_modify cn=PRPHDOPMIH3F_3,ou=PRPHDOPMIH3F,ou=pos,ou=grouper2,dc=authorise,dc=ed,dc=ac,dc=uk (20) Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: do_syncrep2: rid=035 delta-sync lost sync on (reqStart=20190605063503.000020Z,cn=accesslog), switching to REFRESH Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: slap_queue_csn: queueing 0x1d0f8100 20190605063503.289350Z#000000#003#000000 Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: syncrepl_message_to_op: rid=034 tid bacef700 Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: syncrepl_null_callback : error code 0x14 Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: slap_graduate_commit_csn: removing 0x1d0f8100 20190605063503.289350Z#000000#003#000000 Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: syncrepl_message_to_op: rid=034 be_modify cn=PRPHDOPMIH3F_3,ou=PRPHDOPMIH3F,ou=pos,ou=grouper2,dc=authorise,dc=ed,dc=ac,dc=uk (20) Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: do_syncrep2: rid=034 delta-sync lost sync on (reqStart=20190605063503.000414Z,cn=accesslog), switching to REFRESH Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: slap_queue_csn: queueing 0x1b5c87c0 20190605063503.289350Z#000000#003#000000 Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: syncrepl_message_to_op: rid=036 tid b62df700 Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: syncrepl_null_callback : error code 0x14 Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: slap_graduate_commit_csn: removing 0x1b5c87c0 20190605063503.289350Z#000000#003#000000 Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: syncrepl_message_to_op: rid=036 be_modify cn=PRPHDOPMIH3F_3,ou=PRPHDOPMIH3F,ou=pos,ou=grouper2,dc=authorise,dc=ed,dc=ac,dc=uk (20) Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: do_syncrep2: rid=036 delta-sync lost sync on (reqStart=20190605063503.000361Z,cn=accesslog), switching to REFRESH
Then a number of SYNC_ID_SET messages:
Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: do_syncrep2: rid=035 LDAP_RES_INTERMEDIATE - SYNC_ID_SET Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: do_syncrep2: rid=034 LDAP_RES_INTERMEDIATE - SYNC_ID_SET Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: do_syncrep2: rid=036 LDAP_RES_INTERMEDIATE - SYNC_ID_SET Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: do_syncrep2: rid=035 LDAP_RES_INTERMEDIATE - SYNC_ID_SET Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: do_syncrep2: rid=034 LDAP_RES_INTERMEDIATE - SYNC_ID_SET Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: do_syncrep2: rid=036 LDAP_RES_INTERMEDIATE - SYNC_ID_SET Jun 5 07:35:03 cauth04.authorise.is.ed.ac.uk slapd[18267]: do_syncrep2: rid=035 LDAP_RES_INTERMEDIATE - SYNC_ID_SET
Before finally a whole bunch of:
Jun 5 07:35:20 cauth04.authorise.is.ed.ac.uk slapd[18267]: nonpresent_callback: rid=035 present UUID 8573adfa-71c7-4a9c-ab64-54d6c7f51e25, dn dc=authorise,dc=ed,dc=ac,dc=uk Jun 5 07:35:20 cauth04.authorise.is.ed.ac.uk slapd[18267]: nonpresent_callback: rid=035 present UUID 555bdc68-06ee-443c-9b33-6c8355c0a00d, dn ou=is,dc=authorise,dc=ed,dc=ac,dc=uk Jun 5 07:35:20 cauth04.authorise.is.ed.ac.uk slapd[18267]: nonpresent_callback: rid=035 present UUID 8749c8a0-0b06-4dc0-b1bd-ca85f3077603, dn ou=hss,dc=authorise,dc=ed,dc=ac,dc=uk Jun 5 07:35:20 cauth04.authorise.is.ed.ac.uk slapd[18267]: nonpresent_callback: rid=035 present UUID 5d59b0fd-68f1-44b7-a2af-f4aec00c7857, dn ou=mvm,dc=authorise,dc=ed,dc=ac,dc=uk Jun 5 07:35:20 cauth04.authorise.is.ed.ac.uk slapd[18267]: nonpresent_callback: rid=035 present UUID d75a41f2-a279-4cf2-a8e6-033c6821b6b2, dn ou=sce,dc=authorise,dc=ed,dc=ac,dc=uk
(At which point CPU usage on journald and rsyslog understandably go through the roof).
There is a (possibly self-inflicted) point of pain where we have an exattrs=memberOf in our syncrepl config to work around another replication issue so this means that any user objects which are REFRESH'ed end up losing all their group memberships.
I'm mitigating this at present by periodically dumping the data from the "master" and re-generating the databases on the "consumers" but this is more of a sticking plaster approach.
Are there any known bugs/ regressions with delta-syncrepl in 2.4.47? One idea I've had is to go to a true single-master config by setting the current consumers to be read-only and having a single olcSyncrepl clause for the master on these 3 servers.
My config from one of the consumers is listed below although the master config is identical bar the olcSyncrepl lines and olcSaslHost value. I've redacted the schema and olcAccess entries and the root password: If there's any obvious deficiencies in my config too that would be good to know.
Kind regards, Mark
dn: cn=config objectClass: olcGlobal cn: config olcConfigFile: slapd.conf olcConfigDir: slapd.d olcArgsFile: /opt/openldap/var/run/slapd.args olcAttributeOptions: lang- olcAuthzPolicy: none olcAuthzRegexp: {0}"^uid=([^,/@]+)/ldapauth(@(((TEST.)?EASE)|INF).ED.AC.UK)?,cn=(((TEST.)?EASE)|INF).ED.AC.UK,cn=GSSAPI,cn=auth$" "uid=$1,ou=people,ou=central,dc=authorise,dc=ed,dc=ac,dc=uk" olcAuthzRegexp: {1}"^uid=(host|ldap)/([^/]+).authorise.is.ed.ac.uk,cn=EASE.ED.AC.UK,cn=GSSAPI,cn=auth$" "uid=replicator.authorise.is.ed.ac.uk,ou=people,ou=central,dc=authorise,dc=ed,dc=ac,dc=uk" olcAuthzRegexp: {2}"^uid=([^,/@]+)(@(((TEST.)?EASE)|INF).ED.AC.UK)?,cn=(((TEST.)?EASE)|INF).ED.AC.UK,cn=GSSAPI,cn=auth$" "uid=$1,ou=people,ou=central,dc=authorise,dc=ed,dc=ac,dc=uk" olcConcurrency: 0 olcConnMaxPending: 100 olcConnMaxPendingAuth: 1000 olcGentleHUP: FALSE olcIdleTimeout: 0 olcIndexSubstrIfMaxLen: 4 olcIndexSubstrIfMinLen: 2 olcIndexSubstrAnyLen: 4 olcIndexSubstrAnyStep: 2 olcIndexIntLen: 4 olcLocalSSF: 71 olcLogFile: /opt/openldap/var/log/openldap-debug.log olcLogLevel: stats olcLogLevel: sync olcPidFile: /opt/openldap/var/run/slapd.pid olcReadOnly: FALSE olcSaslHost: cauth04.authorise.is.ed.ac.uk olcSaslRealm: EASE.ED.AC.UK olcSaslSecProps: noplain,noanonymous olcSecurity: ssf=1 olcSecurity: update_ssf=112 olcSecurity: simple_bind=64 olcServerID: 3 ldaps://cauth01.authorise.is.ed.ac.uk/ olcServerID: 4 ldaps://cauth02.authorise.is.ed.ac.uk/ olcServerID: 5 ldaps://cauth03.authorise.is.ed.ac.uk/ olcServerID: 6 ldaps://cauth04.authorise.is.ed.ac.uk/ olcSockbufMaxIncoming: 262143 olcSockbufMaxIncomingAuth: 16777215 olcThreads: 12 olcTLSCACertificateFile: /opt/openldap/certs/quovadis.bundle olcTLSCertificateFile: /opt/openldap/certs/authorise.is.ed.ac.uk-QUOVADIS.crt olcTLSCertificateKeyFile: /opt/openldap/certs/authorise.is.ed.ac.uk-QUOVADIS.key olcTLSCipherSuite: HIGH:MEDIUM:+SSLv3:+SSLv2:+TLSv1:+AES256-SHA:!EXPORT olcTLSCRLCheck: none olcTLSVerifyClient: never olcTLSProtocolMin: 3.1 olcToolThreads: 2 olcWriteTimeout: 900
dn: olcDatabase={-1}frontend,cn=config objectClass: olcDatabaseConfig objectClass: olcFrontendConfig olcDatabase: {-1}frontend olcAddContentAcl: FALSE olcLastMod: TRUE olcMaxDerefDepth: 2 olcReadOnly: FALSE olcSchemaDN: cn=Subschema olcSecurity: ssf=1 olcSecurity: update_ssf=112 olcSecurity: simple_bind=64 olcSyncUseSubentry: FALSE olcMonitoring: FALSE olcSortVals: member olcSortVals: eduniServiceCode
dn: olcDatabase={0}config,cn=config objectClass: olcDatabaseConfig olcDatabase: {0}config olcAccess: {0}to * by dn.regex="^uid=replicator.authorise.is.ed.ac.uk,ou=people,ou=central,dc=authorise,dc=ed,dc=ac,dc=uk$" read by * none olcAddContentAcl: TRUE olcLastMod: TRUE olcMaxDerefDepth: 15 olcReadOnly: FALSE olcRootDN: cn=Manager,cn=config olcRootPW:: ************************* olcSyncUseSubentry: FALSE olcMonitoring: FALSE
dn: olcDatabase={1}mdb,cn=config objectClass: olcDatabaseConfig objectClass: olcMdbConfig olcDatabase: {1}mdb olcDbDirectory: /opt/openldap/var/openldap-data/authorise olcSuffix: dc=authorise,dc=ed,dc=ac,dc=uk olcAccess: ************* olcAddContentAcl: FALSE olcLastMod: TRUE olcMaxDerefDepth: 1 olcReadOnly: FALSE olcRootDN: cn=Manager,dc=authorise,dc=ed,dc=ac,dc=uk olcRootPW:: ********************* olcSecurity: ssf=1 olcSecurity: update_ssf=112 olcSecurity: simple_bind=64 olcSizeLimit: unlimited olcSyncUseSubentry: FALSE olcTimeLimit: unlimited olcMirrorMode: TRUE olcMonitoring: TRUE olcDbCheckpoint: 1024 10 olcDbEnvFlags: writemap olcDbEnvFlags: nometasync olcDbNoSync: FALSE olcDbIndex: objectClass eq olcDbIndex: entryUUID eq olcDbIndex: entryCSN eq olcDbIndex: cn pres,eq,sub olcDbIndex: uid pres,eq,sub olcDbIndex: uidNumber pres,eq olcDbIndex: gidNumber pres,eq olcDbIndex: eduniType eq olcDbIndex: gecos pres,eq,sub olcDbIndex: eduniCategory eq olcDbIndex: mail pres,eq,sub olcDbIndex: eduniSchoolCode eq olcDbIndex: eduniIDStatus eq olcDbIndex: eduniCollegeCode eq olcDbIndex: eduniUnitCode eq olcDbIndex: eduniOrgCode eq olcDbIndex: eduniLibraryBarcode pres,eq olcDbIndex: eduniServiceCode pres,eq olcDbIndex: krbName pres,eq olcDbIndex: eduPersonAffiliation pres,eq olcDbIndex: eduPersonPrimaryAffiliation eq olcDbIndex: eduPersonPrincipalName eq olcDbIndex: eduPersonEntitlement pres,eq olcDbIndex: eduPersonTargetedID eq olcDbIndex: eduPersonScopedAffiliation pres,eq olcDbIndex: userPassword eq olcDbIndex: eduniIdmsId pres,eq olcDbIndex: sn pres,eq,sub olcDbIndex: memberUid eq olcDbIndex: uniqueMember pres,eq olcDbIndex: autoMountKey eq olcDbIndex: eduniOrganisation pres,eq,sub olcDbIndex: description pres,eq,sub olcDbIndex: eduniTitle pres,eq olcDbIndex: title pres,eq,sub olcDbIndex: eduniCardNumber pres,eq olcDbIndex: eduniRefNo pres,eq olcDbIndex: givenName pres,eq,sub olcDbIndex: member eq olcDbIndex: eduniYearOfStudy pres,eq olcDbIndex: yubiKeyId pres,eq olcDbIndex: nisMapName eq olcDbIndex: memberOf eq olcDbIndex: proxyAddresses pres,eq,sub olcDbMaxReaders: 96 olcDbMaxSize: 32212254720 olcDbMode: 0600 olcDbSearchStack: 16 olcSyncrepl: {0}rid=34 provider=ldaps://cauth01.authorise.is.ed.ac.uk/ realm=EASE.ED.AC.UK bindmethod=sasl saslmech=gssapi authcid="replicator.authorise.is.ed.ac.uk@EASE.ED.AC.UK" searchbase="dc=authorise,dc=ed,dc=ac,dc=uk" logbase="cn=accesslog" logfilter="(&(objectClass=auditWriteObject)(reqResult=0))" schemachecking=on type=refreshAndPersist retry="60 +" tls_reqcert=never timeout=1 keepalive=300:12:5 syncdata=accesslog exattrs="memberOf" olcSyncrepl: {1}rid=35 provider=ldaps://cauth02.authorise.is.ed.ac.uk/ realm=EASE.ED.AC.UK bindmethod=sasl saslmech=gssapi authcid="replicator.authorise.is.ed.ac.uk@EASE.ED.AC.UK" searchbase="dc=authorise,dc=ed,dc=ac,dc=uk" logbase="cn=accesslog" logfilter="(&(objectClass=auditWriteObject)(reqResult=0))" schemachecking=on type=refreshAndPersist retry="60 +" tls_reqcert=never timeout=1 keepalive=300:12:5 syncdata=accesslog exattrs="memberOf" olcSyncrepl: {2}rid=36 provider=ldaps://cauth03.authorise.is.ed.ac.uk/ realm=EASE.ED.AC.UK bindmethod=sasl saslmech=gssapi authcid="replicator.authorise.is.ed.ac.uk@EASE.ED.AC.UK" searchbase="dc=authorise,dc=ed,dc=ac,dc=uk" logbase="cn=accesslog" logfilter="(&(objectClass=auditWriteObject)(reqResult=0))" schemachecking=on type=refreshAndPersist retry="60 +" tls_reqcert=never timeout=1 keepalive=300:12:5 syncdata=accesslog exattrs="memberOf" entryCSN: 20190528133331.277387Z#000000#006#000000
dn: olcOverlay={0}dynlist,olcDatabase={1}mdb,cn=config objectClass: olcOverlayConfig objectClass: olcDynamicList olcOverlay: {0}dynlist olcDlAttrSet: {0}groupOfURLs memberURL
dn: olcOverlay={1}memberof,olcDatabase={1}mdb,cn=config objectClass: olcOverlayConfig objectClass: olcMemberOf olcOverlay: {1}memberof olcMemberOfDangling: ignore olcMemberOfRefInt: TRUE olcMemberOfGroupOC: groupOfNames olcMemberOfMemberAD: member olcMemberOfMemberOfAD: memberOf
dn: olcOverlay={2}syncprov,olcDatabase={1}mdb,cn=config objectClass: olcOverlayConfig objectClass: olcConfig objectClass: top objectClass: olcSyncProvConfig olcOverlay: {2}syncprov
dn: olcOverlay={3}accesslog,olcDatabase={1}mdb,cn=config objectClass: olcOverlayConfig objectClass: olcAccessLogConfig olcOverlay: {3}accesslog olcAccessLogDB: cn=accesslog olcAccessLogOps: writes olcAccessLogPurge: 02+00:00 00+04:00 olcAccessLogSuccess: TRUE
dn: olcDatabase={2}mdb,cn=config objectClass: olcDatabaseConfig objectClass: olcMdbConfig olcDatabase: {2}mdb olcDbDirectory: /opt/openldap/var/openldap-data/accesslog olcSuffix: cn=accesslog olcAccess: {0}to * by dn.exact="uid=replicator.authorise.is.ed.ac.uk,ou=people,ou=central,dc=authorise,dc=ed,dc=ac,dc=uk" write by dn="cn=Manager,dc=authorise,dc=ed,dc=ac,dc=uk" write olcLimits: {0}dn.exact="uid=replicator.authorise.is.ed.ac.uk,ou=people,ou=central,dc=authorise,dc=ed,dc=ac,dc=uk" time.soft=unlimited time.hard=unlimited size.soft=unlimited size.hard=unlimited olcLimits: {1}dn.exact="cn=Manager,dc=authorise,dc=ed,dc=ac,dc=uk" time.soft=unlimited time.hard=unlimited size.soft=unlimited size.hard=unlimited olcRootDN: cn=Manager,cn=accesslog olcRootPW:: ******************************** olcDbIndex: default eq olcDbIndex: entryCSN,objectClass,reqEnd,reqResult,reqStart,reqDN olcDbMaxReaders: 96 olcDbMaxSize: 32212254720 olcDbMode: 0600 olcDbSearchStack: 16
dn: olcOverlay={0}syncprov,olcDatabase={2}mdb,cn=config objectClass: olcOverlayConfig objectClass: olcConfig objectClass: top objectClass: olcSyncProvConfig olcOverlay: {0}syncprov olcSpNoPresent: TRUE olcSpReloadHint: TRUE
dn: olcDatabase={3}monitor,cn=config objectClass: olcMonitorConfig objectClass: olcDatabaseConfig objectClass: olcConfig objectClass: top olcDatabase: {3}monitor olcAccess: {0}to dn.subtree="cn=monitor" by dn.exact="cn=Manager,cn=config" write by * none
--On Wednesday, June 05, 2019 11:38 AM +0100 Mark Cairney Mark.Cairney@ed.ac.uk wrote:
Hi,
We currently run a multi-master setup where all 4 of our servers replicate from each other via delta-syncrepl but all our writes are directed at a selected "master" server.
I've noticed recently that we are suffering sync issues and this coincides with us upgrading from 2.4.46 with a patch for ITS8843 to 2.4.47.
Hi Mark,
Error 0x14 would be attribute type or value already exists. This would indicate a fundamental problem of there being discrepencies between your servers. I.e., your databases are out of sync.
There is a (possibly self-inflicted) point of pain where we have an exattrs=memberOf in our syncrepl config to work around another replication issue so this means that any user objects which are REFRESH'ed end up losing all their group memberships.
You are aware the slapo-memberof(5) man page specifically states it is not compatible with syncrepl based replication, in particular because of the way in which the REFRESH phase functions combined with user/group entry location (creation time) in the database?
I would add that the exattrs line shouldn't be necessary with a proper configuration. I'm not sure what issue you were trying to avoid with this setting. The ITS#8444 regression test in the testsuite specifically has a 4-way MMR setup with memberof, and requires no such setting nor does it exhibit the issues you mention.
I.e., the only way you can ensure slapo-memberof will be "ok" in an replicated environment (syncrepl or delta-syncrepl) is if you can guarantee a REFRESH will never occur.
As for the configuration of the server, see below.
Are there any known bugs/ regressions with delta-syncrepl in 2.4.47? One idea I've had is to go to a true single-master config by setting the current consumers to be read-only and having a single olcSyncrepl clause for the master on these 3 servers.
dn: olcOverlay={0}dynlist,olcDatabase={1}mdb,cn=config objectClass: olcOverlayConfig objectClass: olcDynamicList olcOverlay: {0}dynlist olcDlAttrSet: {0}groupOfURLs memberURL
dn: olcOverlay={1}memberof,olcDatabase={1}mdb,cn=config objectClass: olcOverlayConfig objectClass: olcMemberOf olcOverlay: {1}memberof olcMemberOfDangling: ignore olcMemberOfRefInt: TRUE olcMemberOfGroupOC: groupOfNames olcMemberOfMemberAD: member olcMemberOfMemberOfAD: memberOf
dn: olcOverlay={2}syncprov,olcDatabase={1}mdb,cn=config objectClass: olcOverlayConfig objectClass: olcConfig objectClass: top objectClass: olcSyncProvConfig olcOverlay: {2}syncprov
dn: olcOverlay={3}accesslog,olcDatabase={1}mdb,cn=config objectClass: olcOverlayConfig objectClass: olcAccessLogConfig olcOverlay: {3}accesslog olcAccessLogDB: cn=accesslog olcAccessLogOps: writes olcAccessLogPurge: 02+00:00 00+04:00 olcAccessLogSuccess: TRUE
As I've noted a number of times on the list, overlay instantiation order is important for operation interception/processing. The syncprov overlay should be the first instantiated overlay, followed by accesslog, in a delta-syncrepl setup. In the above, this is clearly not the case.
I would additionally note that the syncprov overlay is missing a sessionlog setting, where the default is likely much smaller than required for mitigating ITS#8125.
Hope that helps!
--Quanah
--
Quanah Gibson-Mount Product Architect Symas Corporation Packaged, certified, and supported LDAP solutions powered by OpenLDAP: http://www.symas.com
Hi Quanah,
On 05/06/2019 19:09, Quanah Gibson-Mount wrote:
--On Wednesday, June 05, 2019 11:38 AM +0100 Mark Cairney Mark.Cairney@ed.ac.uk wrote:
Hi,
We currently run a multi-master setup where all 4 of our servers replicate from each other via delta-syncrepl but all our writes are directed at a selected "master" server.
I've noticed recently that we are suffering sync issues and this coincides with us upgrading from 2.4.46 with a patch for ITS8843 to 2.4.47.
Hi Mark,
Error 0x14 would be attribute type or value already exists. This would indicate a fundamental problem of there being discrepencies between your servers. I.e., your databases are out of sync.
That smells a little bit like NTP sync but these boxes are all set to aggressively poll a single NTP server:
server 129.215.205.191 minpoll 4 maxpoll 6 restrict 129.215.205.191 nomodify notrap
There is a (possibly self-inflicted) point of pain where we have an exattrs=memberOf in our syncrepl config to work around another replication issue so this means that any user objects which are REFRESH'ed end up losing all their group memberships.
You are aware the slapo-memberof(5) man page specifically states it is not compatible with syncrepl based replication, in particular because of the way in which the REFRESH phase functions combined with user/group entry location (creation time) in the database?
I would add that the exattrs line shouldn't be necessary with a proper configuration. I'm not sure what issue you were trying to avoid with this setting. The ITS#8444 regression test in the testsuite specifically has a 4-way MMR setup with memberof, and requires no such setting nor does it exhibit the issues you mention.
I.e., the only way you can ensure slapo-memberof will be "ok" in an replicated environment (syncrepl or delta-syncrepl) is if you can guarantee a REFRESH will never occur.
I think this was done in part to mitigate the behaviour in that ITS. If memberOf isn't compatible with syncrepl then does that mean you can't have replication and use the memberOf overlay? This would pose a problem for many medium-large installations surely? Looking back through the mailing list archive I do note a discussion of the whys and wherefores in September 2018.
The man page suggests using the dynlist overlay instead but given my limited understanding I don't see how that would work given the most common use case of the memberOf overlay is to get the group memberships when the user object is queried e.g. ldapsearch "uid=mcairney" "*" memberof
In general though it does seem like replication and attributes maintained locally via overlays (memberOf, ppolicy etc) are an absolute minefield!
As for the configuration of the server, see below.
Are there any known bugs/ regressions with delta-syncrepl in 2.4.47? One idea I've had is to go to a true single-master config by setting the current consumers to be read-only and having a single olcSyncrepl clause for the master on these 3 servers.
dn: olcOverlay={0}dynlist,olcDatabase={1}mdb,cn=config objectClass: olcOverlayConfig objectClass: olcDynamicList olcOverlay: {0}dynlist olcDlAttrSet: {0}groupOfURLs memberURL
dn: olcOverlay={1}memberof,olcDatabase={1}mdb,cn=config objectClass: olcOverlayConfig objectClass: olcMemberOf olcOverlay: {1}memberof olcMemberOfDangling: ignore olcMemberOfRefInt: TRUE olcMemberOfGroupOC: groupOfNames olcMemberOfMemberAD: member olcMemberOfMemberOfAD: memberOf
dn: olcOverlay={2}syncprov,olcDatabase={1}mdb,cn=config objectClass: olcOverlayConfig objectClass: olcConfig objectClass: top objectClass: olcSyncProvConfig olcOverlay: {2}syncprov
dn: olcOverlay={3}accesslog,olcDatabase={1}mdb,cn=config objectClass: olcOverlayConfig objectClass: olcAccessLogConfig olcOverlay: {3}accesslog olcAccessLogDB: cn=accesslog olcAccessLogOps: writes olcAccessLogPurge: 02+00:00 00+04:00 olcAccessLogSuccess: TRUE
As I've noted a number of times on the list, overlay instantiation order is important for operation interception/processing. The syncprov overlay should be the first instantiated overlay, followed by accesslog, in a delta-syncrepl setup. In the above, this is clearly not the case.
I thought I'd seen this mentioned but wasn't 100% sure. I've now re-ordered my overlays as follows:
dn: olcDatabase={1}mdb,cn=config
# {0}syncprov, {1}mdb, config dn: olcOverlay={0}syncprov,olcDatabase={1}mdb,cn=config
# {1}accesslog, {1}mdb, config dn: olcOverlay={1}accesslog,olcDatabase={1}mdb,cn=config
# {2}memberof, {1}mdb, config dn: olcOverlay={2}memberof,olcDatabase={1}mdb,cn=config
# {3}dynlist, {1}mdb, config dn: olcOverlay={3}dynlist,olcDatabase={1}mdb,cn=config
I would additionally note that the syncprov overlay is missing a sessionlog setting, where the default is likely much smaller than required for mitigating ITS#8125.
...and I've set olcSpSessionlog to be 500 (I'm not sure what the default is- 100?) Hopefully that's sufficient to avoid ITS#8125
Hope that helps!
It does! Many thanks,
Mark
--Quanah
--
Quanah Gibson-Mount Product Architect Symas Corporation Packaged, certified, and supported LDAP solutions powered by OpenLDAP: http://www.symas.com
Mark Cairney Mark.Cairney@ed.ac.uk schrieb am 06.06.2019 um 13:12 in
Nachricht c2ecf5a2-94bc-3ff0-75df-8142da2944a8@ed.ac.uk:
Hi Quanah,
On 05/06/2019 19:09, Quanah Gibson-Mount wrote:
--On Wednesday, June 05, 2019 11:38 AM +0100 Mark Cairney Mark.Cairney@ed.ac.uk wrote:
Hi,
We currently run a multi-master setup where all 4 of our servers replicate from each other via delta-syncrepl but all our writes are directed at a selected "master" server.
I've noticed recently that we are suffering sync issues and this coincides with us upgrading from 2.4.46 with a patch for ITS8843 to 2.4.47.
Hi Mark,
Error 0x14 would be attribute type or value already exists. This would indicate a fundamental problem of there being discrepencies between your servers. I.e., your databases are out of sync.
That smells a little bit like NTP sync but these boxes are all set to aggressively poll a single NTP server:
server 129.215.205.191 minpoll 4 maxpoll 6 restrict 129.215.205.191 nomodify notrap
You know that you should have preferrably four servers? If you have only one, I's suggest so configure peering amng your MMR masters; if the server fails, it's likely that at least the MMR masters agree on what time it is...
And if you seem in a hurry with time sync, maybe add "iburst" option as well...
[...]
Regards, Ulrich
Hi Mark,
--On Thursday, June 06, 2019 1:12 PM +0100 Mark Cairney Mark.Cairney@ed.ac.uk wrote:
I.e., the only way you can ensure slapo-memberof will be "ok" in an replicated environment (syncrepl or delta-syncrepl) is if you can guarantee a REFRESH will never occur.
I think this was done in part to mitigate the behaviour in that ITS. If memberOf isn't compatible with syncrepl then does that mean you can't have replication and use the memberOf overlay? This would pose a problem for many medium-large installations surely? Looking back through the mailing list archive I do note a discussion of the whys and wherefores in September 2018.
Right, the primary issue is that if a server goes into REFRESH mode, the order in which the entries are sent back may not allow the slapo-memberOf overlay to rebuild the groups correctly.
The man page suggests using the dynlist overlay instead but given my limited understanding I don't see how that would work given the most common use case of the memberOf overlay is to get the group memberships when the user object is queried e.g. ldapsearch "uid=mcairney" "*" memberof
I gave some examples in https://www.openldap.org/its/index.cgi/?findid=8613. But none of it is pretty.
In general though it does seem like replication and attributes maintained locally via overlays (memberOf, ppolicy etc) are an absolute minefield!
Yes, it is. Unfortunately, Microsoft has never defined how memberOf should work in a replicated environment, and the RFC for ppolicy doesn't either. At least with ppolicy, one can configure back-ldap/slapo-chain to forward from the updates from a replica back to a provider so they get replicated out.
As I've noted a number of times on the list, overlay instantiation order is important for operation interception/processing. The syncprov overlay should be the first instantiated overlay, followed by accesslog, in a delta-syncrepl setup. In the above, this is clearly not the case.
I thought I'd seen this mentioned but wasn't 100% sure. I've now re-ordered my overlays as follows:
Great!
I would additionally note that the syncprov overlay is missing a sessionlog setting, where the default is likely much smaller than required for mitigating ITS#8125.
...and I've set olcSpSessionlog to be 500 (I'm not sure what the default is- 100?) Hopefully that's sufficient to avoid ITS#8125
It needs to be a value larger than the total number of entries in your database.
Regards, Quanah
--
Quanah Gibson-Mount Product Architect Symas Corporation Packaged, certified, and supported LDAP solutions powered by OpenLDAP: http://www.symas.com
On 06/06/2019 16:30, Quanah Gibson-Mount wrote:
Hi Mark,
--On Thursday, June 06, 2019 1:12 PM +0100 Mark Cairney Mark.Cairney@ed.ac.uk wrote:
I.e., the only way you can ensure slapo-memberof will be "ok" in an replicated environment (syncrepl or delta-syncrepl) is if you can guarantee a REFRESH will never occur.
I think this was done in part to mitigate the behaviour in that ITS. If memberOf isn't compatible with syncrepl then does that mean you can't have replication and use the memberOf overlay? This would pose a problem for many medium-large installations surely? Looking back through the mailing list archive I do note a discussion of the whys and wherefores in September 2018.
Right, the primary issue is that if a server goes into REFRESH mode, the order in which the entries are sent back may not allow the slapo-memberOf overlay to rebuild the groups correctly.
The man page suggests using the dynlist overlay instead but given my limited understanding I don't see how that would work given the most common use case of the memberOf overlay is to get the group memberships when the user object is queried e.g. ldapsearch "uid=mcairney" "*" memberof
I gave some examples in https://www.openldap.org/its/index.cgi/?findid=8613. But none of it is pretty.
Ahh OK it looks like you're editing the "memberof" attribute in the user (without the overlay) which is then used by the dynlist overlay to generate group memberships on the fly.
Clever but I don't think we could switch over to this approach anytime soon as the majority of our groups are automatically maintained by an external application called "Grouper". It could potentially be configured to do something along these lines but it wouldn't be straightforward!
Something to think about/experiment with though....
In general though it does seem like replication and attributes maintained locally via overlays (memberOf, ppolicy etc) are an absolute minefield!
Yes, it is. Unfortunately, Microsoft has never defined how memberOf should work in a replicated environment, and the RFC for ppolicy doesn't either. At least with ppolicy, one can configure back-ldap/slapo-chain to forward from the updates from a replica back to a provider so they get replicated out.
As I've noted a number of times on the list, overlay instantiation order is important for operation interception/processing. The syncprov overlay should be the first instantiated overlay, followed by accesslog, in a delta-syncrepl setup. In the above, this is clearly not the case.
I thought I'd seen this mentioned but wasn't 100% sure. I've now re-ordered my overlays as follows:
Great!
I would additionally note that the syncprov overlay is missing a sessionlog setting, where the default is likely much smaller than required for mitigating ITS#8125.
...and I've set olcSpSessionlog to be 500 (I'm not sure what the default is- 100?) Hopefully that's sufficient to avoid ITS#8125
It needs to be a value larger than the total number of entries in your database.
Thanks- I've now set it 2000000 based on our present entry count of ~1.25M. We've got 64GB RAM on these boxes so hopefully with a 30GB max DB size set we should have enough.
Many thanks,
Mark
Regards, Quanah
--
Quanah Gibson-Mount Product Architect Symas Corporation Packaged, certified, and supported LDAP solutions powered by OpenLDAP: http://www.symas.com
On 6/6/19 5:30 PM, Quanah Gibson-Mount wrote:
Right, the primary issue is that if a server goes into REFRESH mode, the order in which the entries are sent back may not allow the slapo-memberOf overlay to rebuild the groups correctly. [..] I gave some examples in https://www.openldap.org/its/index.cgi/?findid=8613. But none of it is pretty.
Personally I still fail to see why attribute 'memberOf' receives this special treatment compared to e.g. 'modifyTimestamp' etc.
IMO it should be generated on the slapd replica which received the write operation from the client and just replicated like many other operational attributes.
Ciao, Michael.
openldap-technical@openldap.org