I have a problem that has just started happening in the last few weeks. Replication of modifications and/or adds from one LDAP node to the other will work fine ( repl time <1s) for a while, then suddenly stop working all together. The only way to get replication working again is to restart each node 2 or 3 times (sometimes it comes back after 2 restarts, sometimes it takes 3). On the last restart, I usually find that one node is stuck in an infinite loop trying to process a transaction from the other node's accesslog - the only way to get around this is to delete the the accesslog entry (see below for accesslog entry and slapd output). Any ideas?
OS: OpenSuse 11.1 64-bit w/ 2GB RAM Version: OpenLDAP 2.4.23 Network Setup: 2 read/write nodes, with delta syncrepl between the nodes
SERVER 1 Configs: dn: cn=config objectClass: olcGlobal cn: config olcArgsFile: /var/run/slapd/slapd.args olcAuthzRegexp: {0}gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth dn:cn=config olcLogLevel: stats sync olcPidFile: /var/run/slapd/slapd.pid olcServerID: 1 ldap://LDAP1:389 olcServerID: 2 ldap://LDAP2:389 olcSizeLimit: 10000
dn: olcDatabase={-1}frontend,cn=config objectClass: olcDatabaseConfig olcDatabase: {-1}frontend olcAccess: {0}to dn.base="" by * read olcAccess: {1}to dn.base="cn=Subschema" by * read olcAccess: {2}to * by dn.base="cn=admin,o=o" manage
dn: olcDatabase={0}config,cn=config objectClass: olcDatabaseConfig olcDatabase: {0}config olcAccess: {0}to * by dn.base="cn=admin,o=o" manage olcAccess: {1}to * by * read olcRootDN: cn=config
dn: olcDatabase={1}hdb,cn=config objectClass: olcDatabaseConfig objectClass: olcHdbConfig olcDatabase: {1}hdb olcDbDirectory: /var/lib/ldap olcAccess: {0}to attrs=pwdChangedTime,pwdAccountLockedTime,pwdFailureTime,pwdHistory,pwdGraceUseTime,pwdReset by * none olcAccess: {1}to attrs=userPassword by self write by * auth by dn.base="cn=admin,o=o" manage olcAccess: {2}to attrs=shadowLastChange by self write by * read olcAccess: {3}to attrs=userPKCS12 by self read by * none olcAccess: {4}to * by * read by dn.base="cn=admin,o=o" manage olcDbCheckpoint: 1024 5 olcDbConfig: {0}set_cachesize 0 151200000 1 olcDbConfig: {1}set_lg_regionmax 262144 olcDbConfig: {2}set_lg_bsize 2097152 olcDbConfig: {3}set_flags DB_LOG_AUTOREMOVE olcDbIDLcacheSize: 151200000 olcDbIndex: objectclass eq olcDbIndex: uidNumber eq olcDbIndex: gidNumber eq olcDbIndex: member eq olcDbIndex: memberUid eq olcDbIndex: mail eq olcDbIndex: cn eq,sub olcDbIndex: displayName eq,sub olcDbIndex: uid eq,sub olcDbIndex: sn eq,sub olcDbIndex: givenName eq,sub olcDbIndex: entryCSN eq olcDbIndex: entryUUID eq olcLimits: {0}dn.exact="cn=admin,o=o" time.soft=unlimited time.hard=unlimited size.soft=unlimited size.hard=unlimited olcMirrorMode: TRUE olcRootDN: cn=admin,o=o olcRootPW: {SSHA}PWD olcSuffix: o=o olcSyncrepl: {0}rid=2 provider=ldap://LDAP2:389 binddn="cn=admin,o=o" bindmethod=simple credentials=PWD searchbase="ou=USERS,o=o" logbase="cn=accesslog" logfilter="(&(objectClass=auditWriteObject)(reqResult=0))" schemachecking=off type=refreshAndPersist retry="60 +" syncdata=accesslog
dn: olcOverlay={0}ppolicy,olcDatabase={1}hdb,cn=config objectClass: olcPPolicyConfig olcOverlay: {0}ppolicy olcPPolicyDefault: cn=Default Policy,o=o olcPPolicyHashCleartext: TRUE olcPPolicyUseLockout: TRUE
dn: olcOverlay={1}syncprov,olcDatabase={1}hdb,cn=config objectClass: olcOverlayConfig objectClass: olcSyncProvConfig olcOverlay: {1}syncprov olcSpCheckpoint: 100 1 olcSpSessionlog: 100
dn: olcOverlay={2}accesslog,olcDatabase={1}hdb,cn=config objectClass: olcAccessLogConfig objectClass: olcOverlayConfig objectClass: olcConfig objectClass: top olcAccessLogDB: cn=accesslog olcOverlay: {2}accesslog olcAccessLogOps: writes olcAccessLogPurge: 03+00:00 01+00:00 olcAccessLogSuccess: TRUE
dn: olcDatabase={2}accesslog,cn=config objectClass: olcDatabaseConfig objectClass: olcHdbConfig olcDatabase: {2}hdb olcDbDirectory: /var/lib/ldap_accesslog olcAccess:: ezB9dG8gKiBieSBkbi5iYXNlPSJjbj1hZG1pbixvPWNvIiBtYW5hZ2Ug olcDbCheckpoint: 1024 5 olcDbConfig: {0}set_cachesize 0 9948000 1 olcDbConfig: {1}set_lg_regionmax 262144 olcDbConfig: {2}set_lg_bsize 2097152 olcDbConfig: {3}set_flags DB_LOG_AUTOREMOVE olcDbIDLcacheSize: 9948000 olcDbIndex: entryCSN eq olcDbIndex: objectClass eq olcDbIndex: reqEnd eq olcDbIndex: reqResult eq olcDbIndex: reqStart eq olcDbIndex: default eq olcLimits: {0}dn.exact="cn=admin,o=o" time.soft=unlimited time.hard=unlimited size.soft=unlimited size.hard=unlimited olcSuffix: cn=accesslog
dn: olcOverlay={0}syncprov,olcDatabase={2}accesslog,cn=config objectClass: olcOverlayConfig objectClass: olcSyncProvConfig olcOverlay: {0}syncprov olcSpCheckpoint: 100 1 olcSpNoPresent: TRUE olcSpReloadHint: TRUE olcSpSessionlog: 100
dn: olcDatabase={3}monitor,cn=config objectClass: olcMonitorConfig olcDatabase: {3}monitor olcAccess: {0}to * by dn.base="cn=admin,o=o" manage olcAccess: {1}to * by * none olcMonitoring: true -------------------------- SERVER 2 Configs: version: 1
dn: cn=config objectClass: olcGlobal cn: config olcArgsFile: /var/run/slapd/slapd.args olcAuthzRegexp: {0}gidNumber=0+uidNumber=0,cn=peercred,cn=external,cn=auth dn:cn=config olcLogLevel: stats sync olcPidFile: /var/run/slapd/slapd.pid olcServerID: 1 ldap://LDAP1:389 olcServerID: 2 ldap://LDAP2:389 olcSizeLimit: 10000
dn: olcDatabase={-1}frontend,cn=config objectClass: olcDatabaseConfig olcDatabase: {-1}frontend olcAccess: {0}to dn.base="" by * read olcAccess: {1}to dn.base="cn=Subschema" by * read olcAccess: {2}to * by dn.base="cn=admin,o=o" manage
dn: olcDatabase={0}config,cn=config objectClass: olcDatabaseConfig olcDatabase: {0}config olcAccess: {0}to * by dn.base="cn=admin,o=o" manage olcAccess: {1}to * by * read olcRootDN: cn=config
dn: olcDatabase={1}hdb,cn=config objectClass: olcDatabaseConfig objectClass: olcHdbConfig olcDatabase: {1}hdb olcDbDirectory: /var/lib/ldap olcAccess: {0}to attrs=pwdChangedTime,pwdAccountLockedTime,pwdFailureTime,pwdHistory,pwdGraceUseTime,pwdReset by * none olcAccess: {1}to attrs=userPassword by self write by * auth by dn.base="cn=admin,o=o" manage olcAccess: {2}to attrs=shadowLastChange by self write by * read olcAccess: {3}to attrs=userPKCS12 by self read by * none olcAccess: {4}to * by * read by dn.base="cn=admin,o=o" manage olcDbCheckpoint: 1024 5 olcDbConfig: {0}set_cachesize 0 151200000 1 olcDbConfig: {1}set_lg_regionmax 262144 olcDbConfig: {2}set_lg_bsize 2097152 olcDbConfig: {3}set_flags DB_LOG_AUTOREMOVE olcDbIDLcacheSize: 151200000 olcDbIndex: objectclass eq olcDbIndex: uidNumber eq olcDbIndex: gidNumber eq olcDbIndex: member eq olcDbIndex: memberUid eq olcDbIndex: mail eq olcDbIndex: cn eq,sub olcDbIndex: displayName eq,sub olcDbIndex: uid eq,sub olcDbIndex: sn eq,sub olcDbIndex: givenName eq,sub olcDbIndex: entryCSN eq olcDbIndex: entryUUID eq olcLimits: {0}dn.exact="cn=admin,o=o" time.soft=unlimited time.hard=unlimited size.soft=unlimited size.hard=unlimited olcMirrorMode: TRUE olcRootDN: cn=admin,o=o olcRootPW: {SSHA}PWD olcSuffix: o=o olcSyncrepl: {0}rid=1 provider=ldap://LDAP1:389 binddn="cn=admin,o=o" bindmethod=simple credentials=PWD searchbase="ou=USERS,o=o" logbase="cn=accesslog" logfilter="(&(objectClass=auditWriteObject)(reqResult=0))" schemachecking=off type=refreshAndPersist retry="60 +" syncdata=accesslog
dn: olcOverlay={0}ppolicy,olcDatabase={1}hdb,cn=config objectClass: olcPPolicyConfig olcOverlay: {0}ppolicy olcPPolicyDefault: cn=Default Policy,o=o olcPPolicyHashCleartext: TRUE olcPPolicyUseLockout: TRUE
dn: olcOverlay={1}syncprov,olcDatabase={1}hdb,cn=config objectClass: olcOverlayConfig objectClass: olcSyncProvConfig olcOverlay: {1}syncprov olcSpCheckpoint: 100 1 olcSpSessionlog: 100
dn: olcOverlay={2}accesslog,olcDatabase={1}hdb,cn=config objectClass: olcAccessLogConfig objectClass: olcOverlayConfig objectClass: olcConfig objectClass: top olcAccessLogDB: cn=accesslog olcOverlay: {2}accesslog olcAccessLogOps: writes olcAccessLogPurge: 03+00:00 01+00:00 olcAccessLogSuccess: TRUE
dn: olcDatabase={2}accesslog,cn=config objectClass: olcDatabaseConfig objectClass: olcHdbConfig olcDatabase: {2}hdb olcDbDirectory: /var/lib/ldap_accesslog olcAccess:: ezB9dG8gKiBieSBkbi5iYXNlPSJjbj1hZG1pbixvPWNvIiBtYW5hZ2Ug olcDbCheckpoint: 1024 5 olcDbConfig: {0}set_cachesize 0 9948000 1 olcDbConfig: {1}set_lg_regionmax 262144 olcDbConfig: {2}set_lg_bsize 2097152 olcDbConfig: {3}set_flags DB_LOG_AUTOREMOVE olcDbIDLcacheSize: 9948000 olcDbIndex: entryCSN eq olcDbIndex: objectClass eq olcDbIndex: reqEnd eq olcDbIndex: reqResult eq olcDbIndex: reqStart eq olcDbIndex: default eq olcLimits: {0}dn.exact="cn=admin,o=o" time.soft=unlimited time.hard=unlimited size.soft=unlimited size.hard=unlimited olcSuffix: cn=accesslog
dn: olcOverlay={0}syncprov,olcDatabase={2}accesslog,cn=config objectClass: olcOverlayConfig objectClass: olcSyncProvConfig olcOverlay: {0}syncprov olcSpCheckpoint: 100 1 olcSpNoPresent: TRUE olcSpReloadHint: TRUE olcSpSessionlog: 100
dn: olcDatabase={3}monitor,cn=config objectClass: olcMonitorConfig olcDatabase: {3}monitor olcAccess: {0}to * by dn.base="cn=admin,o=o" manage olcAccess: {1}to * by * none olcMonitoring: true -------------------------- Bad accesslog entry & slapd output version: 1 ####From Transaction recorded in accesslog on LDAP1 dn: reqStart=20111212033402.000002Z,cn=accesslog objectClass: auditModify reqAuthzID: cn=admin,o=co reqDN: cn=bob,ou=users,o=co reqEnd: 20111212033402.000003Z reqMod: userPassword:= {SSHA}PWD reqMod: pwdChangedTime:= 20111212033402Z reqMod: entryCSN:= 2011 12 12 03 34 02.374746Z#000000#001#000000 reqMod: modifiersName:= cn=admin,o=o reqMod: modifyTimestamp:= 20111212033402Z reqResult: 0 reqSession: 3296 reqStart: 20111212033402.000002Z reqType: modify
#####Log Of LDAP2 after 2nd restart (No Related Log info on LDAP1 during that timeframe) Dec 12 07:54:43 LDAP2 slapd[2732]: slap_queue_csn: queing 0x7f093b746ea5 20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: slap_graduate_commit_csn: removing 0x7f093b639100 20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: do_syncrep2: rid=001 cookie=rid=001,sid=001,csn=20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: slap_queue_csn: queing 0x7f093b746ea5 20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: slap_graduate_commit_csn: removing 0x7f093b568240 20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: do_syncrep2: rid=001 cookie=rid=001,sid=001,csn=20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: slap_queue_csn: queing 0x7f093b746ea5 20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: slap_graduate_commit_csn: removing 0x7f093b639100 20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: do_syncrep2: rid=001 cookie=rid=001,sid=001,csn=20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: slap_queue_csn: queing 0x7f093b746ea5 20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: slap_graduate_commit_csn: removing 0x7f093b568240 20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: do_syncrep2: rid=001 cookie=rid=001,sid=001,csn=20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: slap_queue_csn: queing 0x7f093b746ea5 20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: slap_graduate_commit_csn: removing 0x7f093b639100 20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: do_syncrep2: rid=001 cookie=rid=001,sid=001,csn=20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: slap_queue_csn: queing 0x7f093b746ea5 20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: slap_graduate_commit_csn: removing 0x7f093b568240 20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: do_syncrep2: rid=001 cookie=rid=001,sid=001,csn=20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: slap_queue_csn: queing 0x7f093b746ea5 20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: slap_graduate_commit_csn: removing 0x7f093b639100 20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: do_syncrep2: rid=001 cookie=rid=001,sid=001,csn=20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: slap_queue_csn: queing 0x7f093b746ea5 20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: slap_graduate_commit_csn: removing 0x7f093b568240 20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: do_syncrep2: rid=001 cookie=rid=001,sid=001,csn=20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: slap_queue_csn: queing 0x7f093b746ea5 20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: slap_graduate_commit_csn: removing 0x7f093b639100 20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: do_syncrep2: rid=001 cookie=rid=001,sid=001,csn=20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: slap_queue_csn: queing 0x7f093b746ea5 20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: slap_graduate_commit_csn: removing 0x7f093b568240 20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: do_syncrep2: rid=001 cookie=rid=001,sid=001,csn=20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: slap_queue_csn: queing 0x7f093b746ea5 20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: slap_graduate_commit_csn: removing 0x7f093b639100 20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: do_syncrep2: rid=001 cookie=rid=001,sid=001,csn=20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: slap_queue_csn: queing 0x7f093b746ea5 20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: slap_graduate_commit_csn: removing 0x7f093b568240 20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: do_syncrep2: rid=001 cookie=rid=001,sid=001,csn=20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: slap_queue_csn: queing 0x7f093b746ea5 20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: slap_graduate_commit_csn: removing 0x7f093b639100 20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: do_syncrep2: rid=001 cookie=rid=001,sid=001,csn=20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: slap_queue_csn: queing 0x7f093b746ea5 20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: slap_graduate_commit_csn: removing 0x7f093b568240 20111212033402.374746Z#000000#001#000000 Dec 12 07:54:43 LDAP2 slapd[2732]: do_syncrep2: rid=001 cookie=rid=001,sid=001,csn=20111212033402.374746Z#000000#001#000000
--On Monday, December 19, 2011 5:31 PM -0700 david m dmalkin@gmail.com wrote:
OS: OpenSuse 11.1 64-bit w/ 2GB RAM Version: OpenLDAP 2.4.23 Network Setup: 2 read/write nodes, with delta syncrepl between the nodes
I'm not sure what you mean by 2 read/write nodes here. delta-syncrepl does not support MMR until the 2.4.27 release. You're clearly 4 releases before that. Any expectation that having 2 read/write nodes using delta-syncrepl prior to 2.4.27 will work is bogus.
--Quanah
--
Quanah Gibson-Mount Sr. Member of Technical Staff Zimbra, Inc A Division of VMware, Inc. -------------------- Zimbra :: the leader in open source messaging and collaboration
openldap-technical@openldap.org