quanah(a)zimbra.com wrote:
--On Wednesday, May 16, 2012 10:27 PM +0000 quanah(a)OpenLDAP.org wrote:
> Full_Name: Quanah Gibson-Mount
> Version: 2.4.31
> OS: Linux 2.6
> URL: ftp://ftp.openldap.org/incoming/
> Submission from: (NULL) (75.108.184.39)
We can see that the script turning it into a master ran here:
Thu May 17 16:05:46 2012 *** Running as zimbra user: /opt/zimbra/libexec/zmldapenable-mmr -s 2 -m ldap://zre-ldap002.eng.vmware.com:389/
so 16:05:46
In the accesslog, we see:
dn: cn=accesslog
objectClass: auditContainer
cn: accesslog
structuralObjectClass: auditContainer
contextCSN: 20120517225152.913667Z#000000#000#000000
contextCSN: 20120517230823.615364Z#000000#001#000000
contextCSN: 20120517230546.409118Z#000000#002#000000
dn: reqStart=20120517230546.000019Z,cn=accesslog
objectClass: auditAdd
structuralObjectClass: auditAdd
reqStart: 20120517230546.000019Z
reqEnd: 20120517230546.000020Z
reqType: add
reqSession: 100
reqAuthzID: cn=config
reqDN: cn=zimbra
reqResult: 0
reqMod: objectClass:+ organizationalRole
reqMod: description:+ Zimbra Systems Application Data
reqMod: cn:+ zimbra
reqMod: structuralObjectClass:+ organizationalRole
reqMod: entryUUID:+ 40f78bea-34be-1031-8a5d-e1466f667e19
reqMod: creatorsName:+ cn=config
reqMod: createTimestamp:+ 20120517224907Z
reqMod: entryCSN:+ 20120517224907.221672Z#000000#000#000000
reqMod: modifiersName:+ cn=config
reqMod: modifyTimestamp:+ 20120517224907Z
reqEntryUUID: 40f78bea-34be-1031-8a5d-e1466f667e19
entryUUID: 948929e2-34c0-1031-9a14-c93bd10ff0f2
creatorsName: cn=config
createTimestamp: 20120517224907Z
entryCSN: 20120517224907.221672Z#000000#000#000000
modifiersName: cn=config
modifyTimestamp: 20120517224907Z
So it is tracking "000" as a third master? This seems to be why the
original server (which was 000 before being promoted to 001) replicates
these entries back to itself.
The loop is caused by the patch to ITS#6872, which considers a consumer out of
date whenever the number of CSNs in its sync request doesn't match the number
known to the provider.
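As a rough illustration of that check, here is a simplified Python model. It is not the syncprov code; the function names are hypothetical, and the CSN values are reused from the accesslog excerpt above purely as sample data (the actual values server1 sent are not shown in this report). It assumes the usual CSN layout of four '#'-separated fields with the SID as the third, in hex.

```python
def sid_of(csn: str) -> int:
    """Return the server ID (third '#'-separated field, hex) of a CSN."""
    fields = csn.split("#")
    if len(fields) != 4:
        raise ValueError(f"malformed CSN: {csn!r}")
    return int(fields[2], 16)


def consumer_looks_stale(consumer_csns, provider_csns) -> bool:
    """Simplified model: a CSN-count mismatch alone marks the consumer stale."""
    return len(consumer_csns) != len(provider_csns)


# Sample data: the provider knows SIDs 0, 1 and 2.
provider = [
    "20120517225152.913667Z#000000#000#000000",
    "20120517230823.615364Z#000000#001#000000",
    "20120517230546.409118Z#000000#002#000000",
]
# The consumer's sync request carries only SIDs 1 and 2.
consumer = [c for c in provider if sid_of(c) != 0]

print(consumer_looks_stale(consumer, provider))  # True
```

In this model the consumer is treated as out of date on every request, because it never has a SID=0 value to send, which is consistent with the loop described here.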
The data here is basically invalid: server1 has entries generated using SID=0
but it has no contextCSN value with SID=0. It only sent SID=1 and SID=2 in its
sync request. Server2, which just updated from server1, has a contextCSN for
SID=0 in addition to 1 and 2 (and that's all correct).
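The inconsistency can be spotted mechanically by comparing the SIDs found in entryCSN values against the SIDs covered by the contextCSN set. The following Python sketch shows one way to do that. The helper names are hypothetical, and server1's contextCSN list is an assumption built from the SID=1 and SID=2 values in the excerpt above, since server1's real values are not given here.

```python
def csn_sid(csn: str) -> int:
    """Return the server ID (third '#'-separated field, hex) of a CSN."""
    fields = csn.split("#")
    if len(fields) != 4:
        raise ValueError(f"malformed CSN: {csn!r}")
    return int(fields[2], 16)


def sids_without_context_csn(entry_csns, context_csns) -> set:
    """SIDs that stamped at least one entry but have no contextCSN value."""
    entry_sids = {csn_sid(c) for c in entry_csns}
    context_sids = {csn_sid(c) for c in context_csns}
    return entry_sids - context_sids


# entryCSN of cn=zimbra from the excerpt: generated with SID=0.
entry_csns = ["20120517224907.221672Z#000000#000#000000"]

# Assumed contextCSN set on server1: SID=1 and SID=2 only.
server1_context = [
    "20120517230823.615364Z#000000#001#000000",
    "20120517230546.409118Z#000000#002#000000",
]

print(sorted(sids_without_context_csn(entry_csns, server1_context)))  # [0]
```

A non-empty result means the database holds entries from a SID that the contextCSN set does not account for, which is the invalid state described above.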
Server1 should have always had a contextCSN value for SID=0 but doesn't. This
problem would not occur if server1 were first converted from standalone into a
single master, i.e., load syncprov on it and let it scan the DB and generate
the first SID=0 contextCSN before turning it into an MMR node.
--
-- Howard Chu
CTO, Symas Corp.
http://www.symas.com
Director, Highland Sun
http://highlandsun.com/hyc/
Chief Architect, OpenLDAP
http://www.openldap.org/project/