Full_Name: Quanah Gibson-Mount Version: 2.4.44 OS: Linux 3.13 URL: ftp://ftp.openldap.org/incoming/ Submission from: (NULL) (75.111.52.177)
I have a MMR setup, with two providers, zre-ldap003 and zre-ldap005. I have a single replica, zre-ldap002, which is configured to pull updates from both providers. However, it continually gets an error 53 from zre-ldap005 after writing an update to zre-ldap003. As far as I can tell, all contextCSN data is correct on all nodes.
replica: zimbra@zre-ldap002:~$ ldapsearch -x -LLL -H ldapi:// -D cn=config -w zimbra -s base -b "" contextCSN dn: contextCSN: 20160218220547.523250Z#000000#001#000000 contextCSN: 20160218220604.321502Z#000000#002#000000
Master SID1: [zimbra@zre-ldap005 ~]$ ldapsearch -x -LLL -H ldapi:// -D cn=config -w zimbra -s base -b "" contextCSN dn: contextCSN: 20160218220547.523250Z#000000#001#000000 contextCSN: 20160218220604.321502Z#000000#002#000000
[zimbra@zre-ldap005 ~]$ ldapsearch -x -LLL -H ldapi:// -D cn=config -w zimbra -s base -b "cn=accesslog" contextCSN dn: cn=accesslog contextCSN: 20160217223749.648918Z#000000#001#000000 contextCSN: 20160218220604.321502Z#000000#002#000000
Master SID2: [zimbra@zre-ldap003 ~]$ ldapsearch -x -LLL -H ldapi:// -D cn=config -w zimbra -s base -b "" contextCSN dn: contextCSN: 20160218220547.523250Z#000000#001#000000 contextCSN: 20160218220604.321502Z#000000#002#000000
[zimbra@zre-ldap003 ~]$ ldapsearch -x -LLL -H ldapi:// -D cn=config -w zimbra -s base -b "cn=accesslog" contextCSN dn: cn=accesslog contextCSN: 20160218220547.523250Z#000000#001#000000 contextCSN: 20160218220604.321502Z#000000#002#000000%%0
Note all contextCSNs have the same value everywhere.
However, when the replica attempts to replicate off of zre-ldap005, we get the following error:
Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 fd=18 ACCEPT from IP=10.137.242.52:38979 (IP=10.137.242.55:389) Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=0 EXT oid=1.3.6.1.4.1.1466.20037 Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=0 STARTTLS Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=0 RESULT oid= err=0 etime=0.000071 text= Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 fd=18 TLS established tls_ssf=128 ssf=128 tls_proto=TLSv1.2 tls_cipher=SEED-SHA Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=1 BIND dn="uid=zmreplica,cn=admins,cn=zimbra" metho3D3D128 Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=1 BIND dn="uid=zmreplica,cn=admins,cn=zimbra" mech=SIMPLE ssf=0 Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=1 RESULT tag=97 err=0 etime=0.000154 text= Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=2 SRCH base="cn=accesslog" scope=2 deref=0 filter="(&(objectClass=auditWriteObject)(reqResult=0))" Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=2 SRCH attr=reqDN reqType reqMod reqNewRDN reqDeleteOldRDN reqNewSuperior entryCSN Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=2 SEARCH RESULT tag=101 err=53 etime=0.000177 nentries=0 text=consumer state is newer than provider! Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=3 UNBIND Feb 18 16:50:34 zre-ldap5 5 slapd[10555]: conn=1191 fd=18 closed
This started happening after I made a modification on zre-ldap005 followed by a modification on zre-ldap003. In that case, the following was logged on zre-ldap002:
Feb 18 16:06:04 zre-ldap002 slapd[11266]: do_syncrep2: rid=101 cookie=rid=101,sid=002,csn=20160218220604.321502Z#000000#002#000000 Feb 18 16:06:04 zre-ldap002 slapd[11266]: slap_queue_csn: queueing 0x3589880 20160218220604.321502Z#000000#002#000000 Feb 18 16:06:04 zre-ldap002 slapd[11266]: do_syncrep2: rid=100 cookie=r%3=100,sid=001,csn=20160218220604.321502Z#000000#002#000000 Feb 18 16:06:04 zre-ldap002 slapd[11266]: slap_graduate_commit_csn: removing 0x3589880 20160218220604.321502Z#000000#002#000000 Feb 18 16:06:04 zre-ldap002 slapd[11266]: syncrepl_message_to_op: rid=101 be_modify cn=admins,cn=zimbra (0) Feb 18 16:06:04 zre-ldap002 slapd[11266]: slap_queue_csn: queueing 0x3589740 20160218220604.321502Z#000000#002#000000 Feb 18 16:06:04 zre-ldap002 slapd[11266]: slap_graduate_commit_csn: removing 0x3589740 20160218220604.321502Z#000000#002#000000 Feb 18 16:06:04 zre-ldap002 slapd[11266]: do_syncrep2: rid=100 CSN pending, ignoring 20160218220604.321502Z#000000#002#000000 (reqStart=20160218220604.326527Z,cn=accesslog)
So we see the write accepted from one master, and correctly rejected (not replayed) from the other.
After this, there were no more modifications made to either master. However, I *did* make a modification to cn=config on the replica. After that was done, is when the error(53) started getting triggered:
Feb 18 16:27:12 zre-ldap002 slapd[11266]: slap_queue_csn: queueing 0x3587a40 20160218222712.155976Z#000000#000#000000 Feb 18 16:27:12 zre-ldap002 slapd[11266]: slap_graduate_commit_csn: removing 0x3587a40 20160218222712.155976Z#000000#000#000000 Feb 18 16:27:12 zre-ldap002 slapd[11266]: do_syncrep2: rid=100 LDAP_RES_SEARCH_RESULT Feb 18 16:27:12 zre-ldap002 slapd[11266]: do_syncrep2: rid=100 LDAP_RES_SEARCH_RESULT (53) Server is unwilling to perform Feb 18 16:27:12 zre-ldap002 slapd[11266]: do_syncrep2: rid=100 (53) Server is unwilling to perform Feb 18 16:27:12 zre-ldap002 slapd[11266]: do_syncrepl: rid=100 rc -2 retrying