(ITS#8373) replica with multiple providers hits errors - openldap-bugs

18 Feb 2016


      Full_Name: Quanah Gibson-Mount
Version: 2.4.44
OS: Linux 3.13
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (75.111.52.177)
I have a MMR setup, with two providers, zre-ldap003 and zre-ldap005.  I have a
single replica, zre-ldap002, which is configured to pull updates from both
providers.  However, it continually gets an error 53 from zre-ldap005 after
writing an update to zre-ldap003.  As far as I can tell, all contextCSN data is
correct on all nodes.
replica:
zimbra@zre-ldap002:~$ ldapsearch -x -LLL -H ldapi:// -D cn=config -w zimbra -s
base -b "" contextCSN
dn:
contextCSN: 20160218220547.523250Z#000000#001#000000
contextCSN: 20160218220604.321502Z#000000#002#000000
Master SID1:
[zimbra@zre-ldap005 ~]$ ldapsearch -x -LLL -H ldapi:// -D cn=config -w zimbra -s
base -b "" contextCSN
dn:
contextCSN: 20160218220547.523250Z#000000#001#000000
contextCSN: 20160218220604.321502Z#000000#002#000000
[zimbra@zre-ldap005 ~]$ ldapsearch -x -LLL -H ldapi:// -D cn=config -w zimbra -s
base -b "cn=accesslog" contextCSN
dn: cn=accesslog
contextCSN: 20160217223749.648918Z#000000#001#000000
contextCSN: 20160218220604.321502Z#000000#002#000000
Master SID2:
[zimbra@zre-ldap003 ~]$ ldapsearch -x -LLL -H ldapi:// -D cn=config -w zimbra -s
base -b "" contextCSN
dn:
contextCSN: 20160218220547.523250Z#000000#001#000000
contextCSN: 20160218220604.321502Z#000000#002#000000
[zimbra@zre-ldap003 ~]$ ldapsearch -x -LLL -H ldapi:// -D cn=config -w zimbra -s
base -b "cn=accesslog" contextCSN
dn: cn=accesslog
contextCSN: 20160218220547.523250Z#000000#001#000000
contextCSN: 20160218220604.321502Z#000000#002#000000%%0
Note all contextCSNs have the same value everywhere.
However, when the replica attempts to replicate off of zre-ldap005, we get the
following error:
Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 fd=18 ACCEPT from
IP=10.137.242.52:38979 (IP=10.137.242.55:389)
Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=0 EXT
oid=1.3.6.1.4.1.1466.20037
Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=0 STARTTLS
Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=0 RESULT oid= err=0
etime=0.000071 text=
Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 fd=18 TLS established
tls_ssf=128 ssf=128 tls_proto=TLSv1.2 tls_cipher=SEED-SHA
Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=1 BIND
dn="uid=zmreplica,cn=admins,cn=zimbra" metho3D3D128
Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=1 BIND
dn="uid=zmreplica,cn=admins,cn=zimbra" mech=SIMPLE ssf=0
Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=1 RESULT tag=97 err=0
etime=0.000154 text=
Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=2 SRCH
base="cn=accesslog" scope=2 deref=0
filter="(&(objectClass=auditWriteObject)(reqResult=0))"
Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=2 SRCH attr=reqDN reqType
reqMod reqNewRDN reqDeleteOldRDN reqNewSuperior entryCSN
Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=2 SEARCH RESULT tag=101
err=53 etime=0.000177 nentries=0 text=consumer state is newer than provider!
Feb 18 16:50:34 zre-ldap005 slapd[10555]: conn=1191 op=3 UNBIND
Feb 18 16:50:34 zre-ldap5 5 slapd[10555]: conn=1191 fd=18 closed
This started happening after I made a modification on zre-ldap005 followed by a
modification on zre-ldap003.  In that case, the following was logged on
zre-ldap002:
Feb 18 16:06:04 zre-ldap002 slapd[11266]: do_syncrep2: rid=101
cookie=rid=101,sid=002,csn=20160218220604.321502Z#000000#002#000000
Feb 18 16:06:04 zre-ldap002 slapd[11266]: slap_queue_csn: queueing 0x3589880
20160218220604.321502Z#000000#002#000000
Feb 18 16:06:04 zre-ldap002 slapd[11266]: do_syncrep2: rid=100
cookie=r%3=100,sid=001,csn=20160218220604.321502Z#000000#002#000000
Feb 18 16:06:04 zre-ldap002 slapd[11266]: slap_graduate_commit_csn: removing
0x3589880 20160218220604.321502Z#000000#002#000000
Feb 18 16:06:04 zre-ldap002 slapd[11266]: syncrepl_message_to_op: rid=101
be_modify cn=admins,cn=zimbra (0)
Feb 18 16:06:04 zre-ldap002 slapd[11266]: slap_queue_csn: queueing 0x3589740
20160218220604.321502Z#000000#002#000000
Feb 18 16:06:04 zre-ldap002 slapd[11266]: slap_graduate_commit_csn: removing
0x3589740 20160218220604.321502Z#000000#002#000000
Feb 18 16:06:04 zre-ldap002 slapd[11266]: do_syncrep2: rid=100 CSN pending,
ignoring 20160218220604.321502Z#000000#002#000000
(reqStart=20160218220604.326527Z,cn=accesslog)
So we see the write accepted from one master, and correctly rejected (not
replayed) from the other.
After this, there were no more modifications made to either master.  However, I
*did* make a modification to cn=config on the replica.  After that was done, is
when the error(53) started getting triggered:
Feb 18 16:27:12 zre-ldap002 slapd[11266]: slap_queue_csn: queueing 0x3587a40
20160218222712.155976Z#000000#000#000000
Feb 18 16:27:12 zre-ldap002 slapd[11266]: slap_graduate_commit_csn: removing
0x3587a40 20160218222712.155976Z#000000#000#000000
Feb 18 16:27:12 zre-ldap002 slapd[11266]: do_syncrep2: rid=100
LDAP_RES_SEARCH_RESULT
Feb 18 16:27:12 zre-ldap002 slapd[11266]: do_syncrep2: rid=100
LDAP_RES_SEARCH_RESULT (53) Server is unwilling to perform
Feb 18 16:27:12 zre-ldap002 slapd[11266]: do_syncrep2: rid=100 (53) Server is
unwilling to perform
Feb 18 16:27:12 zre-ldap002 slapd[11266]: do_syncrepl: rid=100 rc -2 retrying