hi. i have two servers, in an mmr arrangement, using delta-syncrepl. on a couple of occasions, the servers have stopped replicating, and the following is logged:
dsa1:
Jun 27 06:13:29 ldap0 slapd[8699]: do_syncrep2: rid=000 LDAP_RES_SEARCH_RESULT
Jun 27 06:13:29 ldap0 slapd[8699]: do_syncrep2: rid=000 LDAP_RES_SEARCH_RESULT (53) Server is unwilling to perform
Jun 27 06:13:29 ldap0 slapd[8699]: do_syncrep2: rid=000 (53) Server is unwilling to perform
Jun 27 06:13:29 ldap0 slapd[8699]: do_syncrepl: rid=000 rc -2 retrying
dsa2:
Jun 27 09:13:29 ldap0 slapd[14910]: conn=49263 fd=9 ACCEPT from IP=10.200.41.20:49141 (IP=0.0.0.0:389)
Jun 27 09:13:29 ldap0 slapd[14910]: conn=49263 op=0 EXT oid=1.3.6.1.4.1.1466.20037
Jun 27 09:13:29 ldap0 slapd[14910]: conn=49263 op=0 STARTTLS
Jun 27 09:13:29 ldap0 slapd[14910]: conn=49263 op=0 RESULT oid= err=0 text=
Jun 27 09:13:29 ldap0 slapd[14910]: conn=49263 fd=9 TLS established tls_ssf=256 ssf=256
Jun 27 09:13:29 ldap0 slapd[14910]: conn=49263 op=1 BIND dn="uid=dsa1_slapd-repl-content,ou=dsa1.example.org,ou=services,ou=accounts,dc=example,dc=org" method=128
Jun 27 09:13:29 ldap0 slapd[14910]: conn=49263 op=1 BIND dn="uid=dsa1_slapd-repl-content,ou=dsa1.example.org,ou=services,ou=accounts,dc=example,dc=org" mech=SIMPLE ssf=0
Jun 27 09:13:29 ldap0 slapd[14910]: conn=49263 op=1 RESULT tag=97 err=0 text=
Jun 27 09:13:29 ldap0 slapd[14910]: conn=49263 op=2 SRCH base="cn=accesslog" scope=2 deref=0 filter="(&(objectClass=auditWriteObject)(reqResult=0))"
Jun 27 09:13:29 ldap0 slapd[14910]: conn=49263 op=2 SRCH attr=reqDN reqType reqMod reqNewRDN reqDeleteOldRDN reqNewSuperior entryCSN
Jun 27 09:13:29 ldap0 slapd[14910]: conn=49263 op=2 SEARCH RESULT tag=101 err=53 nentries=0 text=consumer state is newer than provider!
Jun 27 09:13:29 ldap0 slapd[14910]: conn=49263 op=3 UNBIND
Jun 27 09:13:29 ldap0 slapd[14910]: conn=49263 fd=9 closed
if i reload data and restart replication, things work again, for a period of time, but then this happens again.
what determines "consumer state is newer than provider"? i'm also a little bit confused about this message in the context of mmr. if one has newer data than the other, i had sort of expected that the newer data would replace the old [obviously it's not that simple, so i'd like to understand what i'm missing].
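to make the question concrete, here is a rough python sketch of the comparison i am assuming sits behind that message. it is only my guess, not taken from the slapd source: i assume the provider compares, per server id, the newest csn the consumer presents against its own contextCSN for the same server id, and refuses when the consumer's is later. the function names and the sample csn values are made up for illustration; only the csn layout (timestamp#count#sid#mod) is what i see in the directory.

```python
from typing import Dict, Iterable, List, NamedTuple, Tuple


class CSN(NamedTuple):
    timestamp: str  # e.g. "20170627131329.000000Z" (fixed width, sorts as text)
    count: str      # 6 hex digits
    sid: str        # 3 hex digits, the server id
    mod: str        # 6 hex digits


def parse_csn(raw: str) -> CSN:
    """Split a CSN string of the form timestamp#count#sid#mod."""
    parts = raw.strip().split("#")
    if len(parts) != 4:
        raise ValueError(f"not a CSN: {raw!r}")
    return CSN(*parts)


def order_key(csn: CSN) -> Tuple[str, str, str]:
    # Within one server id, order by timestamp, then count, then mod.
    return (csn.timestamp, csn.count, csn.mod)


def newest_by_sid(values: Iterable[str]) -> Dict[str, CSN]:
    """Keep only the newest CSN seen for each server id."""
    newest: Dict[str, CSN] = {}
    for raw in values:
        csn = parse_csn(raw)
        current = newest.get(csn.sid)
        if current is None or order_key(csn) > order_key(current):
            newest[csn.sid] = csn
    return newest


def consumer_ahead(consumer: Iterable[str], provider: Iterable[str]) -> List[str]:
    """Return server ids where the consumer's CSN is later than the provider's.

    Assumption (hypothetical rule): any id in this list would make the
    provider answer "consumer state is newer than provider!".
    """
    c = newest_by_sid(consumer)
    p = newest_by_sid(provider)
    return sorted(
        sid for sid, csn in c.items()
        if sid in p and order_key(csn) > order_key(p[sid])
    )


if __name__ == "__main__":
    # Made-up contextCSN values, not taken from my servers.
    consumer_csns = [
        "20170627131329.000000Z#000000#001#000000",
        "20170627101500.000000Z#000000#002#000000",
    ]
    provider_csns = [
        "20170627101329.000000Z#000000#001#000000",
        "20170627101500.000000Z#000000#002#000000",
    ]
    print(consumer_ahead(consumer_csns, provider_csns))
```

if that assumption is roughly right, feeding it the contextCSN values read from the suffix entry on each server should show which server id is ahead on which side.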
lastly, how can i further troubleshoot why this happened in the first place?
i'm using openldap 2.4.44 on freebsd, built from ports. i can provide any config details etc. i just didn't want to inundate the post with guesses at details that might not be relevant.
thanks
-ben