hi. i have two servers, in an mmr arrangement, using delta-syncrepl.
on a couple of occasions, the servers have stopped replicating, and the
following is logged:
dsa1:
Jun 27 06:13:29 ldap0 slapd[8699]: do_syncrep2: rid=000 LDAP_RES_SEARCH_RESULT
Jun 27 06:13:29 ldap0 slapd[8699]: do_syncrep2: rid=000 LDAP_RES_SEARCH_RESULT (53) Server is unwilling to perform
Jun 27 06:13:29 ldap0 slapd[8699]: do_syncrep2: rid=000 (53) Server is unwilling to perform
Jun 27 06:13:29 ldap0 slapd[8699]: do_syncrepl: rid=000 rc -2 retrying
dsa2:
Jun 27 09:13:29 ldap0 slapd[14910]: conn=49263 fd=9 ACCEPT from IP=10.200.41.20:49141 (IP=0.0.0.0:389)
Jun 27 09:13:29 ldap0 slapd[14910]: conn=49263 op=0 EXT oid=1.3.6.1.4.1.1466.20037
Jun 27 09:13:29 ldap0 slapd[14910]: conn=49263 op=0 STARTTLS
Jun 27 09:13:29 ldap0 slapd[14910]: conn=49263 op=0 RESULT oid= err=0 text=
Jun 27 09:13:29 ldap0 slapd[14910]: conn=49263 fd=9 TLS established tls_ssf=256 ssf=256
Jun 27 09:13:29 ldap0 slapd[14910]: conn=49263 op=1 BIND dn="uid=dsa1_slapd-repl-content,ou=dsa1.example.org,ou=services,ou=accounts,dc=example,dc=org" method=128
Jun 27 09:13:29 ldap0 slapd[14910]: conn=49263 op=1 BIND dn="uid=dsa1_slapd-repl-content,ou=dsa1.example.org,ou=services,ou=accounts,dc=example,dc=org" mech=SIMPLE ssf=0
Jun 27 09:13:29 ldap0 slapd[14910]: conn=49263 op=1 RESULT tag=97 err=0 text=
Jun 27 09:13:29 ldap0 slapd[14910]: conn=49263 op=2 SRCH base="cn=accesslog" scope=2 deref=0 filter="(&(objectClass=auditWriteObject)(reqResult=0))"
Jun 27 09:13:29 ldap0 slapd[14910]: conn=49263 op=2 SRCH attr=reqDN reqType reqMod reqNewRDN reqDeleteOldRDN reqNewSuperior entryCSN
Jun 27 09:13:29 ldap0 slapd[14910]: conn=49263 op=2 SEARCH RESULT tag=101 err=53 nentries=0 text=consumer state is newer than provider!
Jun 27 09:13:29 ldap0 slapd[14910]: conn=49263 op=3 UNBIND
Jun 27 09:13:29 ldap0 slapd[14910]: conn=49263 fd=9 closed
if i reload the data and restart replication, things work again for a
while, but then the same failure recurs.
what determines "consumer state is newer than provider"? i'm also a
little bit confused about this message in the context of mmr. if one
has newer data than the other, i had sort of expected that the newer
data would replace the old [obviously it's not that simple, so i'd like
to understand what i'm missing].
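for reference, here is my rough mental model of the check, as a python sketch. this is an assumption on my part, not something taken from the slapd source: i'm guessing the provider compares the newest csn per server id (sid) that the consumer presents against its own contextCSN values, and refuses with err=53 if the consumer is ahead for any sid. the function names and the csn values below are made up for illustration; only the csn layout (timestamp#count#sid#mod) is the real one.

```python
def parse_csn(csn):
    """Split a CSN 'timestamp#count#sid#mod' into (sid, sortable key).

    All fields are fixed width, so plain string comparison of the key
    orders CSNs chronologically.
    """
    timestamp, count, sid, mod = csn.split("#")
    return sid, (timestamp, count, mod)


def newest_per_sid(csns):
    """Return {sid: newest key} for a list of CSN strings."""
    state = {}
    for csn in csns:
        sid, key = parse_csn(csn)
        if sid not in state or key > state[sid]:
            state[sid] = key
    return state


def sids_where_consumer_is_newer(consumer_csns, provider_csns):
    """List the SIDs for which the consumer's CSN is ahead of the provider's.

    ASSUMPTION: only SIDs known to both sides are compared; how slapd
    treats a SID seen on one side only is not modelled here.
    """
    consumer = newest_per_sid(consumer_csns)
    provider = newest_per_sid(provider_csns)
    return sorted(
        sid
        for sid, key in consumer.items()
        if sid in provider and key > provider[sid]
    )


# Hypothetical values, not taken from my servers.
consumer_state = [
    "20160627131329.000000Z#000000#001#000000",
    "20160627101500.000000Z#000000#002#000000",
]
provider_state = [
    "20160627101329.000000Z#000000#001#000000",
    "20160627101500.000000Z#000000#002#000000",
]

ahead = sids_where_consumer_is_newer(consumer_state, provider_state)
```

if that model is roughly right, a non-empty `ahead` list would correspond to the "consumer state is newer than provider!" refusal, and the contextCSN values read from each server could be fed in to see which sid disagrees.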
lastly, how can i further troubleshoot why this happened in the first place?
i'm using 2.4.44 on freebsd, built from ports. i can provide any config
details etc - i just didn't want to inundate the post with guesses on
detail that might not be relevant.
thanks
-ben