--On Thursday, August 01, 2019 4:16 PM +0000 quanah(a)openldap.org wrote:
Full_Name: Quanah Gibson-Mount
Version: 2.4.47
OS: N/A
URL:
ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (47.208.128.44)
In investigating why a particular consumer went into REFRESH mode, I find
that this was triggered by one of its providers currently parsing a
sessionlog. Sessionlog parsing should *not* cause a consumer of that
server to fall back to REFRESH.
Some notes on this specific environment and situation.
This is a 4-way MMR setup. Via the use of a load balancer, only
iam-ed-ldp-p01 recieves write operations. Thus there should never be any
REFRESHes occurring anywhere, so the fact that there is is already
problematic.
Servers are p01, p02, p03 (all in the same data center) and b01 (offsite DR
system).
At the time this issue hit:
p01 was upgraded to a new release of OpenLDAP @ 21:26:48, thus we see:
Jul 30 21:25:17 iam-ed-ldp-p01 slapd[30495]: slapd shutdown: waiting for 0
operations/tasks to finish
Jul 30 21:26:48 iam-ed-ldp-p01 slapd[20978]: @(#) $OpenLDAP: slapd 2.4.47
(Mar 22 2019 09:12:20)
$#012#011build@rhel7build.rb.symas.net:/home/build/sold-2.4.47.3/redhat7-x86_64/build/openldap.x86_64/servers/slapd
Jul 30 21:26:48 iam-ed-ldp-p01 slapd[20979]: slapd starting
Jul 30 21:26:48 iam-ed-ldp-p01 slapd[20979]: do_syncrep2: rid=002
LDAP_RES_INTERMEDIATE - REFRESH_DELETE
Jul 30 21:26:48 iam-ed-ldp-p01 slapd[20979]: do_syncrep2: rid=001
LDAP_RES_SEARCH_RESULT
Jul 30 21:26:48 iam-ed-ldp-p01 slapd[20979]: do_syncrep2: rid=001
delta-sync lost sync, switching to REFRESH
Jul 30 21:26:48 iam-ed-ldp-p01 slapd[20979]: do_syncrep2: rid=001 (4096)
Content Sync Refresh Required
Jul 30 21:26:48 iam-ed-ldp-p01 slapd[20979]: do_syncrep2: rid=003
LDAP_RES_INTERMEDIATE - REFRESH_DELETE
p03 was operating w/o issue. Interstingly enough, when it was restarted
for the software update @ 21:37:16, it got an error (53) talking to p02
instead of an error (4096) like p01 did. This is likely because p02 was
restarted for its software update @ 21:32:19, which wiped out its
sessionlog.
p02 had started doing session log parsing @ 21:26:17 with
Jul 30 21:26:17 iam-ed-ldp-p02 slapd[3750]: conn=18205759 op=1 SEARCH
RESULT tag=101 err=4096 duration=18.462ms nentries=0 text=sync cookie is
stale
We see the connection to p02 from p01 with a similar issue when it starts
up:
Jul 30 21:26:48 iam-ed-ldp-p02 slapd[3750]: conn=18205790 op=1 SRCH
base="cn=accesslog" scope=2 deref=0
filter="(&(objectClass=auditWriteObject)(reqResult=0))"
Jul 30 21:26:48 iam-ed-ldp-p02 slapd[3750]: conn=18205790 op=1 SRCH
attr=reqDN reqType reqMod reqNewRDN reqDeleteOldRDN reqNewSuperior entryCSN
Jul 30 21:26:48 iam-ed-ldp-p02 slapd[3750]: conn=18205790 op=1 SEARCH
RESULT tag=101 err=4096 duration=0.431ms nentries=0 text=sync cookie is
stale
b01 had its software update @ 21:21:25
Generally it seems like restarting/wiping the sessionlog made things
incredibly bad across the cluster.
--
Quanah Gibson-Mount
Product Architect
Symas Corporation
Packaged, certified, and supported LDAP solutions powered by OpenLDAP:
<
http://www.symas.com>