Re: Persistent failures of test050

27 Jun 2019


On Tue, Jun 25, 2019 at 04:45:30PM -0700, Quanah Gibson-Mount wrote:
...
> --On Saturday, June 22, 2019 2:06 PM -0700 Quanah Gibson-Mount
> <quanah@symas.com> wrote:
>> There appears to be two separate problems happening in test050.
>> Problem #1) Null cookie is generated, causing catastrophic database loss
>> across the entire MMR cluster (they all lose all their data).  This is new
>> with 2.4.48, perhaps related to the revert of part of ITS#8281 when ITS#9015
>> was fixed (purely speculation on my part at the moment).  This appears to be
>> a major/significant regression.

Not sure the above is the same failure I'm seeing, so will outline mine
(reproduced on master+ITS#9043 logging):
- all servers start with nothing but replicated cn=config
- database is configured on server1 including syncprov and syncrepl, it
  replicates to others
- server2 contacts server1 to start replicating, starts present phase
- server1 contacts server2 to do the same while server2 is still in
  present phase; somehow server2 has decided to attach its own CSNs to
  entries, so it sees a 002 contextCSN and present phase finishes
  prematurely (server2 doesn't have all data yet)
- result is server1 loses a large part of its database while server2 is
  fine, and both think they're in sync
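
The sequence above can be modelled with a toy sketch. This is not slapd
code: the function name, the DNs and the entry counts are all made up for
illustration, and the present phase is reduced to "keep only what the
provider reports as present". It only shows why a refresh from a provider
that is itself half-way through its own refresh is destructive.

```python
def present_phase(consumer_db, provider_present):
    """Toy model: the consumer drops every entry the provider did not
    report as present (hypothetical helper, not an OpenLDAP API)."""
    return {dn: e for dn, e in consumer_db.items() if dn in provider_present}


# server1 holds the full database (10 made-up entries).
server1 = {f"uid=user{i},dc=example,dc=com": "entry" for i in range(10)}

# server2 is still mid-refresh and has received only the first 3 entries.
server2 = {dn: server1[dn] for dn in sorted(server1)[:3]}

# server1 now runs a present phase against server2, which (wrongly)
# looks complete because it advertises a contextCSN of its own.
server1_after = present_phase(server1, set(server2))
lost = len(server1) - len(server1_after)
print(f"server1 kept {len(server1_after)} entries, lost {lost}")
```

In this model server2 ends up "fine" only because it keeps refreshing from
the original data, while server1 has already discarded everything server2
had not yet received.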
No idea yet why and when server2 generates its own CSN for (some?) of
the entries. Sounds a bit like ITS#8125 to me.
If it thought there was no CSN, things might be OK. We might have to reject
new consumers while we know we're in the middle of processing an inbound
refresh (i.e. we have modified the DB but not yet updated contextCSN). If we
haven't, we could send the entries as we go. That way, could multiple
servers safely be in present phase from each other at the same time?
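
One possible reading of that idea, as a minimal sketch only: the class,
its method names and the CSN string are hypothetical and do not correspond
to anything in syncprov or syncrepl. The replica tracks whether it has
applied inbound changes without yet recording a contextCSN, and turns new
consumers away while that is the case.

```python
class Replica:
    """Toy replica (hypothetical, not slapd internals)."""

    def __init__(self, sid):
        self.sid = sid
        self.entries = {}
        self.context_csn = None
        # True once an inbound refresh has modified the DB and the
        # contextCSN has not been updated to match yet.
        self.dirty = False

    def apply_inbound(self, dn, entry):
        # An entry arrives from the provider during refresh.
        self.entries[dn] = entry
        self.dirty = True

    def finish_inbound_refresh(self, csn):
        # The refresh is complete: record the provider's CSN.
        self.context_csn = csn
        self.dirty = False

    def accept_consumer(self):
        """Return the set of present DNs, or None to make the consumer
        retry later because our own data is incomplete."""
        if self.dirty:
            return None
        return set(self.entries)


server2 = Replica("002")
server2.apply_inbound("uid=user0,dc=example,dc=com", "entry")
mid_refresh = server2.accept_consumer()       # refused while dirty
server2.finish_inbound_refresh("placeholder-csn-from-server1")
after_refresh = server2.accept_consumer()     # now safe to serve
```

A replica that has not touched its database yet is never marked dirty in
this sketch, so it can still serve consumers, which matches the "if we
haven't" case above.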
I'll see in the meantime why the CSN was generated on server2. Might
take a while to reproduce this again though.
Regards,
-- 
Ondřej Kuzník
Senior Software Engineer
Symas Corporation                       http://www.symas.com
Packaged, certified, and supported LDAP solutions powered by OpenLDAP
