Re: delta-syncrepl problems with 2.4.12

12 Nov 2008

      On Tue, Nov 11, 2008 at 02:18:10PM -0800, Quanah Gibson-Mount wrote:
...
--On Tuesday, November 11, 2008 4:35 PM -0500 John Morrissey
<jwm@horde.net> wrote:
...
Instead of slapcat(8)/slapadd(8)ding the old databases, we're removing
the existing databases and allowing slapd(8) to delta-syncrepl a copy
from scratch. Ironing out this use case is especially important for us
since we expect to be adding a number of consumers in the coming months
and would obviously prefer to bring them online without having to shut
down any other slapd instances for slapcat(8)ting.
Why would you have to shut down a server to slapcat it?  Hot slapcatting 
has been supported for a long time.
Right, slapcat's man page indicates it's always safe to run against the bdb
backend, but I suspect that's referring more to read concurrency and not
necessarily the generation of a consistent, point-in-time snapshot of the
database.
Empirically, slapcat output does not have a consistent view of the database
while dumping it. Specifically, when slapcat is running and one changes an
entry that hasn't been dumped yet, that change will appear in slapcat's
output. Using slapadd's -w option would definitely be unsafe in this
situation.
Running slapadd without the -w option seems safe at first glance, since the
suffix entry's contextCSN will be older than any CSN in the generated LDIF.
Presumably, any syncrepl updates that were already "applied" by virtue of
the slapcat behavior described above will simply be skipped, since they
result in no change to the entry? Still, I couldn't find anything in the
Administrator's Guide about this, and it feels like there's some concurrency
case I'm not considering here, so I'd definitely appreciate hearing any
thoughts you have on this.
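To make the reasoning above concrete, here is a toy Python model of it. Everything in it is hypothetical: the `Change` class, `hot_dump()`, `replay()`, the `dc=example` DNs and the CSN values are illustrations, not slapd internals. It assumes every logged change is a whole-value replace and that an update is skipped when the consumer's entryCSN is already at least as new as the change's CSN.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """One logged modification (hypothetical; treated as a full replace)."""
    csn: str
    dn: str
    value: str


def hot_dump(db, context_csn, mid_dump):
    """Dump entries in DN order without a point-in-time snapshot.

    context_csn stands for the suffix contextCSN captured when the dump starts.
    mid_dump maps a DN to changes that commit on the provider just before
    that DN is read, so the LDIF picks up edits to not-yet-dumped entries.
    """
    ldif = {}
    for dn in sorted(db):
        for change in mid_dump.get(dn, []):
            db[change.dn] = (change.value, change.csn)  # commit on provider
        ldif[dn] = db[dn]
    return context_csn, ldif


def replay(consumer, accesslog, context_csn):
    """Apply logged changes newer than context_csn, skipping ones already seen."""
    for change in sorted(accesslog, key=lambda c: c.csn):
        if change.csn <= context_csn:
            continue  # predates the dump's starting point
        current = consumer.get(change.dn)
        if current is not None and current[1] >= change.csn:
            continue  # the dump already contains this (or a newer) change
        consumer[change.dn] = (change.value, change.csn)
    return consumer


# Same-format, fixed-width CSNs, so string order matches time order here.
CSN1 = "20081112000001.000000Z#000000#000#000000"
CSN2 = "20081112000002.000000Z#000000#000#000000"
CSN3 = "20081112000003.000000Z#000000#000#000000"

provider = {"cn=a,dc=example": ("a0", CSN1), "cn=b,dc=example": ("b0", CSN1)}
during = Change(CSN2, "cn=b,dc=example", "b1")  # commits mid-dump
after = Change(CSN3, "cn=a,dc=example", "a1")   # commits after the dump

context_csn, ldif = hot_dump(provider, CSN1, {"cn=b,dc=example": [during]})
provider[after.dn] = (after.value, after.csn)
consumer = replay(dict(ldif), [during, after], context_csn)
print(consumer == provider)  # True
```

Under those assumptions the consumer converges. Real accesslog entries record individual modifications (adds and deletes of specific values) rather than whole-value replaces, so a model like this does not rule out the concurrency case the question worries about.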
...
...
At this point, slurpd seems to start processing the accesslog.
There is no slurpd in OpenLDAP 2.4.  I think you mean syncrepl?
Yes, that was a typo.
...
...
It's interesting that two consumers have successfully delta-syncrepl'd
complete databases from scratch without experiencing this problem. At
least four other consumer machines fail in this manner. There seems to
be no rhyme or reason as to which machines succeed or fail; they're all
running the same binaries, same OS release and patches, some are even on
the same Ethernet segment as the provider. The provider slapd has stayed
up (without crashing or restarting) throughout at least two of these
attempts.
What patches?
I was referring to operating system patches; we're using the Debian
packaging of 2.4.11 (from the upcoming lenny release) that we've updated
locally for 2.4.12. Debian patches OpenLDAP fairly lightly, and none of
their patches seem to get anywhere near syncrepl or other hard slapd
internals.
...
Newer patches for 2.4.13 of interest may be:
   Fixed slapd syncrepl event loss (ITS#5710)

We don't use the ppolicy overlay.
...
   Fixed slapd syncrepl MOD of attrs with no EQ rule (ITS#5781)

AFAICT all of the attributes we're modifying have equality matching rules.
...
   Fixed slapd syncrepl schema checking (ITS#5798)

We aren't using multimaster replication, and don't care about schema checking
on our consumers.
...
On the provider side:
   Fixed slapo-syncprov runqueue removal (ITS#5776)

Looks like this patch addresses a case where syncprov stops sending queued
responses to persistent searches, which I don't think is applicable to this
particular problem?
...
   Fixed slapo-syncprov unreplicatable ops (ITS#5709)

This might bite us in future so I'll be sure to include it locally.
However, this doesn't seem like the source of the current problem since the
symptoms are that entries that should have been pulled during the initial
refresh phase are not present for later syncrepl activity.
We haven't upgraded our provider to 2.4 yet, since we wanted to get some
consumers upgraded first. Would running a 2.4 provider with 2.3 consumers be
OK? The contextCSN format changed to add fractional seconds; will that have
any adverse impact on existing 2.3 consumers that try to continue
syncrepling?
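As a toy illustration of why the format change is worth checking, the Python below parses both CSN layouts. The layouts in the comments are assumptions to verify against real contextCSN values, and `csn_key()` is a hypothetical helper. This says nothing about how slapd 2.3 or 2.4 actually compare CSNs; it only shows that a naive byte-wise comparison of the two layouts can misorder them within the same second.

```python
import re
from datetime import datetime

# Assumed CSN layouts; compare against contextCSN values from your servers.
#   2.3-style: 20081112143055Z#000001#00#000000
#   2.4-style: 20081112143055.500000Z#000000#000#000000
CSN_RE = re.compile(
    r"^(?P<ts>\d{14})(?:\.(?P<frac>\d{6}))?Z"
    r"#(?P<count>[0-9a-fA-F]+)#(?P<sid>[0-9a-fA-F]+)#(?P<mod>[0-9a-fA-F]+)$"
)


def csn_key(csn):
    """Sort key that treats a missing fractional part as .000000."""
    m = CSN_RE.match(csn)
    if m is None:
        raise ValueError(f"unrecognized CSN: {csn!r}")
    when = datetime.strptime(m["ts"], "%Y%m%d%H%M%S")
    when = when.replace(microsecond=int(m["frac"] or "0"))
    return (when, int(m["count"], 16), int(m["sid"], 16), int(m["mod"], 16))


old = "20081112143055Z#000001#00#000000"          # 2.3-style
new = "20081112143055.500000Z#000000#000#000000"  # 2.4-style, 0.5 s later

print(new < old)                    # True: '.' sorts before 'Z' byte-wise
print(csn_key(new) > csn_key(old))  # True: parsed, new is the later CSN
```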
john
-- 
John Morrissey          _o            /\         ----  __o
jwm@horde.net        _-< _          /  \       ----  <  ,
www.horde.net/    __(_)/_(_)________/    _______(_) /_(_)__

