Matt, I had done a slapcat at 1:00 a.m. to dump the database to a good LDIF file, and I used slapadd to restore from it after we polluted all the LDAP databases with bad entries. Because of refreshAndPersist mode, the consumers had quickly picked up all the bad entries from the provider, so all three LDAP servers held the same significantly wrong information. I did not use -w; I wasn't aware of that option. It sounds like this was a critical step I should have taken, but it's probably too late for this particular problem since I can't start over now. Question, for future reference: if I use slapadd with -w, can I delete everything in the LDAP database directory except the DB_CONFIG file?
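For the record, the restore was roughly along these lines (paths are illustrative, not my exact commands):

    # nightly dump taken at 1:00 a.m., before the bad entries appeared
    slapcat -f /etc/openldap/slapd.conf -l backup.ldif
    # later, with slapd stopped: reload from the LDIF (without -w, as noted)
    slapadd -f /etc/openldap/slapd.conf -l backup.ldif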
Tim Tyler
Network Engineer
Beloit College
-----Original Message-----
From: Matthew Backes [mailto:mbackes@symas.com]
Sent: Thursday, August 13, 2009 6:24 PM
To: openldap-technical@openldap.org
Cc: tyler@beloit.edu
Subject: Re: syncrepl problem?
> We are running OpenLDAP 2.3.43 on CentOS 5.3 systems. I have one provider and two consumers. I believe the consumers were working fine in terms of receiving replication data and staying synchronized until today. I have this entry in slapd.conf:
Consider upgrading, but that should be unrelated.
> syncrepl rid=102 type=refreshAndPersist interval=00:01:00:00
interval= is for refreshOnly
You want retry= to specify a retry period, or else any interruption will halt replication.
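As a sketch (the provider URL, searchbase, and bind credentials here are placeholders, not your values), retry="60 10 300 +" retries every 60 seconds ten times, then every 300 seconds indefinitely:

    syncrepl rid=102
             provider=ldap://provider.example.edu
             type=refreshAndPersist
             retry="60 10 300 +"
             searchbase="dc=example,dc=edu"
             bindmethod=simple
             binddn="cn=replicator,dc=example,dc=edu"
             credentials=secret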
> The problem is that I had to completely restore the provider's entire LDAP database from a backup LDIF file after screwing up over 200 accounts. I got the provider back to the way I wanted, but now the consumers won't synchronize (replicate) anymore.
Hopefully that was a backup taken with slapcat, or one that preserved all of the metadata via a careful search. (Check for entryUUID/entryCSN.) Worst case, you can pull that from the replica.
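A quick way to check a slapcat-style LDIF for that metadata (backup.ldif is a placeholder name):

    grep -c '^dn:' backup.ldif
    grep -c '^entryUUID:' backup.ldif
    grep -c '^entryCSN:' backup.ldif

All three counts should match; if entryUUID/entryCSN are missing, the dump won't replicate cleanly.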
Did you remember to slapadd on the master side with -w so that contextCSN exists and is up to date?
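If not, something like this on the provider would do it (config path and LDIF name are placeholders); -w writes contextCSN based on the entryCSNs in the LDIF, and -q skips some consistency checking for speed:

    slapadd -w -q -f /etc/openldap/slapd.conf -l backup.ldif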
What do you see in the logs? Does your restored database still have your replication account, sufficient ACLs/limits, etc in the configuration? What does contextCSN look like on each side? Do entryUUIDs match on objects with matching DNs?
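A quick way to compare contextCSN on each server is a base-scope search against the suffix (host and suffix are placeholders):

    ldapsearch -x -H ldap://provider.example.edu -s base \
        -b "dc=example,dc=edu" contextCSN

Run the same against each consumer; the values should converge once replication catches up.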
> Should syncrepl ultimately be able to replicate after a major change to the provider, such as an LDIF restoration? Or should I expect to have to reload the consumer entries from scratch from a provider-generated LDIF in situations like this?
If you loaded the right LDIF (i.e., didn't generate entirely new and unrelated data with different UUID/CSN info), then this really should not be a major change.
If you loaded correct but old data with a lower contextCSN than the contextCSN on the replica, then you will probably lose all of the changes still present on the replica.
I see no reason why you would want to reload the consumer. In the event of catastrophic master failure like you describe (lost all drives in your RAID set, someone did rm -rf /, building fire, etc), you should use the data from the replica. That's one of the main reasons for having a replica in the first place.
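As a sketch of that recovery path (paths are placeholders): dump the surviving replica with slapcat and load it on the rebuilt provider:

    # on the surviving consumer
    slapcat -f /etc/openldap/slapd.conf -l replica.ldif
    # on the rebuilt provider, with slapd stopped
    slapadd -w -q -f /etc/openldap/slapd.conf -l replica.ldif

Since slapcat preserves entryUUID/entryCSN, the consumers can resynchronize against the restored provider afterward.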
> I thought I read once that the interval setting was still important for when refreshAndPersist missed an update. Is that true?
No. See retry=
Matthew Backes
Symas Corporation
mbackes@symas.com