syncrepl problem?


Tim Tyler

13 Aug 2009

9:30 p.m.

OpenLDAP experts,

We are running OpenLDAP 2.3.43 on CentOS 5.3 systems. I have one provider and two consumers. I believe the consumers were receiving replication data and staying synchronized until today. I have this entry in slapd.conf:

syncrepl rid=102
    type=refreshAndPersist
    interval=00:01:00:00

The problem is that I had to completely restore the provider's entire LDAP database from a backup LDIF file after screwing up over 200 accounts. I got the provider back to the way I wanted, but now the consumers won't synchronize (replicate) any more.

1. Should syncrepl ultimately be able to replicate after a major change to the provider, such as an LDIF restoration? Or should I expect to have to reload the consumer entries from scratch from a provider-generated LDIF in situations like this?

2. I thought I read once that the interval setting was still important for when refreshAndPersist missed an update. Is that true?

Tim Tyler

Network Engineer

Beloit College


Matthew Backes

14 Aug 2009

1:23 a.m.

...

We are running OpenLDAP 2.3.43 on CentOS 5.3 systems. I have one provider and two consumers. I believe the consumers were receiving replication data and staying synchronized until today. I have this entry in slapd.conf:

Consider upgrading, but that should be unrelated.

...

syncrepl rid=102 type=refreshAndPersist interval=00:01:00:00

interval= is for refreshOnly

You want retry= to specify a retry period, or else any interruption will halt replication.
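
For illustration only, a consumer stanza that uses retry= instead of interval= might look like the fragment below. The provider URI, search base, bind DN and credentials are placeholders and do not come from this thread; retry="60 +" asks the consumer to retry every 60 seconds indefinitely.

```
# slapd.conf on a consumer (sketch; every value except rid is a placeholder)
syncrepl rid=102
        provider=ldap://provider.example.com
        type=refreshAndPersist
        retry="60 +"
        searchbase="dc=example,dc=com"
        bindmethod=simple
        binddn="cn=replicator,dc=example,dc=com"
        credentials=secret
```

See slapd.conf(5) for the exact retry= syntax supported by your release.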

...

The problem is that I had to completely restore the provider's entire LDAP database from a backup LDIF file after screwing up over 200 accounts. I got the provider back to the way I wanted, but now the consumers won't synchronize (replicate) any more.

Hopefully that was a backup taken with slapcat, or one that preserved all of the metadata using a careful search (check for entryUUID/entryCSN). Worst case, you can pull that from the replica.

Did you remember to slapadd on the master side with -w so that contextCSN exists and is up to date?

What do you see in the logs? Does your restored database still have your replication account, sufficient ACLs/limits, etc in the configuration? What does contextCSN look like on each side? Do entryUUIDs match on objects with matching DNs?
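
A rough sketch of how the contextCSN check could be done from a shell. The host names and suffix are placeholders, both function names are made up for this example, and anonymous read access to contextCSN is an assumption (you may need to bind as a privileged DN). The comparison relies on CSNs of the same format sorting as plain strings, which should be checked against your release.

```shell
#!/bin/sh
# Print the contextCSN value(s) stored on the suffix entry of one server.
# $1 = LDAP URI, $2 = suffix DN (both are placeholders chosen by the caller).
get_context_csn() {
    ldapsearch -x -LLL -H "$1" -s base -b "$2" contextCSN |
        sed -n 's/^contextCSN: //p'
}

# Compare two CSN strings byte-wise.
# Prints "older", "same" or "newer" to describe $1 relative to $2.
compare_csn() {
    if [ "$1" = "$2" ]; then
        echo same
    elif [ "$(printf '%s\n%s\n' "$1" "$2" | LC_ALL=C sort | head -n 1)" = "$1" ]; then
        echo older
    else
        echo newer
    fi
}
```

Usage would be along the lines of compare_csn "$(get_context_csn ldap://provider.example.com dc=example,dc=com)" "$(get_context_csn ldap://consumer1.example.com dc=example,dc=com)". A result of "older" would mean the provider's contextCSN is behind the consumer's.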

...

Should syncrepl ultimately be able to replicate after a major change to the provider such as a ldif restoration? Or should I expect to have to reload the consumer entries from scratch from a provider generated ldif in situations like this?

If you loaded the right LDIF (i.e. didn't generate entirely new and unrelated data with different UUID/CSN info), then this really should not be a major change.

If you loaded correct but old data with a lower contextCSN than the contextCSN on the replica, then you will probably lose all of the changes still present on the replica.

I see no reason why you would want to reload the consumer. In the event of catastrophic master failure like you describe (lost all drives in your RAID set, someone did rm -rf /, building fire, etc), you should use the data from the replica. That's one of the main reasons for having a replica in the first place.

...

I thought I read once that the interval settings was still important for when refreshandpersist missed an update. Is that true?

No. See retry=

Matthew Backes Symas Corporation mbackes@symas.com

Tim Tyler

4:18 p.m.

Matt, I ran slapcat at 1:00 a.m. to get a good LDIF file and used slapadd to restore it after we filled all the LDAP databases with bad entries. The consumers had the same bad entries as the provider because refreshAndPersist replicates so quickly. Hence, all three LDAP servers had the same significantly wrong information.

I did not use -w. I wasn't aware of that option. It sounds like this was a critical step I should have done, but it's probably too late for this particular problem since I can't start over again now.

Question: for future reference, if I use slapadd with -w, can I delete everything in the LDAP content directory except for the DB_CONFIG file?

Tim Tyler Network Engineer Beloit College


Jonathan Clarke

5:49 p.m.

On 14/08/2009 16:18, Tim Tyler wrote:

...

Matt, I did a slapcat to get the ldif into a good ldif file at 1:00am and used slapadd to restore it after we screwed up all the ldap databases with bad entries. The consumers had all the bad entries like the provider because of the quick refreshandpersist mode. Hence, all three ldap servers had the same significantly wrong information. I did not do a -w. I wasn't aware of that option. It sounds like this was a critical step I should have done, but it's probably too late for this particular problem since I can't start over again now.

It sounds like your consumers are using old syncrepl cookies, and consider themselves up-to-date with the provider, even though they're not. This is because you didn't use -w on the slapadd there.

You can "reset" the syncrepl cookie on the consumers by starting slapd there with -c rid=102. This should cause all entries to be re-synced.
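
As a sketch of that suggestion, the helper below only prints the command line for a one-off consumer start, so it can be reviewed before being run by hand with slapd stopped. The function name, the configuration path and the ldap user are assumptions (typical for a CentOS install, not taken from this thread); only -c rid=102 comes from the advice above.

```shell
#!/bin/sh
# Print (do not run) a slapd command line that passes a fresh syncrepl cookie.
# $1 = rid of the syncrepl stanza, $2 = slapd.conf path (assumed default below).
consumer_reset_command() {
    rid="$1"
    conf="${2:-/etc/openldap/slapd.conf}"
    printf 'slapd -f %s -u ldap -c rid=%s\n' "$conf" "$rid"
}

# Show the command for the rid used in this thread.
consumer_reset_command 102
```

After the consumer has re-synced, it can be restarted normally without -c.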

...

Question: for future reference, if I use slapadd with a -w, can I delete everything in the ldap content directory except for the DB_CONFIG file?

This seems unrelated to using slapadd with -w. In general, if you have a complete LDIF file and want to use it to re-populate your directory, you can delete everything in the ldap content directory except for the DB_CONFIG file, then use slapadd, either with or without -w.
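
A minimal sketch of that cleanup step, assuming a back-bdb database directory. It moves the old files aside rather than deleting them, so a failed reload can be undone. The function name and all paths in the comments are assumptions for illustration.

```shell
#!/bin/sh
# Move every regular file except DB_CONFIG out of the database directory.
# $1 = database directory, $2 = directory to park the old files in.
clear_db_dir() {
    db_dir="$1"
    aside_dir="$2"
    mkdir -p "$aside_dir" || return 1
    for f in "$db_dir"/*; do
        [ -f "$f" ] || continue                          # skip subdirectories and an unmatched glob
        [ "$(basename "$f")" = "DB_CONFIG" ] && continue # keep the tuning file in place
        mv "$f" "$aside_dir"/ || return 1
    done
}

# Intended order on the provider (assumed paths; run with slapd stopped):
#   clear_db_dir /var/lib/ldap /var/lib/ldap.old
#   slapadd -w -f /etc/openldap/slapd.conf -l /root/backup.ldif
#   chown -R ldap:ldap /var/lib/ldap
```

Using -w in the slapadd step is what regenerates contextCSN, as discussed earlier in the thread.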

Note that you can also put any line you want in DB_CONFIG into slapd.conf using the dbconfig parameter, for example:

    dbconfig set_cachesize 0 1048576 0

This will cause a new DB_CONFIG file to be written with this data, if none exists. It can simplify backup/restore procedures.

Regards, Jonathan



openldap-technical@openldap.org
