Disaster recovery question wrt replication

Steven Harms (stharms)

13 Dec 2006

9:16 p.m.

Hi all,

New LDAP implementor here.

I'm trying to document disaster recovery steps.

Assuming a single master and 3 replicas.

[Q1]: Is this an acceptable architecture? In the master's slapd.conf I define 3 replica statements, and on the 3 replica servers I use this master as the updateref.
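
A minimal sketch of the layout Q1 describes, for illustration only. The hostnames, suffix, bind DN, password, and replog path are all hypothetical, and the `host=` form of the `replica` directive is an assumption for a 2.2-era slapd.conf; check slapd.conf(5) for the release in use.

```
# Master slapd.conf (fragment): one replica statement per replica.
# The replog path below is an assumed location.
replogfile /var/lib/ldap/replog
replica host=replica1.example.com:389
        binddn="cn=replicator,dc=example,dc=com"
        bindmethod=simple credentials=secret
replica host=replica2.example.com:389
        binddn="cn=replicator,dc=example,dc=com"
        bindmethod=simple credentials=secret
replica host=replica3.example.com:389
        binddn="cn=replicator,dc=example,dc=com"
        bindmethod=simple credentials=secret

# Each replica's slapd.conf (fragment): accept changes from the
# replication DN and refer all other writers to the master.
updatedn "cn=replicator,dc=example,dc=com"
updateref ldap://master.example.com
```

The bind DN in each `replica` statement is meant to match the `updatedn` on the replicas, so that only slurpd's connection is allowed to write there.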

If all the replicas fail and the master survives, I'm trying to figure out how to restore service.

1. Establish replacement replicas.
2. (master) slapcat -l /resync.ldif. Copy to each replica.
3. (each replica) slapadd -l /resync.ldif

Now here's the sticky part. The slurpd.replog has entries that were destined for the replicas. Now that I've sync'd via slapadd, these are not necessary.

How can I clean out the replog back-queue to a pristine start?

I suppose, more generally, I'm asking: How do I start replication all over? What files can / should I delete? Which should I not, under any circumstances, touch?

Steven
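
The three recovery steps in the message above can be sketched as a small shell function. This is only an illustration under stated assumptions: the replica hostnames are placeholders, `scp`/`ssh` are an assumed transport that the thread does not mention, and slapd on each replacement replica is assumed to be stopped with an empty database directory before `slapadd` runs. By default the function only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: rebuild replicas from a fresh export of the master.
# Hostnames and the scp/ssh transport are assumptions for illustration.

# Print the command (dry run, the default) or execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: resync_replicas LDIF_PATH REPLICA_HOST...
resync_replicas() {
    ldif=$1
    shift
    # Step 2: export the master's database to LDIF.
    run slapcat -l "$ldif"
    for replica in "$@"; do
        # Copy the export to the replica (assumed transport: scp).
        run scp "$ldif" "$replica:$ldif"
        # Step 3: load the export on the replica (slapd stopped there).
        run ssh "$replica" slapadd -l "$ldif"
    done
}
```

Calling `resync_replicas /resync.ldif replica1.example.com replica2.example.com` prints the plan; setting `DRY_RUN=0` would execute it. The sketch does not address the replog backlog that the rest of this message asks about.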

Quanah Gibson-Mount

14 Dec

12:56 a.m.

--On Wednesday, December 13, 2006 12:16 PM -0800 "Steven Harms (stharms)" stharms@cisco.com wrote:

...

Hi all,

New LDAP implementor here.

I'm trying to document disaster recovery steps.

Assuming a single master and 3 replicas.

[Q1]: Is this an acceptable architecture? In the master's slapd.conf I define 3 replica statements, and on the 3 replica servers I use this master as the updateref.

If all the replicas fail and the master survives, I'm trying to figure out how to restore service.

Establish replacement replicas

(master) slapcat -l /resync.ldif. Copy to each replica.

(each replica) slapadd -l /resync.ldif

Now here's the sticky part. The slurpd.replog has entries that were destined for the replicas. Now that I've sync'd via slapadd, these are not necessary.

How can I clean out the replog back-queue to a pristine start?

I suppose, more generally, I'm asking: How do I start replication all over? What files can / should I delete? Which should I not, under any circumstances, touch?

Why not just use syncrepl or delta-syncrepl? They support catch up. Slurpd will be removed in the near future, and is pretty much at an evolutionary dead end.
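
For comparison, a rough sketch of what a syncrepl setup might look like in a 2.3-era slapd.conf. Every value here (the rid, hostnames, suffix, bind DN, password, and retry interval) is a placeholder rather than anything taken from the thread, and the provider is assumed to have the syncprov overlay available.

```
# Provider (master) slapd.conf fragment, inside the database section.
overlay syncprov

# Consumer (replica) slapd.conf fragment, inside the database section.
# All values below are placeholders.
syncrepl rid=001
        provider=ldap://master.example.com:389
        type=refreshAndPersist
        searchbase="dc=example,dc=com"
        bindmethod=simple
        binddn="cn=replicator,dc=example,dc=com"
        credentials=secret
        retry="60 +"
updateref ldap://master.example.com
```

With a stanza along these lines the consumer pulls whatever it is missing when it connects, which is the "catch up" behaviour mentioned above; there is no replog on the master to clean out.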

--Quanah

-- Quanah Gibson-Mount Principal Software Developer ITS/Shared Application Services Stanford University GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html

Aaron Richton

1:36 a.m.

Use of slurpd certainly isn't a modern architecture, especially with concern for rapid DR, for reasons including your question at hand. Look into syncrepl (make sure to upgrade to 2.3.30 first) and mirrormode.

I would argue that the way to deal with HA/DR with OpenLDAP is to install enough slaves that you feel the odds of complete failure are tolerable, and work on a good DR plan for the event of a master failure. With that said, slapcat/slapadd is the preferred server turnup technique, DR or not. With slapadd -q and sufficient RAM you can achieve reasonable (i.e. minutes) turnup times even on fairly large databases. On modern storage backends (bdb/hdb) slapcat is safe to do hot, so slapcat/slapadd will give you a fairly quick recovery with any interval between backups that you feel comfortable with. If you require greater availability than "minutes," mirrormode is probably the right idea.
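
One way to sketch the hot-backup approach described above is a pair of shell functions: one that writes a timestamped `slapcat` export, and one that keeps only the newest few. The directory, the `backup-*.ldif` naming scheme, and the retention count are assumptions made for illustration, not part of the advice in this message.

```shell
#!/bin/sh
# Sketch: periodic hot backup with simple retention.
# The naming scheme and retention policy are assumptions.

# Usage: prune_backups DIR KEEP
# Deletes all but the KEEP newest backup-*.ldif files in DIR.
# Names embed a sortable timestamp, so a reverse sort puts newest first.
prune_backups() {
    for f in "$1"/backup-*.ldif; do
        if [ -e "$f" ]; then
            printf '%s\n' "$f"
        fi
    done | sort -r | tail -n +"$(( $2 + 1 ))" |
    while IFS= read -r old; do
        rm -f "$old"
    done
}

# Usage: hot_backup DIR KEEP
# Exports the running database, then prunes old exports on success.
hot_backup() {
    slapcat -l "$1/backup-$(date +%Y%m%d-%H%M%S).ldif" &&
        prune_backups "$1" "$2"
}
```

A cron entry calling `hot_backup /var/backups/ldap 7` would then bound how stale a restore can be. Turnup on a new server would be `slapadd -q -l` against the newest file, with slapd stopped.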

I'm pretty sure that mirrormode requires syncrepl, but I haven't verified that personally; I've deferred considering it until 2.4.

Steven G. Harms

6:15 a.m.

Replies inline:

On Wed, Dec 13, 2006 at 07:36:22PM -0500, Aaron Richton wrote:

...

Use of slurpd certainly isn't a modern architecture, especially with concern for rapid DR, for reasons including your question at hand. Look into syncrepl (make sure to upgrade to 2.3.30 first) and mirrormode.

Due to reasons beyond my control, I'm currently at OpenLDAP version slapd 2.2.13 (Aug 18 2005 22:22:34). As such, syncrepl, based on your versioning guideline below, isn't an option.

...

I would argue that the way to deal with HA/DR with OpenLDAP is to install enough slaves that you feel the odds of complete failure are tolerable, ....

I have built replicas in different data centers etc. So I'm going for a 'multiple replica' theory of backup. The DR need not be instant, nor does it need to be perfectly synchronized.

In light of these constraints, would it be appropriate to use slurpd for replication?

Assuming slurpd for replication, what are the answers to these questions:

1. Can I assign multiple replicas?

2. How can I clean out the replog back-queue to a pristine start?

3. I suppose, more generally, I'm asking: How do I start replication all over? What files can / should I delete? Which should I not, under any circumstances, touch?

Should I delete the slurpd.replog.lock file and/or the slurpd.replog? Should I just cat /dev/null into them? How can I start over? And further, how do I rotate the replog as it starts to eat up more filesystem space?

For the record I'm using a BDB back end.

Steven

Quanah Gibson-Mount

5:10 p.m.

--On Thursday, December 14, 2006 12:15 AM -0500 "Steven G. Harms" stharms@cisco.com wrote:

...

Replies inline:

On Wed, Dec 13, 2006 at 07:36:22PM -0500, Aaron Richton wrote:

...
Use of slurpd certainly isn't a modern architecture, especially with concern for rapid DR, for reasons including your question at hand. Look into syncrepl (make sure to upgrade to 2.3.30 first) and mirrormode.

Due to reasons beyond my control, I'm currently at OpenLDAP version slapd 2.2.13 (Aug 18 2005 22:22:34). As such, syncrepl, based on your versioning guideline below, isn't an option.

I assume you are aware that OpenLDAP 2.2.13 is an extremely old release. 2.2.30 was the final 2.2 release. 2.2.13 has multiple security vulnerabilities in it, and a large number of stability bugs. Mayhaps that can be reason enough to convince the folks administering it to do a much-needed upgrade.

--Quanah

-- Quanah Gibson-Mount Principal Software Developer ITS/Shared Application Services Stanford University GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html

Howard Chu

5:11 p.m.

Steven G. Harms wrote:

...

Replies inline:

On Wed, Dec 13, 2006 at 07:36:22PM -0500, Aaron Richton wrote:

...
Use of slurpd certainly isn't a modern architecture, especially with concern for rapid DR, for reasons including your question at hand. Look into syncrepl (make sure to upgrade to 2.3.30 first) and mirrormode.

Due to reasons beyond my control, I'm currently at OpenLDAP version slapd 2.2.13 (Aug 18 2005 22:22:34). As such, syncrepl, based on your versioning guideline below, isn't an option.

Nothing good will come of this. It sounds like you're just entering deployment, with a release that's been obsolete for quite a while. (2.2.13 was released June 2004. It was superseded by 2.2.14 only 4 days later. That's hardly a good place from which to launch a new project.)

...

...
I would argue that the way to deal with HA/DR with OpenLDAP is to install enough slaves that you feel the odds of complete failure are tolerable, ....

I have built replicas in different data centers etc. So I'm going for a 'multiple replica' theory of backup. The DR need not be instant, nor does it need to be perfectly synchronized.

In light of these constraints, would it be appropriate to use slurpd for replication?

...

Assuming slurpd for replication, what are the answers to these questions:

Can I assign multiple replicas?

Yes.

...

How can I clean out the replog back-queue to a pristine start?

...

I suppose, more generally, I'm asking: How do I start replication all over? What files can / should I delete? Which should I not, under any circumstances, touch?

...

Should I delete the slurpd.replog.lock file and/or the slurpd.replog? Should I just cat /dev/null into them? How can I start over? And further, how do I rotate the replog as it starts to eat up more filesystem space?

...

For the record I'm using a BDB back end.

If you're using multiple replicas then you can't simply delete the replog file, unless you're reinitializing all the replicas at once. The simplest approach may be to rewrite the records in the slurpd.status file with timestamps matching the snapshot they started from.

Rotating the replog should never be needed, since slurpd truncates it automatically as changes are propagated. Deleting the lock files is irrelevant, since they don't contain anything.
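
As a hedged illustration of the slurpd.status suggestion above, the function below rewrites the timestamp field of every record to a given snapshot time. It assumes each line of the status file has the form `host:port:timestamp:seq`; that layout is an assumption and should be checked against a real file from the installation. It also assumes slurpd is stopped while the file is edited, and that the snapshot time is in the same units as the existing timestamps. The sequence field is left untouched, since the thread does not say whether it needs adjusting.

```shell
#!/bin/sh
# Sketch: point every replica's status record at the snapshot time.
# ASSUMPTION: records look like host:port:timestamp:seq.

# Usage: reset_status STATUS_FILE SNAPSHOT_TIME
reset_status() {
    # Replace field 3 on well-formed records; pass other lines through.
    awk -F: -v OFS=: -v ts="$2" 'NF == 4 { $3 = ts } { print }' "$1" \
        > "$1.new" &&
        mv "$1.new" "$1"
}
```

For example, `reset_status /path/to/slurpd.status 1166000000` would be run once after all replicas have been reloaded from the same export. Keeping a copy of the original status file first is cheap insurance.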

-- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc OpenLDAP Core Team http://www.openldap.org/project/

matthew sporleder

5:15 p.m.

On 12/14/06, Steven G. Harms stharms@cisco.com wrote:

...

Replies inline:

On Wed, Dec 13, 2006 at 07:36:22PM -0500, Aaron Richton wrote:

...
Use of slurpd certainly isn't a modern architecture, especially with concern for rapid DR, for reasons including your question at hand. Look into syncrepl (make sure to upgrade to 2.3.30 first) and mirrormode.

Due to reasons beyond my control, I'm currently at OpenLDAP version slapd 2.2.13 (Aug 18 2005 22:22:34). As such, syncrepl, based on your versioning guideline below, isn't an option.

...
I would argue that the way to deal with HA/DR with OpenLDAP is to install enough slaves that you feel the odds of complete failure are tolerable, ....

I have built replicas in different data centers etc. So I'm going for a 'multiple replica' theory of backup. The DR need not be instant, nor does it need to be perfectly synchronized.

In light of these constraints, would it be appropriate to use slurpd for replication?

Assuming slurpd for replication, what are the answers to these questions:

Can I assign multiple replicas?

How can I clean out the replog back-queue to a pristine start?

I suppose, more generally, I'm asking: How do I start replication all over? What files can / should I delete? Which should I not, under any circumstances, touch?

Should I delete the slurpd.replog.lock file and/or the slurpd.replog? Should I just cat /dev/null into them? How can I start over? And further, how do I rotate the replog as it starts to eat up more filesystem space?

For the record I'm using a BDB back end.

While I am also planning a switch to syncrepl, I can only make so many changes at a time, so I'm stuck in this boat with you for a while. I found at the NYC BSD CON that OpenLDAP HA is a very common question for people, so here's a rough outline of how I told people to use slurpd for HA (after saying how much easier syncrepl was, of course).

Run at least two instances of slapd on your master server. This allows you to build new replicas without much downtime of existing replicas or the master. You can start backlogging a new replica from a point-in-time slapcat of another downed replica.

Use clustering (like Veritas on a SAN; BDB doesn't like NFS, if I remember correctly) to manage failing over the --entire-- database to another physical server if that one craps out.

If you can't use a cluster, or your SAN fails, or whatever, you can also promote one of your replicas to be a master by moving over the config files and, if needed, adding the IP from your previous master. When you fail over the slurpd configs, remove the old master, since you will have to rebuild it from scratch. You can move the old replication logs over and run them in one-shot mode on the new master before you start it up. That should help keep you in sync. You can also manually edit the replog to change the server IP, if needed.

Now, rebuild your master from scratch. Then shut down and slapcat a replica, using well-timed slurpd config changes to keep everything in sync, and follow the above procedures for promoting it back to the master.
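
The "edit the replog to change the server IP" step in the outline above could be scripted roughly as follows. This is a sketch under assumptions: it takes each replog record to begin with a `replica: host:port` line, and the addresses in the usage note are placeholders. It works on a copy-and-replace basis and does not run slurpd itself.

```shell
#!/bin/sh
# Sketch: retarget queued replog records at a different replica address.
# ASSUMPTION: records name their target on a "replica: host:port" line.

# Usage: retarget_replog REPLOG OLD_HOST:PORT NEW_HOST:PORT
retarget_replog() {
    # Exact-match the whole line so similar addresses are not touched.
    awk -v old="replica: $2" -v new="replica: $3" \
        '$0 == old { print new; next } { print }' "$1" > "$1.new" &&
        mv "$1.new" "$1"
}
```

After something like `retarget_replog /path/to/replog 10.0.0.5:389 10.0.0.9:389`, the moved log could be replayed once with slurpd's one-shot mode (`slurpd -r /path/to/replog -o`) before the new master is started, as described above.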

Buchan Milne

5:18 p.m.

On Thursday 14 December 2006 07:15, Steven G. Harms wrote:

...

Replies inline:

On Wed, Dec 13, 2006 at 07:36:22PM -0500, Aaron Richton wrote:

...
Use of slurpd certainly isn't a modern architecture, especially with concern for rapid DR, for reasons including your question at hand. Look into syncrepl (make sure to upgrade to 2.3.30 first) and mirrormode.

Due to reasons beyond my control, I'm currently at OpenLDAP version slapd 2.2.13 (Aug 18 2005 22:22:34). As such, syncrepl, based on your versioning guideline below, isn't an option.

If you're sticking with 2.2.13 for "vendor support", judging by the version number, I'm guessing that is with a vendor that now has their own LDAP server, and they primarily use OpenLDAP for the library (to support ldap clients).

We use that vendor's distro, but run our own packages (rebuilt from the Mandriva SRPMs, since I maintain those myself).

My older (2.3.27) set is available here:

http://anorien.warwick.ac.uk/mirrors/buchan/openldap/

If someone needs the 2.3.30's I built a few weeks ago ... I'll update them there too.

Regards, Buchan

-- Buchan Milne ISP Systems Specialist - Monitoring/Authentication Team Leader B.Eng,RHCE(803004789010797),LPIC-2(LPI000074592)

Sam Tran

5:31 p.m.

On 12/14/06, Buchan Milne bgmilne@staff.telkomsa.net wrote:

...

We use that vendor's distro, but run our own packages (rebuilt from the Mandriva SRPMs, since I maintain those myself).

My older (2.3.27) set is available here:

http://anorien.warwick.ac.uk/mirrors/buchan/openldap/

If someone needs the 2.3.30's I built a few weeks ago ... I'll update them there too.

Buchan,

I am interested in the 2.3.30 packages for RHEL4 i386 and x86_64.

Thanks in advance. Sam

Buchan Milne

22 Dec

11:48 a.m.

On Thursday 14 December 2006 18:31, Sam Tran wrote:

...

On 12/14/06, Buchan Milne bgmilne@staff.telkomsa.net wrote:

...
We use that vendor's distro, but run our own packages (rebuilt from the Mandriva SRPMs, since I maintain those myself).

My older (2.3.27) set is available here:

http://anorien.warwick.ac.uk/mirrors/buchan/openldap/

If someone needs the 2.3.30's I built a few weeks ago ... I'll update them there too.

Buchan,

I am interested in the 2.3.30 packages for RHEL4 i386 and x86_64.

2.3.31 is there now.

Regards, Buchan

-- Buchan Milne ISP Systems Specialist - Monitoring/Authentication Team Leader B.Eng,RHCE(803004789010797),LPIC-2(LPI000074592)

Tony Earnshaw

14 Dec

6:55 p.m.

Buchan Milne wrote:

...

If you're sticking with 2.2.13 for "vendor support", judging by the version number, I'm guessing that is with a vendor that now has their own LDAP server, and they primarily use OpenLDAP for the library (to support ldap clients).

We use that vendor's distro, but run our own packages (rebuilt from the Mandriva SRPMs, since I maintain those myself).

My older (2.3.27) set is available here:

FWIW we're running 2.3.30 rpms (with great success) on RHAS4 with an older Buchan Milne spec file than "My older (2.3.27) set", simply by modifying the version number in the spec file.

Upgrading from 2.2.13 to 2.3.latest has to be a must for RHAS/RHEL/FC, if any RHAS/RHEL/FC admin wishes to take OpenLDAP seriously. Both bugs/limitations and functionality have been addressed between 2.2.13 and 2.3.latest.

--Tonni

-- Tonni Earnshaw tonni @ barlaeus.nl

Tony Earnshaw

9:04 a.m.

Steven Harms (stharms) wrote:

[...]

...

I suppose, more generally, I'm asking: How do I start replication all over? What files can / should I delete? Which should I not, under any circumstances, touch?

Others (Quanah) have mentioned dropping slurpd in favor of (delta-)syncrepl, which requires upgrading to 2.3 (2.3.30 at the moment).

This site used to use slurpd with OpenLDAP 2.2. Before we switched to 2.3, we tested syncrepl slave DB rebuilds from scratch (empty DB, valid slapd.conf and DB_CONFIG in the DB directory). We simply started slapd on the slave machines and the small (about 40MB) DB was automatically rebuilt within seconds (100Mb Ethernet and fast SCSI RAID5), high-quality iron.

Having previously had to deal with the same thing with slurpd, I was completely gob-smacked.

--Tonni

-- Tonni Earnshaw tonni @ barlaeus.nl

openldap-software@openldap.org