I'm curious if the tactics described in this thread are currently sufficient:
http://www.openldap.org/lists/openldap-software/200608/msg00152.html
The thread overall suggests the tried-and-true tactic of using slapcat to extract an LDIF file, to be imported later. But our application's DB is large enough that reimportation is prohibitive.
We're using OpenLDAP 2.3.43 under CentOS 5.7.
What we're doing currently is:
- stopping slapd
- using db_checkpoint and db_archive to manage the BDB logs
- copy away the directory
- restart slapd
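(In script form, that's roughly the following; the service name and paths here are placeholders, not necessarily what we use:)

  #!/bin/sh
  DBDIR=/var/lib/ldap                       # assumed BDB environment directory
  BACKUP=/var/backups/ldap-$(date +%Y%m%d)  # assumed destination

  service ldap stop                   # no writes during the copy

  db_checkpoint -1 -h "$DBDIR"        # force a checkpoint
  db_archive -d -h "$DBDIR"           # remove log files no longer needed

  mkdir -p "$BACKUP"
  cp -a "$DBDIR"/. "$BACKUP"/         # copy away the directory

  service ldap start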
This results in a window of time during which the LDAP server is not available.
My hope was that by managing the olcReadOnly attribute via the config database (or, as that cited message in the thread suggests, via the monitor database), we could perform those middle two steps while leaving a read-only server in place.
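(Concretely, by "managing olcReadOnly" I mean something like this; the database DN is a placeholder for whatever yours is:)

  # flip the database to read-only via cn=config
  ldapmodify -Y EXTERNAL -H ldapi:/// <<'EOF'
  dn: olcDatabase={1}bdb,cn=config
  changetype: modify
  replace: olcReadOnly
  olcReadOnly: TRUE
  EOF

  # ... run db_checkpoint/db_archive and the copy here ...

  # and back to read-write afterwards
  ldapmodify -Y EXTERNAL -H ldapi:/// <<'EOF'
  dn: olcDatabase={1}bdb,cn=config
  changetype: modify
  replace: olcReadOnly
  olcReadOnly: FALSE
  EOF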
Is that feasible? Recommended?
On Tue, 7 Feb 2012, Brian Reichert wrote:
The thread overall suggests the tried-and-true tactic of using slapcat to extract an LDIF file, to be imported later. But our application's DB is large enough that reimportation is prohibitive.
First off, I'd try the latest-greatest RE24 with the latest-greatest back-mdb, and see if your slapadd performance still doesn't meet expectations. Things have come a long way from 2.3.43. A traditional dump-and-restore may not be the most fascinating option, but it's really easy to understand and it's really reliable...which may make it the best choice.
Anyway, if you're operating at this sort of insanely large scale, I'm going to assume you have lots of hardware, sites, etc. to go around. If that's the case, I'd take a look at active-standby MirrorMode (see the OpenLDAP 2.4 Administrator's Guide, the figure under section 18.3.4.1.1). That'll give you two sites that should be ready to go at any time.
If that's not paranoid enough, take n of your replica pool, and place it in additional sites (with or without live load, your call). If, let's just say, Site A is confirmed dead for a while, you could reconfigure that replica as Site A'. (In theory, at least. Don't ask me, I've never tried it, I don't have enough servers/glass/load balancers to make this happen.) If that works, you should be able to keep roaming around different data centers forever.
(Of course, you probably have other services too. So whether or not it's worth doing this with OpenLDAP Software, or if you should be looking at a lower level like virtualization, would be an interesting question in my mind...)
On Tuesday, 7 February 2012 23:53:52 Brian Reichert wrote:
I'm curious if the tactics described in this thread are currently sufficient:
http://www.openldap.org/lists/openldap-software/200608/msg00152.html
The thread overall suggests the tried-and-true tactic of using slapcat to extract an LDIF file, to be imported later. But our application's DB is large enough that reimportation is prohibitive.
We're using OpenLDAP 2.3.43 under CentOS 5.7.
What we're doing currently is:
- stopping slapd
Why?
- using db_checkpoint and db_archive to manage the BDB logs
- copy away the directory
- restart slapd
This results in a window of time during which the LDAP server is not available.
My hope was that by managing the olcReadOnly attribute via the config database (or, as that cited message in the thread suggests, via the monitor database), we could perform those middle two steps while leaving a read-only server in place.
In my environment, write downtime *is* downtime.
My approach has been to follow the Berkeley DB recommendations for backing up the database and archive logs.
While we have never had to actually restore from backup, all testing that I did in the past worked reliably.
My implementation is shipped in my openldap packages (with symlinks in cron.* enabled for daily backups by default); you can find the scripts here:
http://svnweb.mageia.org/packages/cauldron/openldap/current/SOURCES/ldap-hot-db-backup?view=log
http://svnweb.mageia.org/packages/cauldron/openldap/current/SOURCES/ldap-common?view=log
Regards, Buchan
On Wed, Feb 08, 2012 at 12:55:34PM +0200, Buchan Milne wrote:
On Tuesday, 7 February 2012 23:53:52 Brian Reichert wrote:
I'm curious if the tactics described in this thread are currently sufficient:
http://www.openldap.org/lists/openldap-software/200608/msg00152.html
[snip]
What we're doing currently is:
- stopping slapd
Why?
Outmoded behavior; we used to (eons ago) use the ldbm backend, and when we initially migrated to bdb (also eons ago), we wanted to make sure there were no incoming transactions when we made our backup.
In my environment, write downtime *is* downtime.
It's bad for us too; that's why I'm exploring correct tactics. Hence, my original question.
My approach has been to follow the Berkeley DB recommendations for backing up the database and archive logs.
While we have never had to actually restore from backup, all testing that I did in the past worked reliably.
My implementation is shipped in my openldap packages (with symlinks in cron.* enabled for daily backups by default); you can find the scripts here:
Cool; let me review all of this...
http://svnweb.mageia.org/packages/cauldron/openldap/current/SOURCES/ldap-hot-db-backup?view=log
http://svnweb.mageia.org/packages/cauldron/openldap/current/SOURCES/ldap-common?view=log
Before I dive deep into your tactics, did you or anyone else have an answer for my original question?
Regards, Buchan
On Wed, Feb 08, 2012 at 12:55:34PM +0200, Buchan Milne wrote:
My implementation is shipped in my openldap packages (with symlinks in cron.* enabled for daily backups by default); you can find the scripts here:
http://svnweb.mageia.org/packages/cauldron/openldap/current/SOURCES/ldap-hot-db-backup?view=log
http://svnweb.mageia.org/packages/cauldron/openldap/current/SOURCES/ldap-common?view=log
FWIW: these scripts call out a bunch of reference URLs that Oracle has now broken:
http://www.sleepycat.com/docs/ref/transapp/archival.html
http://www.sleepycat.com/docs/ref/transapp/recovery.html
http://www.sleepycat.com/docs/ref/transapp/logfile.html
http://www.sleepycat.com/docs/ref/transapp/hotfail.html
Regards, Buchan
On Thursday, 9 February 2012 21:00:36 Brian Reichert wrote:
On Wed, Feb 08, 2012 at 12:55:34PM +0200, Buchan Milne wrote:
My implementation is shipped in my openldap packages (with symlinks in cron.* enabled for daily backups by default); you can find the scripts here:
http://svnweb.mageia.org/packages/cauldron/openldap/current/SOURCES/ldap-hot-db-backup?view=log
http://svnweb.mageia.org/packages/cauldron/openldap/current/SOURCES/ldap-common?view=log
FWIW: these scripts call out a bunch of reference URLs that Oracle has now broken:
http://www.sleepycat.com/docs/ref/transapp/archival.html
http://www.sleepycat.com/docs/ref/transapp/recovery.html
http://www.sleepycat.com/docs/ref/transapp/logfile.html
http://www.sleepycat.com/docs/ref/transapp/hotfail.html
Yes, I know, I looked briefly for them while writing my reply above, but I could not find their new homes in the few minutes I looked.
Regards, Buchan
On Fri, Feb 10, 2012 at 09:35:12AM +0200, Buchan Milne wrote:
On Wed, Feb 08, 2012 at 12:55:34PM +0200, Buchan Milne wrote:
On Thursday, 9 February 2012 21:00:36 Brian Reichert wrote:
FWIW: these scripts call out a bunch of reference URLs that Oracle has now broken:
http://www.sleepycat.com/docs/ref/transapp/archival.html
http://www.sleepycat.com/docs/ref/transapp/recovery.html
http://www.sleepycat.com/docs/ref/transapp/logfile.html
http://www.sleepycat.com/docs/ref/transapp/hotfail.html
Yes, I know, I looked briefly for them while writing my reply above, but I could not find their new homes in the few minutes I looked.
I found this page; HTH.
http://docs.oracle.com/cd/E17076_02/html/api_reference/C/utilities.html
Regards, Buchan
On Tue, Feb 07, 2012 at 04:53:52PM -0500, Brian Reichert wrote:
I'm curious if the tactics described in this thread are currently sufficient:
http://www.openldap.org/lists/openldap-software/200608/msg00152.html
Let me try asking a slightly different question.
This page says:
http://www.openldap.org/faq/data/cache/287.html
How do I backup my directory?
... You can backup your BDB, HDB, or LDBM-based directory by making a copy of the id2entry index file while slapd is NOT running.
Question: is slapd, put into read-only mode via the config database, close enough to 'NOT running' to satisfy the condition?
Follow-up: is it expected that the misc db_* utilities can be used safely at this point?
--On Thursday, February 09, 2012 2:21 PM -0500 Brian Reichert reichert@numachi.com wrote:
Follow-up: is it expected that the misc db_* utilities can be used safely at this point?
The only officially supported backup method with OpenLDAP is slapcat. Everything else, you do at your own risk. Personally, I think that article is wrong, as I believe you would at a minimum require id2entry, dn2id, and all of the current BDB log files. I also would not trust any FAQ entry that references LDBM, as that indicates that it is of significant age.
--Quanah
--
Quanah Gibson-Mount
Sr. Member of Technical Staff
Zimbra, Inc
A Division of VMware, Inc.
--------------------
Zimbra :: the leader in open source messaging and collaboration
On Thu, 09 Feb 2012 12:54:27 -0800, Quanah Gibson-Mount quanah@zimbra.com wrote:
The only officially supported backup method with OpenLDAP is slapcat. Everything else, you do at your own risk.
The admin guide disagrees with you. Chapter 19 describes incremental backup by copying first the entire DB, then backing up further DB logs.
Personally, I think that article is wrong, as I believe you would at a minimum require id2entry, dn2id, and all of the current BDB log files. I also would not trust any FAQ entry that references LDBM, as that indicates that it is of significant age.
Agreed. Copying anything without the logs sounds broken.
--On Thursday, February 09, 2012 11:12 PM +0100 Hallvard B Furuseth h.b.furuseth@usit.uio.no wrote:
On Thu, 09 Feb 2012 12:54:27 -0800, Quanah Gibson-Mount quanah@zimbra.com wrote:
The only officially supported backup method with OpenLDAP is slapcat. Everything else, you do at your own risk.
The admin guide disagrees with you. Chapter 19 describes incremental backup by copying first the entire DB, then backing up further DB logs.
I can't speak to why that was added to the Admin guide. The fact remains, last I checked, slapcat is the only officially supported backup method.
And those BDB tools will get you nowhere with back-mdb, back-ndb, etc, so there's a good reason that slapcat is the officially supported method.
--Quanah
Hallvard B Furuseth wrote:
On Thu, 09 Feb 2012 12:54:27 -0800, Quanah Gibson-Mount quanah@zimbra.com wrote:
The only officially supported backup method with OpenLDAP is slapcat. Everything else, you do at your own risk.
The admin guide disagrees with you. Chapter 19 describes incremental backup by copying first the entire DB, then backing up further DB logs.
Chapter 19 is obviously a work-in-progress, transferred over from the FAQ-o-Matic. I'd say that anything referencing BerkeleyDB in there should be deleted, since it is specific to BerkeleyDB and not specific to OpenLDAP. (And most likely, in future releases, BerkeleyDB will disappear anyway.)
On Thu, Feb 09, 2012 at 02:36:20PM -0800, Howard Chu wrote:
Hallvard B Furuseth wrote:
On Thu, 09 Feb 2012 12:54:27 -0800, Quanah Gibson-Mount quanah@zimbra.com wrote:
The only officially supported backup method with OpenLDAP is slapcat. Everything else, you do at your own risk.
The admin guide disagrees with you. Chapter 19 describes incremental backup by copying first the entire DB, then backing up further DB logs.
I do know that the db_* utilities are only applicable to the BDB backend. As far as I know, it's the most mature of the backends to use with 2.3.43. (If I'm wrong about that, do let me know.)
I recognize, also, that essentially all of the files in the directory are necessary for safe backups. (I haven't explored any of the suggested hot-backup techniques yet, and LDIF imports are currently too time-consuming, so for now I'm stuck with the low-tech backup method of 'tar'. :)
A core element of my question is: is putting slapd into read-only mode via the config database sufficient for me to process that directory's contents?
--
Howard Chu
CTO, Symas Corp.           http://www.symas.com
Director, Highland Sun     http://highlandsun.com/hyc/
Chief Architect, OpenLDAP  http://www.openldap.org/project/
--On Thursday, February 09, 2012 4:35 PM -0500 Brian Reichert reichert@numachi.com wrote:
I do know that using the db_* utilities are only applicable to the BDB backend. As far as I know, it's the most mature of the backends to use with 2.3.43. (If I'm wrong in that, do let me know.)
2.3 is not a supported release series. I would strongly advise upgrading to a supported release. But yes, back-hdb/bdb are the two mature backends for use in the 2.3 series. I would only ever consider a "safe" backup of bdb itself to be when slapd is shut down, and after db_recover has been run. Then you can safely back up the *.bdb and log.* files. Putting slapd in read-only mode is not necessarily sufficient, as you need to force a BDB checkpoint prior to backing up the BDB db.
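(In sketch form, with the paths and service name as assumptions:)

  service ldap stop
  db_recover -h /var/lib/ldap      # full recovery, not just a checkpoint
  mkdir -p /var/backups/ldap
  cp -p /var/lib/ldap/*.bdb /var/lib/ldap/log.* /var/backups/ldap/
  service ldap start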
--Quanah
On Thu, Feb 09, 2012 at 03:09:41PM -0800, Quanah Gibson-Mount wrote:
2.3 is not a supported release series. I would strongly advise upgrading to a supported release.
Having tracked this project for years, I'm well aware of that stance, but I'm trapped in a world where I'm stuck with what the vendor provides, warts and all.
But yes, back-hdb/bdb are the two mature backends for use in the 2.3 series.
Cool! I've been reading about the progress on HDB; hopefully I can carve out some time to shake it down...
I would only ever consider a "safe" backup of bdb itself to be when slapd is shut down, and after db_recover has been run. Then you can safely back up the *.bdb and log.* files. Putting slapd in read-only mode is not necessarily sufficient, as you need to force a BDB checkpoint prior to backing up the BDB db.
I'm familiar with forcing a checkpoint; from my first post in this thread:
What we're doing currently is:
- stopping slapd
- using db_checkpoint and db_archive to manage the BDB logs
- copy away the directory
- restart slapd
I'm trying to establish if read-only mode is close enough to _stopping_ slapd, to allow that bdb-specific processing to safely commence...
--Quanah
--On Thursday, February 09, 2012 5:17 PM -0500 Brian Reichert reichert@numachi.com wrote:
What we're doing currently is:
- stopping slapd
- using db_checkpoint and db_archive to manage the BDB logs
- copy away the directory
- restart slapd
I'm trying to establish if read-only mode is close enough to _stopping_ slapd, to allow that bdb-specific processing to safely commence...
I thought I was very clear on that in my last email. It is not sufficient. You need to stop slapd and run *db_recover*, which is more exhaustive than db_checkpoint, if you want to go the route of backing up the BDB db.
--Quanah
On Thu, Feb 09, 2012 at 03:48:45PM -0800, Quanah Gibson-Mount wrote:
I thought I was very clear on that in my last email. It is not sufficient. You need to stop slapd and run *db_recover*, which is more exhaustive than db_checkpoint, if you want to go the route of backing up the BDB db.
I'm sorry; I thought you were focussing on my processing of the directory's contents (which do need review, thanks), rather than how I prepared slapd.
Ok. Now I know: read-only mode via the config database is not sufficient for making backups in this manner.
Thanks for your patience...
--Quanah
--On Thursday, February 09, 2012 5:30 PM -0500 Brian Reichert reichert@numachi.com wrote:
I'm sorry; I thought you were focussing on my processing of the directory's contents (which do need review, thanks), rather than how I prepared slapd.
Ok. Now I know: read-only mode via the config database is not sufficient for making backups in this manner.
Thanks for your patience...
:)
Also recall that going from 32- to 64-bit architectures may not work. Whoever wrote the idiotic policy of using vendor distributions needs to be educated. I only got 2.3.43 stable for me after applying a number of patches to it that I manually backported from RE24, and I have long since abandoned it for 2.4.
--Quanah
On Friday, 10 February 2012 01:48:45 Quanah Gibson-Mount wrote:
--On Thursday, February 09, 2012 5:17 PM -0500 Brian Reichert
reichert@numachi.com wrote:
What we're doing currently is:
- stopping slapd
- using db_checkpoint and db_archive to manage the BDB logs
- copy away the directory
- restart slapd
I'm trying to establish if read-only mode is close enough to _stopping_ slapd, to allow that bdb-specific processing to safely commence...
I thought I was very clear on that in my last email. It is not sufficient. You need to stop slapd and run *db_recover*, which is more exhaustive than db_checkpoint, if you want to go the route of backing up the BDB db.
If you checkpoint, and you back up all the database files (including transaction log files) in the correct order, you should not need to db_recover (database recovery can occur at a later time; if it could not, then your backup is totally broken anyway).
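(A minimal sketch of that ordering, with slapd left running; the paths are assumptions, and db_archive -s / -l list the data files and log files respectively:)

  DBDIR=/var/lib/ldap
  BACKUP=/var/backups/ldap-hot
  mkdir -p "$BACKUP"

  db_checkpoint -1 -h "$DBDIR"

  # data files first ...
  for f in $(db_archive -s -h "$DBDIR"); do
      cp -p "$DBDIR/$f" "$BACKUP/"
  done
  # ... transaction log files last, so they cover the whole copy window
  for f in $(db_archive -l -h "$DBDIR"); do
      cp -p "$DBDIR/$f" "$BACKUP/"
  done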
Regards, Buchan
--On Friday, February 10, 2012 5:05 PM +0200 Buchan Milne bgmilne@staff.telkomsa.net wrote:
If you checkpoint, and you back up all the database files (including transaction log files) in the correct order, you should not need to db_recover (database recovery can occur at a later time; if it could not, then your backup is totally broken anyway).
db_recover does much more than db_checkpoint. If you want to trust db_checkpoint, fine. I don't. In any case, all of this will soon be moot with back-mdb, thank god.
--Quanah
On Fri, 10 Feb 2012, Buchan Milne wrote:
On Friday, 10 February 2012 01:48:45 Quanah Gibson-Mount wrote:
...
I thought I was very clear on that in my last email. It is not sufficient. You need to stop slapd and run *db_recover*, which is more exhaustive than db_checkpoint, if you want to go the route of backing up the BDB db.
If you checkpoint, and you back up all the database files (including transaction log files) in the correct order, you should not need to db_recover
If that's all that you require at backup time, then in order to guarantee correctness *at restore time* you have to perform "catastrophic" recovery (a la db_recover -c) on the restored database before trying to use it. That's necessary if a checkpoint occurs between when you start copying .db files and when you copy the last transaction log file.
The optimized procedure that I worked out with Sleepycat's help (for a completely different program, but using the "transaction data store") was this:
** Backing up the database environment is done with the following
** steps:
**  0) all txn log files except the current one are copied to
**     the backup
**  1) a checkpoint is taken
**  2) the list of txn log files that are no longer needed for
**     recovery or txn_abort is obtained
**  3) the LSN of the most recent checkpoint is noted
**  4) all the database table files, including queue extents,
**     are copied to the backup
**  5) all the txn log files that were not copied in step (0)
**     are copied to the backup
**  6) if a checkpoint has *not* taken place since step (3),
**     then the database is marked as not needing catastrophic
**     recovery when restored
**  7) if the list from step (2) is not empty, then those txn
**     log files are removed from the active database environment
**     and are marked in the backup as unnecessary for normal
**     restoration
**
** Note that the ordering of this is almost completely inflexible.
** In particular:
**   (0) must precede (5)
**   (1) must precede (2) and (3)
**   (2) and (3) must precede (4)
**   (4) must precede (5)
**   (5) must precede (6) and (7)
**
** Minimizing the time between (3) and (6) is highly desirable,
** as that minimizes the window in which a checkpoint could
** occur that would result in a backup that would require
** catastrophic recovery when restored.  Restoring such a
** backup is *much* slower than restoring one that only requires
** normal recovery.  That's why (0) and (7) are pushed forward
** and backward to where they are.
For those trying to script this, you can get the LSN of the most recent checkpoint with:

  db_stat -t | awk '$2 ~ /^File\/offset/{print $1; exit}'
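(And if it helps anyone scripting the above, here's a rough, untested shell rendering of steps (0)-(7); the paths and the "_quiescent"/manifest marking scheme are placeholders, one way of doing it:)

  #!/bin/sh
  DBDIR=/var/lib/ldap           # assumed environment home
  BACKUP=/var/backups/ldap-hot  # assumed destination

  lsn() { db_stat -t -h "$DBDIR" | awk '$2 ~ /^File\/offset/{print $1; exit}'; }

  mkdir -p "$BACKUP"

  # (0) copy all txn log files except the current (highest-numbered) one
  for f in $(db_archive -l -h "$DBDIR" | sort | sed '$d'); do
      cp -p "$DBDIR/$f" "$BACKUP/"
  done

  # (1) take a checkpoint
  db_checkpoint -1 -h "$DBDIR"

  # (2) list the txn log files no longer needed for recovery/txn_abort
  UNNEEDED=$(db_archive -h "$DBDIR")

  # (3) note the LSN of the most recent checkpoint
  BEFORE=$(lsn)

  # (4) copy the database table files
  for f in $(db_archive -s -h "$DBDIR"); do
      cp -p "$DBDIR/$f" "$BACKUP/"
  done

  # (5) copy the txn log files not already copied in step (0)
  for f in $(db_archive -l -h "$DBDIR"); do
      [ -e "$BACKUP/$f" ] || cp -p "$DBDIR/$f" "$BACKUP/"
  done

  # (6) if no checkpoint has happened since (3), mark the backup as
  #     restorable with normal (non-catastrophic) recovery
  [ "$(lsn)" = "$BEFORE" ] && touch "$BACKUP/_quiescent"

  # (7) drop the unneeded logs from the live environment, and note in
  #     the backup that they aren't needed for normal restoration
  for f in $UNNEEDED; do
      echo "$f" >> "$BACKUP/_unneeded_logs"
      rm "$DBDIR/$f"
  done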
Philip Guenther
On Fri, Feb 10, 2012 at 12:00:29PM -0800, Philip Guenther wrote:
** Note that the ordering of this is almost completely inflexible.
** In particular:
**   (0) must precede (5)
**   (1) must precede (2) and (3)
**   (2) and (3) must precede (4)
**   (4) must precede (5)
**   (5) must precede (6) and (7)
**
** Minimizing the time between (3) and (6) is highly desirable,
** as that minimizes the window in which a checkpoint could
** occur that would result in a backup that would require
** catastrophic recovery when restored.
Egads. I badly want to express that as a makefile. 'make -j -f slapdbackup.mk' so you can get your ordering constraints, and yet take advantage of whatever parallel tasks you can. :)
Is management of a HDB backend's directory any easier?
Philip Guenther
On Fri, 10 Feb 2012, Brian Reichert wrote:
On Fri, Feb 10, 2012 at 12:00:29PM -0800, Philip Guenther wrote:
** Note that the ordering of this is almost completely inflexible.
** In particular:
**   (0) must precede (5)
**   (1) must precede (2) and (3)
**   (2) and (3) must precede (4)
**   (4) must precede (5)
**   (5) must precede (6) and (7)
**
** Minimizing the time between (3) and (6) is highly desirable,
** as that minimizes the window in which a checkpoint could
** occur that would result in a backup that would require
** catastrophic recovery when restored.
Egads. I badly want to express that as a makefile. 'make -j -f slapdbackup.mk' so you can get your ordering constraints, and yet take advantage of whatever parallel tasks you can. :)
These are not dynamic constraints; I took a chunk of care and got the above right, checked it with Sleepycat, then documented the reasons, and have not seen any reason to touch it again. My recommendation is not to spend time putting lipstick on a pig.
Is management of a HDB backend's directory any easier?
HDB is no different from BDB in this. HDB is 'just' a change to the stuff *inside* the DB, not to the DB itself.
MDB, the new thing that Howard's been working on, is completely different and doesn't need any of this.
Philip
--On Friday, February 10, 2012 3:32 PM -0500 Brian Reichert reichert@numachi.com wrote:
Is management of a HDB backend's directory any easier?
back-hdb and back-bdb both use BDB as their data store. There is no difference in how to handle backups between them.
--Quanah
On Fri, Feb 10, 2012 at 03:02:51PM -0800, Quanah Gibson-Mount wrote:
back-hdb and back-bdb both use BDB as their data store. There is no difference in how to handle backups between them.
OK, thanks for the clarification...
--Quanah
On Fri, Feb 10, 2012 at 12:00:29PM -0800, Philip Guenther wrote:
The optimized procedure that I worked out with Sleepycat's help (for a completely different program, but using the "transaction data store") was this:
I'm exploring implementing these steps, but I'm running into some confusion.
If you're willing to discuss this in any better detail (or even better, provide some script that encapsulates these steps), that'd be great.
For example:
- In step 3, I'm to 'take note' of the LSN (log sequence number). In my test environment, your awk script yields '1/456298', as a matter of example. But I don't see where in these steps, as an implementer, I ever actually act on this data.
Am I, in step 6, re-deriving the LSN, to see if they differ?
- In step 6, you say 'the database is marked [...]'. How do I mark the database? Or is this effect implicitly handled by the backend as it manages checkpoints and transaction logs?
- In step 7, you say 'log files ... are marked in the backup [...]'. What sort of mark do I make so they are not involved in a normal restore, but are available if catastrophic recovery is necessary? (Does this play into how the database is marked in step 6?)
** Backing up the database environment is done with the following
** steps:
**  0) all txn log files except the current one are copied to
**     the backup
**  1) a checkpoint is taken
**  2) the list of txn log files that are no longer needed for
**     recovery or txn_abort is obtained
**  3) the LSN of the most recent checkpoint is noted
**  4) all the database table files, including queue extents,
**     are copied to the backup
**  5) all the txn log files that were not copied in step (0)
**     are copied to the backup
**  6) if a checkpoint has *not* taken place since step (3),
**     then the database is marked as not needing catastrophic
**     recovery when restored
**  7) if the list from step (2) is not empty, then those txn
**     log files are removed from the active database environment
**     and are marked in the backup as unnecessary for normal
**     restoration
[...]
For those trying to script this, you can get the LSN of the most recent checkpoint with:

  db_stat -t | awk '$2 ~ /^File\/offset/{print $1; exit}'
Philip Guenther
On Mon, 13 Feb 2012, Brian Reichert wrote:
On Fri, Feb 10, 2012 at 12:00:29PM -0800, Philip Guenther wrote:
The optimized procedure that I worked out with Sleepycat's help (for a completely different program, but using the "transaction data store") was this:
I'm exploring implementing these steps, but I'm running into some confusion.
If you're willing to discuss this in any better detail (or even better, provide some script that encapsulates these steps), that'd be great.
For example:
In step 3, I'm to 'take note' of the LSN (log sequence number). In my test environment, your awk script yields '1/456298', as a matter of example. But I don't see where in these steps, as an implementer, I ever actually act on this data.
Am I, in step 6, re-deriving the LSN, to see if they differ?
Yep. If there's been a checkpoint between (3) and (6), then that value in the db_stat output will have changed.
- In step 6, you say 'the database is marked [...]'. How do I mark the database? Or is this effect implicitly handled by the backend as it manages checkpoints and transaction logs?
Hmm, I should have written "the *backup* is marked". Basically, the idea is that if such a checkpoint has occurred, then if/when the backup is restored you have to run "db_recover -c". If that didn't happen, then you don't need to do that when the backup is restored. So, you need to define some way to communicate between the backup process and the restore process. I found the easiest way to do this was to have the backup script create a file named "_quiescent" in the backup if-and-only-if a checkpoint did *not* occur. The restore script would then check for the presence of that file and if it wasn't present, it would perform catastrophic recovery.
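(So the restore side ends up looking something like this; the path is a placeholder:)

  RESTORE=/var/lib/ldap       # environment restored from the backup
  if [ -e "$RESTORE/_quiescent" ]; then
      db_recover -h "$RESTORE"        # normal recovery suffices
  else
      db_recover -c -h "$RESTORE"     # catastrophic recovery required
  fi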
- In step 7, you say 'log files ... are marked in the backup [...]'. What sort of mark do I make so they are not involved in a normal restore, but are available if catastrophic recovery is necessary? (Does this play into how the database is marked in step 6?)
It's a similar idea, though in this case you should think about whether you'll *ever* want to use those log files. Under normal circumstances (restoring a known-good backup), you would not do so. The time you would do so is when some software or hardware failure has resulted in some corruption to a .db file such that the checkpoint state was inconsistent.
I've seen it happen, where *some* failure resulted in an fsync() claiming to succeed but then the data was lost. The OS (Solaris, in this case), detected it as a parity failure (IIRC) on some hardware bus and rebooted. Doing catastrophic recovery let us recover the database by rerunning the transaction log across the failure, thus fixing up the .db file. In better than a decade of working with Sleepycat DBs, that's the *only* time I've seen a need for intentional catastrophic recovery.
Note that if you don't detect the failure soon enough, you can quickly reach the state where running catastrophic recovery *would* fix the problem...but you can't afford the downtime to do it and can reach a working system *much* faster by rebuilding the database from outside sources (the most recent LDIF, etc).
This is especially true with something like LDAP, where the *first* response to "database failure on the master!" should be "so the first replica took over, right? Why did you wake me?!".
For such a system, there's no benefit to keeping the logs listed in step (2). Just remove them from both the active environment and the backup in step (7). (Yes, yes, you can avoid copying them by moving step (0) to after steps (1) and (2), and then only copying files that are *not* in the list from the original step (2). Whatever; just *test* your procedure!)
Philip Guenther
On Thu, 09 Feb 2012 14:36:20 -0800, Howard Chu hyc@symas.com wrote:
Hallvard B Furuseth wrote:
The only officially supported backup method with OpenLDAP is slapcat. Everything else, you do at your own risk.
The admin guide disagrees with you. Chapter 19 describes incremental backup by copying first the entire DB, then backing up further DB logs.
Chapter 19 is obviously a work-in-progress, transferred over from the FAQ-o-Matic.
Presumably because backup has previously only been described in the FAQ-o-Matic. But I'm pretty sure this has been the documented backup method with the Berkeley DB backends since forever. If that doc is wrong, this warrants a warning in both the admin guide and the faq.
However, don't database recovery and restoring a backup from DB + logs do the same thing? If so, either both or neither should work. Unless the difference is that correct restore/recovery needs knowledge that slapd has but the DB tools by themselves do not.
(In the latter case, maybe restore by starting a slap tool instead of with db tools could work, I dunno.)
I'd say that anything referencing BerkeleyDB in there should be deleted, since it is specific to BerkeleyDB and not specific to OpenLDAP. (And most likely, in future releases, BerkeleyDB will disappear anyway.)
In terms of the Admin Guide, that in itself would just mean the section should be moved.
I wrote:
On Thu, 09 Feb 2012 14:36:20 -0800, Howard Chu hyc@symas.com wrote:
Chapter 19 is obviously a work-in-progress, transferred over from the FAQ-o-Matic.
Presumably because backup has previously only been described in the FAQ-o-Matic. But I'm pretty sure this has been the documented backup method with the Berkeley DB backends since forever.
"the documented" -> "a documented" backup method. In the FAQ, which people were commonly directed to read earlier in OpenLDAP's life.
Getting back to how to speed up restore:
If you do move to slapcat/slapadd, note that tuning slapd as described in the Guide speeds up slapadd a lot, if you have not already done that. So does the -q flag to slapadd. There was a time when a normally configured slapd would not open a database built with slapadd -q, but I think that was a long time ago.
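(I.e., something like the following, with the config path being whatever yours is:)

  slapcat -f /etc/openldap/slapd.conf -l backup.ldif
  # restore; -q skips consistency checks and speeds slapadd up considerably
  slapadd -q -f /etc/openldap/slapd.conf -l backup.ldif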
Hallvard
On Friday, 10 February 2012 01:04:09 Hallvard B Furuseth wrote:
Getting back to how to speed up restore:
If you do move to slapcat/slapadd, note that tuning slapd as described in the Guide speeds up slapadd a lot, if you have not already done that. So does the -q flag to slapadd.
But, I still have problems with restore speed on my databases on current OpenLDAP on current platforms.
Specifically, OpenLDAP 2.3 and 2.4 can both do imports of a ~ 300 000 entry database in under an hour, but only on RHEL4. On RHEL5 (everything else identical), both 2.3 and 2.4 are about 6 times slower. I'm not sure if we have results for RHEL6 yet ...
Under an hour for a restore is fine. Over 4 is unacceptable.
Regards, Buchan
Buchan Milne wrote:
On Friday, 10 February 2012 01:04:09 Hallvard B Furuseth wrote:
Getting back to how to speed up restore:
If you do move to slapcat/slapadd, note that tuning slapd as described in the Guide speeds up slapadd a lot, if you have not already done that. So does the -q flag to slapadd.
But, I still have problems with restore speed on my databases on current OpenLDAP on current platforms.
Specifically, OpenLDAP 2.3 and 2.4 can both do imports of a ~ 300 000 entry database in under an hour, but only on RHEL4. On RHEL5 (everything else identical), both 2.3 and 2.4 are about 6 times slower. I'm not sure if we have results for RHEL6 yet ...
Pretty sure you'll find this is related to atime mount options on the filesystems and default behavior changing between rhel4 and rhel5. You can rule that out by using a shared memory region in all cases; you should find that the speeds are equivalent then (and much faster).
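For back-bdb/hdb that's the shm_key directive (see slapd-bdb(5)); e.g., in slapd.conf, with the suffix and key value as placeholders:

  database   hdb
  suffix     "dc=example,dc=com"
  directory  /var/lib/ldap
  shm_key    42    # any unique, nonzero integer per environment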
Under an hour for a restore is fine. Over 4 is unacceptable.
Agreed.
On 7/2/2012 11:53 μμ, Brian Reichert wrote:
I'm curious if the tactics described in this thread are currently sufficient:
http://www.openldap.org/lists/openldap-software/200608/msg00152.html
We are using CentOS 5.7 too. Upgrade from 2.3.43 as has been suggested already; that version caused us several headaches (esp. database corruptions, which were solved in 2.4.x).
We have tried with success Buchan's RPMs, Symas Silver (if you don't need a syncrepl provider; use Gold if you would like syncrepl provider functionality and some support), and the LTB Project's RPMs. We are currently using the latter on one provider and three consumers (all CentOS 5.7) without problems. See http://tools.ltb-project.org/projects/ltb for more.
No downtime needed for backups. We are doing hot slapcat and keeping backups as LDIF. (Quote: "For BDB and HDB, slapcat(8) can generally be used while the server is running.") When needed (practically never), we can slapadd from the LDIF (we have done it successfully).
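(E.g., from cron, something as simple as the following; the destination path is a placeholder:)

  slapcat -l /var/backups/ldap/$(date +%Y%m%d).ldif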
We can even use the LTB Project's init script to back up both the config db and the actual db:
# service slapd help
slapd: [INFO] Using /etc/default/slapd for configuration
Usage: /etc/init.d/slapd {start|stop|forcestop|restart|debug|force-reload|status|configtest|db_recover|reindex|removelogs|backup|restore|backupconfig|restoreconfig}
We have successfully used it to back up both databases.
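(That is, simply:)

  # service slapd backup
  # service slapd backupconfig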
Hope that helps. Nick