On 2/27/07, Lesley Walker <lesley.walker@opus.co.nz> wrote:
Running openldap 2.3.32 with bdb 4.2 (and using syncrepl if that's relevant).
I need to deal with the issue of how to safely delete old bdb log files on our many replicas.
In a previous thread, Aaron Richton wrote:
I'd recommend DB_LOG_AUTOREMOVE. Barring that, you can run db_archive manually. Check Sleepycat docs for details on either.
Well, I just ran db_archive and caused widespread chaos because most (all?) of the replicas stopped responding to queries. (I have yet to perform a post-mortem)
I know that there's a bug in bdb 4.2 that causes logs to be held open even though they're no longer required. Upgrading bdb is not on the cards right now so I need to work around that problem by stopping and starting openldap.
So the question I have just at the moment is, when I run db_archive, should openldap be running or not running?
I've seen nothing in any docs that suggests it should be stopped, and the bdb docs simply imply that applications are expected to be running (but I'm not a programmer, so I may have interpreted it wrongly). So I ran db_archive just after starting openldap. Was that the wrong thing to do, and is it an obvious cause for the meltdown?
What options to db_archive did you use? I'm safely using -d and -a | xargs without any issues. Are you sure you used the db_archive that corresponds to the version of bdb that's being used?
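For reference, here is a rough Python sketch of the two invocations I mean. It only wraps the standard db_archive flags (-h for the environment home, -a to list removable logs with absolute paths, -d to have db_archive remove them itself). The function names, the `binary` parameter and the /var/lib/ldap path are my own placeholders, not anything from your setup. Point `binary` at whichever db_archive matches the bdb your slapd is linked against.

```python
import subprocess
from typing import List


def build_db_archive_cmd(env_home: str, remove: bool = False,
                         binary: str = "db_archive") -> List[str]:
    """Build a db_archive command line (hypothetical helper).

    -h names the BDB environment home directory.
    -a prints absolute paths of log files that are no longer needed.
    -d removes those log files instead of listing them.
    """
    cmd = [binary, "-h", env_home]
    cmd.append("-d" if remove else "-a")
    return cmd


def removable_logs(env_home: str, binary: str = "db_archive",
                   run=subprocess.run) -> List[str]:
    """Return the log files db_archive reports as no longer needed.

    Nothing is deleted here; this is the listing half of
    'db_archive -a | xargs rm', so the list can be checked first.
    """
    result = run(build_db_archive_cmd(env_home, binary=binary),
                 capture_output=True, text=True, check=True)
    # One path per line; skip blank lines.
    return [line for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    # /var/lib/ldap is an assumed database directory; adjust to yours.
    for path in removable_logs("/var/lib/ldap"):
        print(path)
```

Listing first and deleting second at least tells you how many files are about to go, which is handy when something falls over afterwards.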
After you ran db_archive and the server stopped responding, did you restart slapd, and did it start responding again? If not, did you try starting it with debugging enabled to figure out why? How many log files did it remove? Maybe openldap hadn't fully initialized, or had a lot of writes pending when you ran db_archive (although I don't think that should matter).
I would also recommend not using distro packages for anything you rely on beyond what the OS needs to run.