Aaron Richton wrote:
No. The BDB transaction log files don't know (or care) anything about IP addresses. Nothing at the slapd layer could have any direct effect on the BDB transaction logs. How exactly did you reconfigure the servers? Did you stop and restart them, or did you use cn=config?
echo 192.blahblah master.r.e >> /etc/hosts
The master changed from 128.blahblah to 192.blahblah. Same physical machine, just different interface. On slave4 and 6, I didn't touch slapd.
(Of course, if you only appended to /etc/hosts, then the old address is still in there and getting used first.)
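To make the point about ordering concrete, here is a minimal sketch of a check for which address a given name hits first in a hosts-format file. It assumes the usual first-match behaviour of a files-based lookup; the function name first_hosts_match is made up for illustration, and on Solaris nawk or /usr/xpg4/bin/awk may be needed in place of plain awk.

```shell
# Hypothetical helper: print the address on the first line of a
# hosts-format file that lists the given name (as canonical name or alias).
# Usage: first_hosts_match NAME HOSTS_FILE
first_hosts_match() {
    awk -v name="$1" '
        # Strip trailing comments; awk re-splits the fields afterwards.
        { sub(/#.*/, "") }
        {
            # Field 1 is the address, the remaining fields are names.
            for (i = 2; i <= NF; i++) {
                # Host names compare case-insensitively.
                if (tolower($i) == tolower(name)) {
                    print $1
                    exit
                }
            }
        }
    ' "$2"
}
```

Running first_hosts_match master.r.e /etc/hosts on a slave would show whether the appended line is ever reached; if it prints the old address, the new entry is shadowed. Where getent is available, getent hosts master.r.e gives the resolver's own answer for comparison.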
Might as well get the db_stat -l output for a few of them to compare.
This isn't going well at all; they just can't join the environment. I tried on slave1, it hung. I tried on slave4 under truss, it hung. (We're talking >30 minutes here.) Although I swear I've run db_stat hot, I killed db_stat (ungracefully, sadly) and stopped slapd (gracefully) on slave1, ran db_stat again, and it hung there...and corrupted the environment to the point where I couldn't get db_recover/slapd to run. (I ended up blowing the slave1 database away; it's refreshing from syncrepl now.)
I've got a few more slaves that I haven't shot in the foot yet, and I only tried this on one of the suffixes on slave{1,4}. Plenty more opportunities to screw this up yet if there's anything to try...I suppose I could go for -N, or if the command line is going to be a pain, I could join the slapd process with dbx and print ->log_stat myself (although I might need a bit of hand-holding on that)...
[the hang on slave4]
db_stat -> libdb-4.2.so:*db_env_create(0xffbffaec, 0x0, 0x17154)
lwp_mutex_lock(0xFF0D0000) (sleeping...)
        mutex type: USYNC_PROCESS

ff307248 __db_des_get (29ac0, 29d78, 29d78, ffbff9d0, 0, ffbff9d9) + c0
ff305780 __db_e_attach (29ac0, ffbffa94, 40400, 40000, 33e021, 29d71) + 6e0
ff2ff434 __dbenv_open (29ac0, 0, 40400, 0, 0, 0) + 664
00016514 db_init (29ac0, 0, 4, 100000, ffbffba0, ff3deb54) + 64
00011e3c main (2, ffbffc44, ffbffc50, 29800, 0, 0) + 9a4
00011470 _start (0, 0, 0, 0, 0, 0) + 108
If this is happening even with slapd cleanly shut down then it should also prevent slapd from restarting, since slapd first attempts to join an existing environment before trying to create a new one. And that really implies that the rest of the environment is shot.