Have you got backups from just before these occurrences? Can you see what the last valid transaction log files were before this? Or perhaps you can get db_stat output from any other slaves that are still running OK? The idea is to see whether the current valid CSNs on an equivalent slave are anywhere near the numbers being logged here, e.g. 1/188113 or 1/8730339.
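If it helps, this is roughly the sort of comparison I mean (an untested sketch; the environment path is a placeholder and the exact db_stat output labels may vary by BDB version):

#!/usr/bin/env python3
# Rough sketch: run "db_stat -l" against a slave's BDB environment and pull
# out the current log file / offset lines, so the values can be compared
# across slaves.  The environment path is a placeholder; the "Current log
# file ..." labels are what my BDB version prints and may differ on yours.
import subprocess
import sys

DEFAULT_ENV_DIR = "/var/ldap/bdb"   # hypothetical BDB environment home

def current_log_position(env_dir):
    """Return the db_stat -l lines describing the current log file/offset."""
    out = subprocess.run(["db_stat", "-l", "-h", env_dir],
                         capture_output=True, text=True, check=True).stdout
    return [line.strip() for line in out.splitlines()
            if "Current log file" in line]

if __name__ == "__main__":
    env_dir = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_ENV_DIR
    for line in current_log_position(env_dir):
        print(line)

Running that on a healthy slave and on the two misbehaving ones should show whether the log positions are anywhere near each other.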
Have you actually run out of disk space on the partitions holding the logs? It's rather suspicious that two machines would act up at the same time unless some admin specifically disturbed the log files on those two systems at around that time.
I don't have backups of the slave bdb logs. The master slapcat output is considered sacred data; the slave bdb log files are considered derivable from it and don't get backed up (we'd sooner just replace the entire slave if it acts up). The odds of the partitions filling are minimal; Solaris logs that condition at kern.notice (which on our configuration is serious enough to mean a write to NVRAM), and logs extending back before September 24 show no such messages.
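(For reference, that check was just a scan of the archived syslogs for the ufs "file system full" notices, roughly as below; the log path and message text are from our setup and may differ elsewhere.)

#!/usr/bin/env python3
# Roughly how the "did a partition fill?" check looks: scan the current and
# rotated syslog files for the ufs "file system full" notices.  The log
# location and the message text are assumptions based on our Solaris boxes.
import glob

PATTERN = "file system full"              # ufs NOTICE text on our systems
LOG_GLOB = "/var/adm/messages*"           # current + rotated syslog files

hits = []
for path in sorted(glob.glob(LOG_GLOB)):
    with open(path, errors="replace") as fh:
        hits.extend(line.rstrip() for line in fh if PATTERN in line)

print("\n".join(hits) if hits else "no 'file system full' notices found")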
That said, about "some admin specifically disturbed the log files around that time": the logs show that I was the only person in a position to do so (unless somebody broke in and covered their tracks; we'll ignore that theoretical possibility). On September 24, I reconfigured the slaves to use a different IP address for the master, replacing the existing connection. The times are too coincidental to be unrelated:
(slave4) reconfigured Sep 24 09:41 (first syslog complaint 09:43)
(slave6) reconfigured Sep 24 09:39 (first syslog complaint 09:44)
So... is there something that's cued off the (reverse?) name service entries for the master? Does the master IP hash into a CSN somehow? If that is indeed the root cause, then, quite honestly, I think assuming the name service database will remain constant for the lifetime of a slapd instance is a fallacy. Furthermore, if this is the case, it should be absolutely trivial for me to reproduce: I can perform a DR on slave4/6 and reconfigure their network again.
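For bookkeeping around that test, I'd snapshot the replication state from the master and both slaves before and after the renumbering, along these lines (hostnames and the suffix are placeholders, and it assumes these are syncrepl consumers whose contextCSN on the suffix entry is readable with an anonymous bind):

#!/usr/bin/env python3
# Sketch: grab contextCSN from the master and each slave, timestamped, so a
# before/after comparison around the renumbering shows any sudden jump.
# Hostnames and suffix are placeholders; anonymous read of contextCSN is
# assumed to be permitted.
import subprocess
import time

SUFFIX = "dc=example,dc=com"                    # placeholder suffix
HOSTS = ["master", "slave4", "slave6"]          # placeholder hostnames

def context_csn(host):
    """Read contextCSN from the suffix entry on one server via ldapsearch."""
    out = subprocess.run(
        ["ldapsearch", "-x", "-LLL", "-H", "ldap://%s" % host,
         "-s", "base", "-b", SUFFIX, "contextCSN"],
        capture_output=True, text=True, check=True).stdout
    return [line.split(":", 1)[1].strip()
            for line in out.splitlines() if line.startswith("contextCSN:")]

if __name__ == "__main__":
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    for host in HOSTS:
        print(stamp, host, context_csn(host))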
With that in mind, I'll likely test this reproduction early next week. I can still get db_stat from all slaves (working and not) at this point if that's interesting. Comments?