richton@nbcs.rutgers.edu wrote:
If this is happening even with slapd cleanly shut down then it should also prevent slapd from restarting, since slapd first attempts to join an existing environment before trying to create a new one. And that really implies that the rest of the environment is shot.
Agreed, but that's a pretty awful condition to have in a long-running slapd process. Without db_stat (easily) working, is there any hope of finding clues as to how this might have happened, or is it just time to rm/slapadd and hope it doesn't happen again?
It doesn't seem like we can get much more info out of this. One more thing to try would be a full-debug build of libdb, so we can see exactly where it hangs when trying to join the environment. Looking through the code, I only see one mutex that has to be acquired to join the environment, and according to your stack trace it's already past that location, but the trace could be lying.
Also, the mutex used to lock the environment is a regular mutex, not a persistent lock. So once all processes have closed the environment, there shouldn't be anything left to conflict with here. So most likely the environment data structures are hosed, and the thread is locking against itself. Again, we can't really tell without single-stepping through the BDB library code. It may not be worth the effort, but that's your call.
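
For anyone unsure what "locking against itself" looks like, here is a rough illustration. It is not BDB code and says nothing about how libdb implements its mutexes; it is only a Python analogy, and the helper name self_relock_blocks is made up for this example. A plain (non-reentrant) lock that is already held by a thread cannot be taken again by that same thread, so the second acquire would hang forever. The timeout below stands in for that hang so the sketch can return.

```python
import threading


def self_relock_blocks(lock, timeout=0.2):
    """Return True if the calling thread blocks when it tries to
    re-acquire a lock it already holds (hypothetical helper)."""
    lock.acquire()
    try:
        # With a plain mutex this second acquire can never succeed,
        # because the only holder is the thread that is now waiting.
        # The timeout stands in for an indefinite hang.
        reacquired = lock.acquire(timeout=timeout)
        if reacquired:
            # Reentrant locks get here; undo the nested acquire.
            lock.release()
        return not reacquired
    finally:
        lock.release()


if __name__ == "__main__":
    print("plain lock blocks on itself:", self_relock_blocks(threading.Lock()))
    print("reentrant lock blocks on itself:", self_relock_blocks(threading.RLock()))
```

The reentrant case is included only for contrast. If corrupted state makes a thread believe it still needs a mutex it effectively already owns, you get the first behaviour, which from the outside looks just like a hang while joining the environment.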