Howard Chu writes:
There's been a long-running discussion about the need to have APIs in liblmdb for displaying the reader table and clearing out stale slots. Quite a few open questions on the topic: (...) 3) What approach should be used for automatic detection of stale slots?
Currently we record the process ID and thread ID of a reader in the table. It's not clear to me that the thread ID has anything more than informational value. Since we register a per-thread destructor for slots, exiting threads should never be leaving stale slots in the first place.
Unless the thread is killed with TerminateThread() on Windows. The doc has a bunch of dire warnings about that, but I suspect real life may differ from Microsoft's recommendations.
I'm also not sure that there are good APIs for an outside caller to determine the liveness of a given thread ID.
As far as I can tell: Windows has thread IDs and handles for this. POSIX does not provide a way for outside callers to get at threads, either to kill them or to examine them. Individual OSes may, but then they likely provide both. E.g. Linux clone() can create a thread, and tgkill() can kill it. These calls use a different ID from the POSIX thread ID. I hope we don't want to know...
The process ID is also prone to wraparound; it's still very common for Linux systems to use 15 bit process IDs. (...)
A) set a byte range lock for every process attached to the environment. (...) c) This approach won't tell us if a process is in Zombie state.
Misplaced (c). This is the approach which does work portably for Zombies, at least on Unix. And as we've discussed, on at least some OSes, approach (B) below can also check for zombies, but it may take more time.
B) check process ID and process start time.
This appears to be a fairly reliable approach, and reasonably fast, but there is no POSIX standard API for obtaining this process information. (...)
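
As an illustration of what (B) involves on one particular OS, here is a rough Linux-only sketch that reads the start time from /proc/<pid>/stat (field 22, in clock ticks since boot) together with the state character (field 3, 'Z' for a zombie). The function name proc_start_time() is hypothetical, and other systems would need entirely different code, which is exactly the portability problem mentioned above.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper, Linux only: fetch the start time (field 22 of
 * /proc/<pid>/stat) and the state character (field 3) of a process.
 * Returns 0 on success, -1 if the process has no /proc entry or the
 * file cannot be parsed. */
static int proc_start_time(pid_t pid, unsigned long long *start, char *state)
{
    char path[64], buf[1024];
    size_t n;
    char *p;
    int field;
    FILE *f;

    snprintf(path, sizeof path, "/proc/%ld/stat", (long)pid);
    f = fopen(path, "r");
    if (!f)
        return -1;
    n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';

    /* Field 2 (the command name) may itself contain spaces or ')',
     * so resume parsing after the LAST ')' in the line. */
    p = strrchr(buf, ')');
    if (!p)
        return -1;
    p++;                              /* now at the space before field 3 */
    if (sscanf(p, " %c", state) != 1)
        return -1;

    /* Step over fields 3..21; p ends at the space before field 22. */
    for (field = 3; field < 22; field++) {
        p = strchr(p + 1, ' ');
        if (!p)
            return -1;
    }
    return sscanf(p, " %llu", start) == 1 ? 0 : -1;
}
```

A checker would then treat a slot as stale if the call fails, if the state is 'Z', or if the start time differs from the one recorded in the slot, the last case being what catches a reused PID.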
We can implement approach (A) fairly easily, with no major repercussions. For (B) we would need to add a field to the reader table records to store the process start time. (Thus a lockfile format change.)
We need to change the lockfile version anyway. Otherwise one process using the current MDB version and one using either of these approaches could sabotage each other.