LMDB dead process detection

16 Jul 2013


There's been a long-running discussion about the need for APIs in liblmdb 
for displaying the reader table and clearing out stale slots. There are quite 
a few open questions on the topic:
1) What should the API look like for examining the table?
    My initial instinct is to provide an iterator function that returns info 
about the next slot each time it's called. I'm not sure that this is necessary, 
or the most convenient option, though.
Another possibility is just a one-shot function that walks the table itself 
and dumps the output as a formatted string to stdout, stderr, or a custom 
output callback.
2) What should APIs look like for clearing out a stale slot?
    Should it just be implicit inside the library, with no externally visible 
API? I.e., should the library periodically check on its own, with no outside 
intervention? Or should there be an API that lets a user explicitly request a 
particular slot to be freed? The latter sounds pretty dangerous, since 
freeing a slot that's actually still in use would allow a reader's view of the 
DB to be corrupted.
3) What approach should be used for automatic detection of stale slots?
    Currently we record the process ID and thread ID of a reader in the table. 
It's not clear to me that the thread ID has anything more than informational 
value. Since we register a per-thread destructor for slots, exiting threads 
should never be leaving stale slots in the first place. I'm also not sure that 
there are good APIs for an outside caller to determine the liveness of a given 
thread ID.
    The process ID is also prone to wraparound; it's still very common for 
Linux systems to use 15-bit process IDs. So just checking that a pid is still 
alive doesn't guarantee that it's the same process that was previously using 
the LMDB environment. We have two main approaches to work around this latter 
issue:
A) Set a byte-range lock for every process attached to the environment. 
This is what slapd's alock.c already does for the BDB- and LDBM-based 
backends. This is fairly portable code, and has the desirable property 
that file locks automatically go away when a process exits. But:
       a) On Windows, the OS can take several minutes to clean up the locks of 
an exited process. So just checking for presence of a lock could erroneously 
consider a process to be alive long after it had actually died.
       b) File lock syscalls are fairly slow to execute. If we are checking 
liveness frequently, there will be a noticeable performance hit. Their 
performance also degrades exponentially with the number of processes locking 
concurrently, and degrades further still if networked filesystems are involved.
       c) This approach won't tell us if a process is in zombie state.
B) Check the process ID and process start time.
This appears to be a fairly reliable approach, and reasonably fast, but there 
is no POSIX standard API for obtaining this process information. Methods for 
obtaining the info are fairly well documented across a variety of platforms 
(AIX, HP-UX, multiple BSDs, Linux, Solaris, etc.) but they are all different. 
It appears that we can implement this compactly for each of the systems, but 
it means carrying around a dozen or so different implementations.
Also, assuming we want to support shared LMDB access across NFS (as discussed 
in an earlier thread), it seems we're going to have to use a lock-based 
solution anyway, since process IDs won't be meaningful across host boundaries.
We can implement approach (A) fairly easily, with no major repercussions. For 
(B) we would need to add a field to the reader table records to store the 
process start time. (Thus a lockfile format change.)
(Note: the performance of fcntl locks vs. checking the process start time was 
measured with some simple code on my laptop running Linux. These functions are all 
highly OS-dependent, so the perf ratios may vary quite a lot from system to 
system.)
The relative performance may not even be an issue in general, since we would 
only need to trigger a scan if a writer actually finds that some reader txn is 
preventing it from using free pages from the freeDB. Most of the time this 
wouldn't be happening. But if there were a legitimate long running read txn 
(e.g., for mdb_env_copy) we may find ourselves checking fairly often.
-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/
