Re: [LMDB] Lockups with robust mutexes and crashing processes

23 Nov 2014


      Marcos-David Dione wrote:
...
     I already posted this to the IRC channel, but there was no
response, so I repost this here.
... already followed up in IRC.
...
     I'm trying out lmdb from master, including the robust mutex
code. We're experiencing lockups after the process holding the lock
dies, as if the robust lock was not recovered. I tried to come up with
an lmdb example that shows it, and I got one in just a few lines. It uses
fork() just to automate it; note that the environment is opened in both
children. Here's the code:
http://pastebin.com/Cbbri6az
The example is broken; it does not mimic the behavior of a crashed 
process. In particular it does a clean call to mdb_env_close() but 
doesn't call mdb_txn_abort() first. An actual crashing process would not 
make the call to mdb_env_close(), and a cleanly exiting process would 
close all outstanding transactions before calling env_close.
...
     If I run this, I see that one of the children waits for the
write lock and is not awakened when the other child dies without closing
the txn (but notice I close the env). This is on purpose, to simulate a
crashing process. The worst part is that I can't reproduce it using
libpthread and mmap directly. Here is the code I came up with:
http://pastebin.com/ybR5L4cP
     It's a little bit more verbose because I based it on a glibc
test case.
     Are we missing anything? It seems to us that the code does not
break any of LMDB's caveats (especially the one about creating the
envs before fork()'ing). Is it wrong to assume that the waiting
process should recover the lock from staleness?
env_close does an munmap of the memory containing the mutex. According 
to the manpages, a robust mutex is supposed to automatically unlock when 
unmapped. Since this is not happening, it appears you've found a kernel 
bug. Regardless, the example is invalid. If you modify the code to just 
exit/abort/die without the bogus call to env_close, the other process 
wakes up correctly. E.g.
http://pastebin.com/9jieDnUz
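
For readers without access to the pastebin links, the expected robust mutex behavior can be sketched with plain pthreads, independent of LMDB. This is only an illustration, not the code from any of the pastes: the function name owner_death_demo is hypothetical, the mutex lives in an anonymous shared mapping rather than an LMDB lock file, and the parent takes the lock after the child has already died instead of blocking while the child still holds it. The child exits without unlocking and without unmapping, which is the "just exit/abort/die" case described above.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns the result of the parent's
 * pthread_mutex_lock() after the child died holding the lock
 * (EOWNERDEAD is expected), or -1 if setup failed. */
int owner_death_demo(void)
{
    /* Shared anonymous mapping so parent and child see the same mutex. */
    pthread_mutex_t *mtx = mmap(NULL, sizeof *mtx, PROT_READ | PROT_WRITE,
                                MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mtx == MAP_FAILED)
        return -1;

    /* The mutex must be both process-shared and robust. */
    pthread_mutexattr_t attr;
    if (pthread_mutexattr_init(&attr) != 0 ||
        pthread_mutexattr_setpshared(&attr, PTHREAD_PROCESS_SHARED) != 0 ||
        pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST) != 0 ||
        pthread_mutex_init(mtx, &attr) != 0)
        return -1;
    pthread_mutexattr_destroy(&attr);

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: take the lock, then die without unlock or munmap. */
        pthread_mutex_lock(mtx);
        _exit(1);
    }

    /* Parent: wait until the child is gone, then try to take the lock. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    int rc = pthread_mutex_lock(mtx);
    if (rc == EOWNERDEAD) {
        /* We own the lock, but the protected state may be inconsistent.
         * Mark the mutex consistent so it remains usable afterwards. */
        pthread_mutex_consistent(mtx);
    }
    if (rc == 0 || rc == EOWNERDEAD)
        pthread_mutex_unlock(mtx);

    pthread_mutex_destroy(mtx);
    munmap(mtx, sizeof *mtx);
    return rc;
}
```

Calling owner_death_demo() from a small main() and building with -pthread should show the lock call in the surviving process returning EOWNERDEAD instead of hanging.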
...
--
Marcos Dione
Astek Sud-Est
R&D-SSP-DTA-TAE-TDS
for Amadeus SAS
T: +33 (4)4 9704 1727
marcos-david.dione@amadeus.com
-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/
