New subject: [Issue 10095] Race condition causing corruption of mutexes when closing the database

25 Aug 2023


      https://bugs.openldap.org/show_bug.cgi?id=10095
          Issue ID: 10095
           Summary: Race condition causing corruption of mutexes when
                    closing the database
           Product: LMDB
           Version: 0.9.30
          Hardware: x86_64
                OS: Linux
            Status: UNCONFIRMED
          Keywords: needs_review
          Severity: normal
          Priority: ---
         Component: liblmdb
          Assignee: bugs@openldap.org
          Reporter: peter@peterzhu.ca
  Target Milestone: ---
We're running into a race condition across multiple processes that corrupts
the mutexes when one process closes the database. It is caused by the fix for
https://bugs.openldap.org/show_bug.cgi?id=9278 (commit
https://git.openldap.org/openldap/openldap/-/commit/f683ffdc81d0edb20437cb7d...).
The following interleaving of two processes (p0 and p1) triggers it.
p0: Opens connection to database using mdb_env_create and mdb_env_open.
...some things happen in between...
p0: Begins closing the database using mdb_env_close:
  p0: Calls mdb_env_close0:
    p0: Acquires the write lock on the lock file using mdb_env_excl_lock.
    p0: Calls pthread_mutex_destroy on the mutexes.
SWITCH TO p1
p1: Begins opening the database using mdb_env_create, then calls mdb_env_open.
In mdb_env_open:
  p1: Calls mdb_env_setup_locks:
    p1: Calls mdb_env_excl_lock, but it is unable to acquire a write file
        lock because p0 holds the write file lock. It waits to acquire a
        read file lock instead.
SWITCH TO p0
p0: Calls close on the file descriptor, which releases the write lock.
SWITCH TO p1
    p1: Acquires the read file lock.
    p1: Does NOT call pthread_mutex_init, since it did not acquire a write
        file lock.
...some things happen in between...
p1: Tries to lock the mutex using pthread_mutex_lock. This call fails with
EINVAL because the mutex has already been destroyed.
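The locking half of this interleaving can be replayed outside LMDB. What
follows is only a minimal sketch, not LMDB's code: demo_excl_lock and
demo_interleaving are hypothetical names, the notify_fd parameter exists only
to synchronize the two processes, and the lock is taken on a temporary file
rather than a real LMDB lock file. It assumes the pattern described above (a
non-blocking write-lock attempt followed by a blocking read lock) and uses
fork() so that the parent plays p0 and the child plays p1.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the pattern the report attributes to
 * mdb_env_excl_lock; it is not LMDB's code. Tries a non-blocking write lock,
 * then falls back to a blocking read lock. If notify_fd >= 0, one byte is
 * written to it after the write-lock attempt fails (test scaffolding only).
 * Sets *excl to 1 (write lock) or 0 (read lock); returns 0 or an errno. */
static int demo_excl_lock(int fd, int notify_fd, int *excl)
{
    struct flock lk = {0};
    lk.l_type = F_WRLCK;
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 1;

    if (fcntl(fd, F_SETLK, &lk) == 0) {
        *excl = 1;
        return 0;
    }
    if (errno != EAGAIN && errno != EACCES)
        return errno;
    if (notify_fd >= 0 && write(notify_fd, "x", 1) != 1)
        return EIO;

    lk.l_type = F_RDLCK;
    while (fcntl(fd, F_SETLKW, &lk) != 0) {
        if (errno != EINTR)
            return errno;
    }
    *excl = 0;
    return 0;
}

/* Replays the interleaving with fork(): the parent plays p0, the child p1.
 * Returns the excl value p1 observed (0 or 1), or -1 on any error. */
static int demo_interleaving(void)
{
    char path[] = "/tmp/excl_lock_demo_XXXXXX";
    int sync_pipe[2], fd, excl = -1, status = 0, result = -1;
    char byte;
    pid_t pid = -1;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    if (pipe(sync_pipe) != 0) {
        close(fd);
        unlink(path);
        return -1;
    }

    /* p0: holds the write lock, as it would while closing the database. */
    if (demo_excl_lock(fd, -1, &excl) == 0 && excl == 1)
        pid = fork();

    if (pid == 0) {
        /* p1: opens its own descriptor and runs the same lock pattern. */
        int p1_excl = -1, p1_fd;
        close(fd);
        close(sync_pipe[0]);
        p1_fd = open(path, O_RDWR);
        if (p1_fd < 0 || demo_excl_lock(p1_fd, sync_pipe[1], &p1_excl) != 0)
            _exit(2);
        _exit(p1_excl);
    }

    close(sync_pipe[1]);
    if (pid > 0) {
        /* Wait until p1's write-lock attempt has failed, then release. */
        ssize_t got = read(sync_pipe[0], &byte, 1);
        close(fd); /* p0: closing the descriptor releases the write lock */
        fd = -1;
        if (waitpid(pid, &status, 0) == pid && got == 1 &&
            WIFEXITED(status) && WEXITSTATUS(status) <= 1)
            result = WEXITSTATUS(status);
    }
    close(sync_pipe[0]);
    if (fd >= 0)
        close(fd);
    unlink(path);
    return result;
}
```

In this sketch demo_interleaving() returns 0, meaning p1 ended up with only
the read lock. That is the branch in which, per the steps above, p1 skips
pthread_mutex_init and later locks a mutex that p0 has already destroyed.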
I'm not sure how to actually solve this problem. We're currently mitigating
this problem by reverting the commit linked above (so no mutexes get
destroyed).
