The code setting up the locks is `mdb_env_setup_locks`, circa line 4250 of mdb.c. This code has a bunch of #ifdefs for POSIX vs. Win32. A major difference is that POSIX mutexes are non-recursive by default (unless you set the PTHREAD_MUTEX_RECURSIVE type on the mutex attributes), whereas the CreateMutex call used on Win32 creates recursive mutexes.
The MDB_NOTLS flag only affects readers (it ties reader lock-table slots to MDB_txn objects instead of to threads), so it doesn't help with writers.
When two writers operate in the same OS thread using M:N threading, a bug will occur on Windows whenever two different lightweight 'writer' threads grab the same mutex from the same OS thread: because the mutex is recursive, both writers will succeed.
Conversely, on POSIX systems a different bug will occur: the second writer will attempt to grab the mutex, and this will block the OS thread... which, unfortunately, is the same thread that the first writer needs in order to unlock the mutex. Since a POSIX mutex must be unlocked by the thread that locked it, there is no way for a lightweight writer to migrate to another OS thread and unlock it there.
I believe both systems would behave more consistently and usefully if we switched from mutexes to semaphores with a count of 1. The difference is that a semaphore doesn't track an 'owner' thread. This is also the suggested way to avoid the recursive mutex behavior on Windows [1].
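A small POSIX-only sketch of the two failure modes, assuming a glibc-style pthreads implementation. A PTHREAD_MUTEX_RECURSIVE mutex stands in for the recursive Win32 CreateMutex handle (Win32 itself is not exercised), and pthread_mutex_trylock replaces a blocking lock so that the non-recursive case reports EBUSY instead of actually hanging the thread. The helper names are hypothetical and not taken from mdb.c.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Initialise *m as a mutex of the given POSIX type. Returns 0 on success. */
static int init_typed_mutex(pthread_mutex_t *m, int type)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, type);
    if (rc == 0)
        rc = pthread_mutex_init(m, &attr);
    pthread_mutexattr_destroy(&attr);
    return rc;
}

/* Two lightweight "writers" scheduled on one OS thread: the first locks the
 * write mutex, then the second tries to lock it from that same OS thread.
 * Returns the result of the second attempt (0 = it got in). */
static int second_writer_same_os_thread(int type)
{
    pthread_mutex_t m;
    int second;

    if (init_typed_mutex(&m, type) != 0)
        return -1;
    pthread_mutex_lock(&m);                 /* first writer */
    second = pthread_mutex_trylock(&m);     /* second writer, same OS thread */
    if (second == 0)
        pthread_mutex_unlock(&m);           /* undo the recursive re-lock */
    pthread_mutex_unlock(&m);               /* first writer finishes */
    pthread_mutex_destroy(&m);
    return second;
}
```

On the recursive mutex the second writer gets in, which is the Windows bug. On the normal mutex it is refused; a blocking pthread_mutex_lock at the same point would instead hang the OS thread, which is the POSIX bug.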
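A minimal sketch of a count-1 semaphore used as a write lock, using unnamed POSIX semaphores (sem_init works on Linux but is not supported on macOS). It shows the property the suggestion relies on: a second thread can release a lock that the first thread acquired, because the semaphore records no owner. A real replacement inside LMDB would also need the semaphore to be shared between processes, which this in-process demo does not attempt. The names are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

/* Binary semaphore acting as the write lock: value 1 = free, 0 = held. */
static sem_t write_sem;

/* Runs on a second OS thread and releases a lock it never acquired.
 * A semaphore has no owner, so this is allowed. */
static void *release_on_other_thread(void *unused)
{
    (void)unused;
    sem_post(&write_sem);
    return NULL;
}

/* Acquire on this thread, release on another, then check the lock is free.
 * Returns 1 if the hand-off behaved as described, 0 otherwise. */
static int semaphore_handoff(void)
{
    pthread_t t;
    int retry_same_thread, reacquired;

    if (sem_init(&write_sem, 0, 1) != 0)          /* 0: not process-shared */
        return 0;
    sem_wait(&write_sem);                         /* writer A takes the lock */
    /* Unlike a recursive Win32 mutex, the same thread cannot take it again. */
    retry_same_thread = sem_trywait(&write_sem);  /* -1, errno EAGAIN */
    if (pthread_create(&t, NULL, release_on_other_thread, NULL) != 0) {
        sem_destroy(&write_sem);
        return 0;
    }
    pthread_join(t, NULL);
    reacquired = (sem_trywait(&write_sem) == 0);  /* lock was freed */
    if (reacquired)
        sem_post(&write_sem);
    sem_destroy(&write_sem);
    return retry_same_thread == -1 && reacquired;
}
```

The flip side is the one raised in the reply: nothing stops an unrelated thread from calling sem_post and releasing a lock it never held.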
I would like to hear other opinions on this matter.
Regards,
Dave
[1] http://stackoverflow.com/questions/1988324/how-to-alter-the-recursive-lockin...
David Barbour wrote:
> I believe both systems would behave more consistently and usefully if we switched from mutexes to semaphores with a count of 1. The difference is that a semaphore doesn't track an 'owner' thread. This is also the suggested way to avoid the recursive mutex behavior on Windows [1].
> I would like to hear other opinions on this matter.
Interesting. The downside of your semaphore suggestion on Windows is that it means any thread can also unlock it, regardless of the current owner. This is also detected as an error in POSIX mutexes.
LMDB is documented to be a single-writer design. I don't see any sane way for us to support M:N threading models ourselves; not portably to all the possible runtimes out there. I suggest you wrap your own mutex mechanism around your wrapper for mdb_txn_begin().
On Wed, Dec 10, 2014 at 7:21 PM, Howard Chu <hyc@symas.com> wrote:
> Interesting. The downside of your semaphore suggestion on Windows is that it means any thread can also unlock it, regardless of the current owner. This is also detected as an error in POSIX mutexes.
Yeah, if we used semaphores, we might also wish to track transaction-object IDs (as a replacement for thread IDs) so we can perform our own safety checking. Gaining more control over the mutex model could be useful in other ways, e.g. supporting high-priority writes.
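One way to read the transaction-ID idea, sketched under assumptions: a hypothetical write_gate whose owner field holds a caller-chosen transaction token (any nonzero value, e.g. the address of the caller's own transaction object) instead of a thread ID. Acquire and release may then happen on different OS threads, while the owner check still catches a transaction entering twice or a release by the wrong transaction. None of this is LMDB API; it is the kind of mechanism a binding could wrap around its own mdb_txn_begin() wrapper.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical write gate owned by a transaction token, not by a thread. */
typedef struct {
    pthread_mutex_t mu;     /* guards `owner`; held only briefly */
    pthread_cond_t  freed;  /* signalled when the gate becomes free */
    uintptr_t       owner;  /* 0 = free, else the owning txn token */
} write_gate;

static int gate_init(write_gate *g)
{
    int rc = pthread_mutex_init(&g->mu, NULL);
    if (rc == 0)
        rc = pthread_cond_init(&g->freed, NULL);
    g->owner = 0;
    return rc;
}

/* Wait until the gate is free, then record `token` as its owner.
 * Returns EINVAL for token 0 and EDEADLK if `token` already owns it. */
static int gate_acquire(write_gate *g, uintptr_t token)
{
    int rc = 0;

    if (token == 0)
        return EINVAL;
    pthread_mutex_lock(&g->mu);
    if (g->owner == token) {
        rc = EDEADLK;                        /* same txn entering twice */
    } else {
        while (g->owner != 0)
            pthread_cond_wait(&g->freed, &g->mu);
        g->owner = token;
    }
    pthread_mutex_unlock(&g->mu);
    return rc;
}

/* Release the gate from any OS thread, but only with the owning token.
 * Returns EPERM if `token` is not the current owner. */
static int gate_release(write_gate *g, uintptr_t token)
{
    int rc = 0;

    pthread_mutex_lock(&g->mu);
    if (token == 0 || g->owner != token) {
        rc = EPERM;
    } else {
        g->owner = 0;
        pthread_cond_signal(&g->freed);      /* wake one waiting writer */
    }
    pthread_mutex_unlock(&g->mu);
    return rc;
}
```

gate_acquire still blocks the calling OS thread in pthread_cond_wait, so in an M:N runtime a waiting lightweight writer can stall the owner if both are scheduled on the same OS thread. A real binding would more likely block in the runtime's own primitive (for example an MVar in GHC) and keep only the token-ownership check.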
OTOH, I want to move on to using LMDB, rather than working on it. My Haskell bindings to LMDB [1] are now in a usable condition, albeit only at the lowest level. You can add it to the list. :)
[1] http://hackage.haskell.org/package/lmdb-0.2.0
> LMDB is documented to be a single-writer design. I don't see any sane way for us to support M:N threading models ourselves; not portably to all the possible runtimes out there. I suggest you wrap your own mutex mechanism around your wrapper for mdb_txn_begin().
That was my conclusion, too. And it's what I'm doing at the moment.
Best,
Dave
David Barbour wrote:
> OTOH, I want to move on to using LMDB, rather than working on it. My Haskell bindings to LMDB [1] are now in a usable condition, albeit only at the lowest level. You can add it to the list. :)
Great! Done. http://symas.com/mdb/#wrappers
On 12/11/2014 02:21 AM, Howard Chu wrote:
> David Barbour wrote:
>> When two writers operate in the same OS thread using M:N threading, a bug will occur on Windows whenever two different lightweight 'writer' threads grab the same mutex from the same OS thread: because the mutex is recursive, both writers will succeed.
Yes: after the code which locks the write mutex, mdb.c should do `if (env->me_txn) { unlock the mutex; return MDB_DEADLOCK; }`. That also helps with MDB_NOLOCK.
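A sketch of the proposed check, using hypothetical stand-ins rather than LMDB's real structures: sketch_env mimics an MDB_env with a me_txn field, and SKETCH_DEADLOCK stands in for the proposed MDB_DEADLOCK code (not an existing LMDB error). The write mutex is made recursive to reproduce the Win32 situation, where a second writer on the same OS thread gets past the lock and only the me_txn check stops it.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the proposed MDB_DEADLOCK return code. */
#define SKETCH_DEADLOCK (-1)

/* Hypothetical stand-in for MDB_env: just the write mutex and me_txn. */
typedef struct {
    pthread_mutex_t wmutex;  /* recursive, mimicking a Win32 CreateMutex lock */
    void *me_txn;            /* active write transaction, NULL if none */
} sketch_env;

static int sketch_env_init(sketch_env *env)
{
    pthread_mutexattr_t attr;
    int rc = pthread_mutexattr_init(&attr);
    if (rc != 0)
        return rc;
    rc = pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
    if (rc == 0)
        rc = pthread_mutex_init(&env->wmutex, &attr);
    pthread_mutexattr_destroy(&attr);
    env->me_txn = NULL;
    return rc;
}

/* Lock the write mutex, then refuse if this env already has a writer:
 * a recursive mutex lets a second writer on the same thread get this far. */
static int sketch_write_begin(sketch_env *env, void *txn)
{
    pthread_mutex_lock(&env->wmutex);
    if (env->me_txn) {
        pthread_mutex_unlock(&env->wmutex);  /* undo only our own re-lock */
        return SKETCH_DEADLOCK;
    }
    env->me_txn = txn;
    return 0;
}

/* End the active write transaction and release the write mutex. */
static void sketch_write_end(sketch_env *env)
{
    env->me_txn = NULL;
    pthread_mutex_unlock(&env->wmutex);
}
```

With a non-recursive POSIX mutex or a semaphore, the second sketch_write_begin on the same thread would block inside the lock call and never reach the me_txn check.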
> (...)
> Interesting. The downside of your semaphore suggestion on Windows is that it means any thread can also unlock it, regardless of the current owner. This is also detected as an error in POSIX mutexes.
If mdb.c uses error-checking mutexes (`pthread_mutexattr_settype(&mattr, PTHREAD_MUTEX_ERRORCHECK)`) and checks for the error, yes :-)
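A sketch of what an error-checking mutex reports, using an ordinary in-process mutex rather than anything set up by `mdb_env_setup_locks` (helper names are hypothetical). Re-locking from the owning thread returns EDEADLK instead of hanging, and unlocking from a thread that does not own the mutex returns EPERM. In an M:N runtime the first turns two writers on one OS thread into a reportable error, while the second is exactly what a lightweight writer would hit if it migrated to another OS thread before unlocking.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t ec_mutex;

/* Thread body: unlock a mutex this thread never locked; report the result. */
static void *unlock_as_non_owner(void *result)
{
    *(int *)result = pthread_mutex_unlock(&ec_mutex);
    return NULL;
}

/* Lock an error-checking mutex, then (a) re-lock it from the same thread and
 * (b) unlock it from another thread, recording both return codes.
 * Returns 0 if the demo ran, else the failing init/create error code. */
static int errorcheck_demo(int *relock_rc, int *foreign_unlock_rc)
{
    pthread_mutexattr_t mattr;
    pthread_t t;
    int rc;

    pthread_mutexattr_init(&mattr);
    pthread_mutexattr_settype(&mattr, PTHREAD_MUTEX_ERRORCHECK);
    rc = pthread_mutex_init(&ec_mutex, &mattr);
    pthread_mutexattr_destroy(&mattr);
    if (rc != 0)
        return rc;

    pthread_mutex_lock(&ec_mutex);
    *relock_rc = pthread_mutex_lock(&ec_mutex);     /* EDEADLK, no hang */
    rc = pthread_create(&t, NULL, unlock_as_non_owner, foreign_unlock_rc);
    if (rc == 0)
        pthread_join(t, NULL);                      /* thread saw EPERM */
    pthread_mutex_unlock(&ec_mutex);                /* owner unlocks */
    pthread_mutex_destroy(&ec_mutex);
    return rc;
}
```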
OTOH, I can't spot a problem with write transactions being thread-independent, as long as the user serializes the operations within these transactions.
On the third hand, with semaphores the second writer would block before it could reach the `env->me_txn` error check above.
We could save the thread ID in the `MDB_env` and check it before locking, but that could give false positives because thread IDs need not be atomic-sized, so an unlocked read could see a half-updated value.
> LMDB is documented to be a single-writer design. I don't see any sane way for us to support M:N threading models ourselves; not portably to all the possible runtimes out there. I suggest you wrap your own mutex mechanism around your wrapper for mdb_txn_begin().