Hello
I'm writing a Common Lisp wrapper for LMDB, starting where the previous efforts left off. I have a number of questions related to safety and the color of the smoke after a disaster.
1. lmdb.h says that "A parent transaction and its cursors may not issue any other operations than mdb_txn_commit and mdb_txn_abort while it has active child transactions."
What I observe is that when a cursor associated with the parent transaction is used in the child, there are no errors and the cursor behaves (my test only involved mdb_cursor_put and MDB_SET_KEY) as if it belonged to the child.
Is this to be expected in general, or are my tests insufficient and something really bad can happen? If this is a disaster waiting to happen, I need to add checks to the cursor code.
2. mdb_txns are calloc()ed and free()d. In the case where a thread performs some operation (e.g. put, get, del) involving an already freed mdb_txn pointer, what kind of nastiness can happen? Can the database be corrupted?
3. Same question about mdb_cursors.
4. Async unwind safety. This is a bit like a thread being destroyed in the middle of an lmdb function call.
Context: In some Common Lisp implementations (SBCL), POSIX signals such as SIGINT are used during development. If the developer presses C-c, the Lisp debugger starts where the signal handler was invoked, which may be in the middle of some mdb_* call. Depending on the actions taken, the stack (both the Lisp and the C stack) may be unwound to some earlier frame. Another example: async timeouts (SBCL's WITH-TIMEOUT) can also unwind the stack. I understand that async unwinds are unsafe in general.
There is a way to defer handling of interrupts, which I already use to protect allocations (mdb_txn_begin, mdb_txn_commit and similar), but it has a small performance cost and I hesitate to apply it to performance hotspots (e.g. put, get, del and most cursor ops). Are [some of] these functions safe in the face of async unwinds? What kind of problems may arise?
Cheers, Gábor Melis
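For readers working at the C level, the "defer handling of interrupts" idea from question 4 can be approximated with a signal mask. The sketch below is only an analogy, not SBCL's mechanism and not part of LMDB: the names on_sigint, install_handler, call_with_sigint_deferred and fake_critical_call are hypothetical, and it assumes a single-threaded process (a threaded program would use pthread_sigmask instead of sigprocmask).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Set by the handler so callers can see that SIGINT was delivered. */
static volatile sig_atomic_t got_sigint = 0;

static void on_sigint(int signo)
{
    (void)signo;
    got_sigint = 1;
}

/* Install the SIGINT handler; returns 0 on success, -1 on failure. */
static int install_handler(void)
{
    struct sigaction sa;
    sa.sa_handler = on_sigint;
    sa.sa_flags = 0;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGINT, &sa, NULL);
}

/* Run fn(arg) with SIGINT blocked. A SIGINT arriving during fn stays
 * pending and is delivered only when the previous mask is restored,
 * so fn is never abandoned halfway through. */
static int call_with_sigint_deferred(int (*fn)(void *), void *arg)
{
    sigset_t block, saved;
    int rc, mask_rc;

    sigemptyset(&block);
    sigaddset(&block, SIGINT);
    mask_rc = sigprocmask(SIG_BLOCK, &block, &saved);
    assert(mask_rc == 0);
    (void)mask_rc;

    rc = fn(arg);

    /* Restoring the mask lets any pending SIGINT run its handler now. */
    sigprocmask(SIG_SETMASK, &saved, NULL);
    return rc;
}

/* Stand-in for a critical library call (an assumption for illustration):
 * it sends SIGINT to itself and records whether the handler already ran. */
static int fake_critical_call(void *arg)
{
    int *handler_ran_inside = arg;
    raise(SIGINT);
    *handler_ran_inside = got_sigint;
    return 0;
}
```

The two sigprocmask calls per protected call are system calls, which illustrates the kind of per-call cost that makes this unattractive around put, get, del and cursor operations.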
Gábor Melis wrote:
Hello
I'm writing a Common Lisp wrapper for LMDB, starting where the previous efforts left off. I have a number of questions related to safety and the color of the smoke after a disaster.
You should consider any misuses as you describe here as fatal, resulting in irreparably corrupted DBs.
1. lmdb.h says that "A parent transaction and its cursors may not issue any other operations than mdb_txn_commit and mdb_txn_abort while it has active child transactions."
What I observe is that when a cursor associated with the parent transaction is used in the child, there are no errors and the cursor behaves (my test only involved mdb_cursor_put and MDB_SET_KEY) as if it belonged to the child.
Is this to be expected in general, or are my tests insufficient and something really bad can happen? If this is a disaster waiting to happen, I need to add checks to the cursor code.
Sounds like your test case was lucky.
2. mdb_txns are calloc()ed and free()d. In the case where a thread performs some operation (e.g. put, get, del) involving an already freed mdb_txn pointer, what kind of nastiness can happen? Can the database be corrupted?
The C standard says any references to freed memory result in undefined behavior. Nobody can give you a more specific answer than that.
3. Same question about mdb_cursors.
4. Async unwind safety. This is a bit like a thread being destroyed in the middle of an lmdb function call.
Context: In some Common Lisp implementations (SBCL), POSIX signals such as SIGINT are used during development. If the developer presses C-c, the Lisp debugger starts where the signal handler was invoked, which may be in the middle of some mdb_* call. Depending on the actions taken, the stack (both the Lisp and the C stack) may be unwound to some earlier frame. Another example: async timeouts (SBCL's WITH-TIMEOUT) can also unwind the stack. I understand that async unwinds are unsafe in general.
There is a way to defer handling of interrupts, which I already use to protect allocations (mdb_txn_begin, mdb_txn_commit and similar), but it has a small performance cost and I hesitate to apply it to performance hotspots (e.g. put, get, del and most cursor ops). Are [some of] these functions safe in the face of async unwinds? What kind of problems may arise?
In a default build, read txns are always safe. No guarantees on an interrupted write txn.
Cheers, Gábor Melis
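The "checks in the cursor code" mentioned in question 1 could be kept entirely on the wrapper side. The following is a minimal sketch under that assumption: wrap_txn, wrap_cursor and the functions around them are hypothetical names, not LMDB API, and the real mdb_* calls are only indicated in comments. It does not model ending a parent while a child is still active.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical wrapper-side bookkeeping; none of these names are LMDB API. */
typedef struct wrap_txn {
    struct wrap_txn *parent;  /* NULL for a top-level transaction */
    int active_children;      /* children begun but not yet committed/aborted */
} wrap_txn;

typedef struct wrap_cursor {
    wrap_txn *txn;            /* the transaction this cursor was opened in */
} wrap_cursor;

/* Call right after a successful mdb_txn_begin(). */
static void wrap_txn_begin(wrap_txn *txn, wrap_txn *parent)
{
    txn->parent = parent;
    txn->active_children = 0;
    if (parent != NULL)
        parent->active_children++;
}

/* Call right after mdb_txn_commit() or mdb_txn_abort() on txn. */
static void wrap_txn_end(wrap_txn *txn)
{
    if (txn->parent != NULL) {
        /* Sanity check on our own bookkeeping. */
        assert(txn->parent->active_children > 0);
        txn->parent->active_children--;
    }
}

/* A transaction may be used for put/get/del only while it has no
 * active child; commit and abort remain allowed. */
static bool wrap_txn_usable(const wrap_txn *txn)
{
    return txn->active_children == 0;
}

/* The same rule extends to every cursor opened in that transaction. */
static bool wrap_cursor_usable(const wrap_cursor *cur)
{
    return wrap_txn_usable(cur->txn);
}
```

A wrapper would call wrap_cursor_usable before each cursor operation and signal an error instead of passing the call through to LMDB.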
Howard Chu hyc@symas.com wrote on 14.08.2020 at 02:14 in message
a9c55a18-7a70-1c84-42d7-bf716345591d@symas.com:
You should consider any misuses as you describe here as fatal, resulting in irreparably corrupted DBs.
The C standard says any references to freed memory result in undefined behavior. Nobody can give you a more specific answer than that.
But a valid question remains: can some extra "debugging" protection code be added to locate such problems in LMDB? Examples I can think of: assigning NULL to pointers after they are freed, assigning -1 to file descriptors after they are closed, etc. Then you would at least get a core dump on modern architectures in such situations.
Cheers, Gábor Melis
--
  -- Howard Chu
  CTO, Symas Corp. http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP http://www.openldap.org/project/
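The debugging protection suggested in this message (NULL for freed pointers, -1 for closed file descriptors) might look roughly like the sketch below. FREE_AND_NULL, CHECK_LIVE and close_and_invalidate are hypothetical helpers, not something LMDB provides. Note the limitation: only the one variable passed in is reset, so any other copy of the pointer, for example one held by a language wrapper, still dangles.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical debugging helpers; not part of LMDB. */

/* Free p and reset the variable to NULL, so a later dereference through
 * that variable faults at once instead of touching recycled memory.
 * Calling it twice is harmless because free(NULL) is a no-op. */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)

/* Debug-build check to place in front of each use of such a pointer. */
#define CHECK_LIVE(p) assert((p) != NULL)

/* Close *fd and mark it invalid. Returns the result of close(), or 0 if
 * the descriptor had already been invalidated. */
static int close_and_invalidate(int *fd)
{
    int rc = 0;
    if (*fd >= 0) {
        rc = close(*fd);
        *fd = -1;  /* later read()/write() calls fail with EBADF */
    }
    return rc;
}
```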
On Fri, 14 Aug 2020 at 02:14, Howard Chu hyc@symas.com wrote:
In a default build, read txns are always safe. No guarantees on an interrupted write txn.
Are reads (mdb_get) in a write txn also problematic or would it be async unwind safe?
For example, a quick look at mdb_get did not reveal any side-effecting operations except changing flags of the throw-away cursor. I could have easily missed some, though.
Thank you for the quick answers, Gábor
Gábor Melis wrote:
On Fri, 14 Aug 2020 at 02:14, Howard Chu hyc@symas.com wrote:
In a default build, read txns are always safe. No guarantees on an interrupted write txn.
Are reads (mdb_get) in a write txn also problematic or would it be async unwind safe?
For example, a quick look at mdb_get did not reveal any side-effecting operations except changing flags of the throw-away cursor. I could have easily missed some, though.
In a default build, reads in a write txn should be safe.
If a thread is interrupted in the middle of a write op and not resumed where it left off, then you'll probably need to abort the active txn and close the environment. Nothing on-disk will be damaged, but in-memory bookkeeping will probably be corrupted by such an interruption.
Thank you for the quick answers, Gábor
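One way a wrapper could act on the advice above is to remember whether a write call was in flight when an unwind happened, and to refuse further use of the environment if so. This is a minimal sketch under that assumption: wrap_env and its functions are hypothetical names, the flag would more likely live on the Lisp side, and the real cleanup calls (mdb_txn_abort, then mdb_env_close) are only indicated in comments.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical wrapper state; not LMDB API. */
typedef struct wrap_env {
    volatile bool in_write_op;  /* true only while a write call is running */
    bool poisoned;              /* an interrupted write was detected */
} wrap_env;

/* Call immediately before mdb_put, mdb_del, mdb_cursor_put and the like. */
static void write_op_enter(wrap_env *env)
{
    assert(!env->poisoned);  /* a poisoned environment must not be reused */
    env->in_write_op = true;
}

/* Call immediately after the write call returns normally. */
static void write_op_leave(wrap_env *env)
{
    env->in_write_op = false;
}

/* Call from the unwind cleanup path. If a write call never reached
 * write_op_leave, LMDB's in-memory bookkeeping can no longer be trusted. */
static void note_unwind(wrap_env *env)
{
    if (env->in_write_op) {
        env->in_write_op = false;
        env->poisoned = true;
        /* Here the wrapper would abort the active txn (mdb_txn_abort)
         * and close the environment (mdb_env_close). */
    }
}

static bool env_usable(const wrap_env *env)
{
    return !env->poisoned;
}
```

Setting and clearing a flag is much cheaper than deferring interrupts around every call, at the price of having to reopen the environment after an unlucky C-c.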
Gábor Melis mega@retes.hu wrote on 14.08.2020 at 00:40 in message
CADJFn4Xp1XESjGC01_bpjy8huvKBCovdAuFoZmJXgMo_2ZnffQ@mail.gmail.com:
...
2. mdb_txns are calloc()ed and free()d. In the case where a thread performs some operation (e.g. put, get, del) involving an already freed mdb_txn pointer, what kind of nastiness can happen? Can the database be corrupted?
...
I can't answer specifically, but in general double-freeing memory or using memory that has already been freed is a bad idea. Many years ago, when I wrote a C program for an MS-DOS PC, the screen suddenly started flickering and showing random characters, then the PC froze. It was just a double-free. So _anything_ can happen if you continue to use memory already freed. It is possible that some newer libraries protect against double-freeing memory (at least as long as the memory wasn't reallocated again), but still...
Regards, Ulrich
openldap-technical@openldap.org