https://bugs.openldap.org/show_bug.cgi?id=10395
Issue ID: 10395 Summary: Support multiple readers on uncommitted changes Product: LMDB Version: unspecified Hardware: All OS: All Status: UNCONFIRMED Keywords: needs_review Severity: normal Priority: --- Component: liblmdb Assignee: bugs@openldap.org Reporter: renault.cle@gmail.com Target Milestone: ---
Created attachment 1086 --> https://bugs.openldap.org/attachment.cgi?id=1086&action=edit A patch to support multiple readers on uncommitted changes
Hello,
The attached patch is not meant to be merged immediately into LMDB. Still, it demonstrates how I added a helpful feature to the key-value store: reading uncommitted changes from multiple transactions. I am conscious that the patch still requires some work and uses non-C99 features, i.e., atomics are C11, which could be a blocker for it to be merged upstream. I would also be delighted to merge these changes under a flag/define to ensure we don't impact other users with non-C99 stuff.
The main feature we need at Meilisearch is to read uncommitted changes from multiple threads, compute parallel post-processing of different data structures [1], and speed up the search requests. We could have done the post-processing in a following transaction by opening multiple read transactions, but this would mean that the post-processed data structure would not include newly inserted or modified document IDs. Both data structures would be desync.
Regarding the design choice, I decided to follow the same design as the nested write transactions: use the parent argument of the mdb_txn_begin [2], and allow the MDB_RDONLY flag, which was disallowed when the parent argument was non-NULL [3]. I find it clear enough that, by calling the mdb_txn_begin function with these arguments, you can call it multiple times (I need to update the doc) to obtain nested read-only transactions from the parent write transaction.
ret = mdb_txn_begin(env, parent_txn, MDB_RDONLY, new_nested_rtxn);
Note that this early proposal lacks security and error handling. The generated transactions are fake-read-only and actually write transactions that share the underlying parent allocations and data structures. This is unsafe and must be changed or reviewed carefully, but most importantly, we need to add read-only capabilities to these transactions to disallow writes. Using a Rust wrapper on top of LMDB, I wrapped the fake read-only transactions into ReadTxn, which disallows any writes at compile time. However, I haven't checked the conflict database creations or openings.
The main issues I encountered were concurrent free of the main shared data structures when the different threads owning the transactions were dropping the transactions simultaneously. So, I decided to implement the equivalent of an ARC to free resources only when the last nested transaction was freed.
I can share numbers about how this feature improves the post-processing by 4x-9x or from 1200s to 120s [1]. You can look at this PR, which I would be happy to merge once an improved version of this patch lands on LMDB upstream.
I would be very happy if you could guide me a bit on how I could improve this patch to make it mergeable into LMDB. We want to contribute useful features like this to LMDB and not keep a deviant fork. LMDB works great; we are happy about it, and its performance is predictable.
Have a lovely week, kero
[1]: https://github.com/meilisearch/meilisearch/pull/5307 [2]: https://github.com/LMDB/lmdb/blob/14d6629bc8a9fe40d8a6bee1bf71c45afe7576b6/l... [3]: https://github.com/LMDB/lmdb/blob/14d6629bc8a9fe40d8a6bee1bf71c45afe7576b6/l...
https://bugs.openldap.org/show_bug.cgi?id=10395
--- Comment #1 from Howard Chu hyc@openldap.org --- I don't see intentionally breaking ACID as being a good idea. Reading uncommitted changes breaks Atomicity and Isolation at least, and may break Consistency in various situations.
The only thing that ought to be able to see uncommitted changes is the active write transaction. If anything else does, then you don't have a transactional DB engine any more. In ACID operation "newly inserted or modified document IDs" *don't exist* until their transaction commits.
Using multiple reader txns is the correct approach. When the writer commits, you can simply txn_reset() all the readers to get them all on the latest data.
https://bugs.openldap.org/show_bug.cgi?id=10395
--- Comment #2 from kero renault.cle@gmail.com ---
I don't see intentionally breaking ACID as being a good idea. Reading uncommitted changes breaks Atomicity and Isolation at least, and may break Consistency in various situations.
I may have been unclear about the changes I made, or I may have made an error in the code. The nested transactions do the same as the write transaction: they _read their own writes_. It is already supported. The only difference is that you can _read your writes_ in parallel, now. Additionally, the parent write transaction is unusable and only an abort can be performed, and only after the nested transactions are aborted. If it's not the case, I need to implement that as we mentioned in this Mastodon thread [1].
The goal of this feature is not related to parallel writes, but rather to allow parallel reads on uncommitted changes *from within the write transaction*. Nothing can go outside. To write into the write transaction, you have to abort the many nested read-only transactions and use the original parent transaction. You must store data somewhere (RAM, disk) before being able to write to LMDB. I don't want to break ACID, here.
The only thing that ought to be able to see uncommitted changes is the active write transaction. If anything else does, then you don't have a transactional DB engine any more. In ACID operation "newly inserted or modified document IDs" *don't exist* until their transaction commits.
Currently, IIRC, nested transactions can see uncommitted changes as well, as can the outer transaction. This PR proposes to do the same, but from multiple read-only transactions that are hooked to the parent transaction. You can only have one write-active nested transaction, and in the case of this very feature, there is not even the possibility to write while the nested read-only transactions are active.
Using multiple reader txns is the correct approach. When the writer commits, you can simply txn_reset() all the readers to get them all on the latest data.
That's the issue we have with Meilisearch: we would like to be able to read uncommitted changes to avoid desync issues, and in parallel, to do it quickly and scale with the number of CPUs.
[1]: https://fosstodon.org/@kero/113465979956049984
https://bugs.openldap.org/show_bug.cgi?id=10395
--- Comment #3 from kero renault.cle@gmail.com --- Currently, IIRC, nested transactions can see uncommitted changes as well, as can the *nested* transactions [..]