https://bugs.openldap.org/show_bug.cgi?id=10395
Issue ID: 10395 Summary: Support multiple readers on uncommitted changes Product: LMDB Version: unspecified Hardware: All OS: All Status: UNCONFIRMED Keywords: needs_review Severity: normal Priority: --- Component: liblmdb Assignee: bugs@openldap.org Reporter: renault.cle@gmail.com Target Milestone: ---
Created attachment 1086 --> https://bugs.openldap.org/attachment.cgi?id=1086&action=edit A patch to support multiple readers on uncommitted changes
Hello,
The attached patch is not meant to be merged immediately into LMDB. Still, it demonstrates how I added a helpful feature to the key-value store: reading uncommitted changes from multiple transactions. I am conscious that the patch still requires some work and uses non-C99 features, i.e., atomics are C11, which could be a blocker for it to be merged upstream. I would also be delighted to merge these changes under a flag/define to ensure we don't impact other users with non-C99 stuff.
The main feature we need at Meilisearch is to read uncommitted changes from multiple threads, compute parallel post-processing of different data structures [1], and speed up the search requests. We could have done the post-processing in a following transaction by opening multiple read transactions, but this would mean that the post-processed data structure would not include newly inserted or modified document IDs. Both data structures would be desync.
Regarding the design choice, I decided to follow the same design as the nested write transactions: use the parent argument of the mdb_txn_begin [2], and allow the MDB_RDONLY flag, which was disallowed when the parent argument was non-NULL [3]. I find it clear enough that, by calling the mdb_txn_begin function with these arguments, you can call it multiple times (I need to update the doc) to obtain nested read-only transactions from the parent write transaction.
ret = mdb_txn_begin(env, parent_txn, MDB_RDONLY, new_nested_rtxn);
Note that this early proposal lacks security and error handling. The generated transactions are fake-read-only and actually write transactions that share the underlying parent allocations and data structures. This is unsafe and must be changed or reviewed carefully, but most importantly, we need to add read-only capabilities to these transactions to disallow writes. Using a Rust wrapper on top of LMDB, I wrapped the fake read-only transactions into ReadTxn, which disallows any writes at compile time. However, I haven't checked the conflict database creations or openings.
The main issues I encountered were concurrent free of the main shared data structures when the different threads owning the transactions were dropping the transactions simultaneously. So, I decided to implement the equivalent of an ARC to free resources only when the last nested transaction was freed.
I can share numbers about how this feature improves the post-processing by 4x-9x or from 1200s to 120s [1]. You can look at this PR, which I would be happy to merge once an improved version of this patch lands on LMDB upstream.
I would be very happy if you could guide me a bit on how I could improve this patch to make it mergeable into LMDB. We want to contribute useful features like this to LMDB and not keep a deviant fork. LMDB works great; we are happy about it, and its performance is predictable.
Have a lovely week, kero
[1]: https://github.com/meilisearch/meilisearch/pull/5307 [2]: https://github.com/LMDB/lmdb/blob/14d6629bc8a9fe40d8a6bee1bf71c45afe7576b6/l... [3]: https://github.com/LMDB/lmdb/blob/14d6629bc8a9fe40d8a6bee1bf71c45afe7576b6/l...