sharing transactions across threads


James Anderson

2 Dec 2020

12:17 p.m.

the mdb_env_open documentation includes in its note about NOTLS, that

A read-only transaction may span threads if the user synchronizes its use.

to which read-only operations would this constraint apply? mdb_cursor_open and mdb_cursor_close look as if they modify transaction state for write transactions only.

best regards, from berlin


Howard Chu

2 Dec 2020

1:53 p.m.

James Anderson wrote:


> the mdb_env_open documentation includes in its note about NOTLS, that
> A read-only transaction may span threads if the user synchronizes its use.
> to which read-only operations would this constraint apply?

It depends.

The only safe approach is to ensure that a txn is not active simultaneously in multiple threads.


-- 
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
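A minimal sketch of the rule Howard states, that a read-only transaction must never be active in two threads at once. It assumes a hypothetical stand-in struct (`fake_txn`) in place of an `MDB_txn *` so the example runs without LMDB. Every use of the shared transaction happens under one user-owned mutex, which is one reading of "synchronizes its use" in the NOTLS note. With real LMDB the environment would be opened with `MDB_NOTLS`, the transaction begun with `MDB_RDONLY`, and the body of `use_txn` would hold the `mdb_get` or cursor calls; since the thread leaves cursor open/close at "it depends", the sketch conservatively keeps everything inside the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical stand-in for an MDB_txn* begun with MDB_RDONLY in an
   environment opened with MDB_NOTLS. Not LMDB code. */
typedef struct {
    atomic_int active_users; /* threads currently inside the txn */
    int max_active_users;    /* peak value observed (updated under lock) */
    long reads;              /* number of guarded "read" sections */
} fake_txn;

/* The user-supplied synchronization: one mutex guards every use. */
typedef struct {
    pthread_mutex_t lock;
    fake_txn txn;
} shared_rdtxn;

enum { READS_PER_THREAD = 1000, MAX_THREADS = 16 };

static void shared_init(shared_rdtxn *s) {
    pthread_mutex_init(&s->lock, NULL);
    atomic_init(&s->txn.active_users, 0);
    s->txn.max_active_users = 0;
    s->txn.reads = 0;
}

/* One batch of reads on the shared txn (real code: mdb_get, cursors). */
static void use_txn(shared_rdtxn *s) {
    pthread_mutex_lock(&s->lock);
    /* The atomic counter would reveal overlap if the lock were missing. */
    int now = atomic_fetch_add(&s->txn.active_users, 1) + 1;
    if (now > s->txn.max_active_users)
        s->txn.max_active_users = now;
    assert(now == 1); /* never active in two threads at once */
    s->txn.reads++;
    atomic_fetch_sub(&s->txn.active_users, 1);
    pthread_mutex_unlock(&s->lock);
}

static void *worker(void *arg) {
    shared_rdtxn *s = arg;
    for (int i = 0; i < READS_PER_THREAD; i++)
        use_txn(s);
    return NULL;
}

/* Runs up to MAX_THREADS workers against one shared txn and returns the
   peak number of threads that were inside it simultaneously. */
static int run_shared_reads(shared_rdtxn *s, int nthreads) {
    pthread_t tids[MAX_THREADS];
    if (nthreads > MAX_THREADS)
        nthreads = MAX_THREADS;
    shared_init(s);
    for (int i = 0; i < nthreads; i++)
        pthread_create(&tids[i], NULL, worker, s);
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);
    pthread_mutex_destroy(&s->lock);
    return s->txn.max_active_users;
}
```

The price of this design is that the readers no longer run in parallel: the mutex serializes them.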

james anderson

2:36 p.m.


On 2020-12-02, at 22:53:58, Howard Chu hyc@symas.com wrote:


> It depends.
>
> The only safe approach is to ensure that a txn is not active simultaneously in multiple threads.

where “active” includes read-only cursors?

does that mean either one constrains the threads such that there can be no parallel access to the database, or each thread must establish its own transaction, in which case there is no guarantee that they operate on the same database state?
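The second alternative in this question can be seen in a toy model (all names hypothetical, not LMDB internals) in which a read transaction captures the id of the last committed write as its snapshot when it begins. If another process commits between the two begins, the per-thread transactions see different snapshots. With real LMDB, `mdb_txn_id()` on a read-only transaction returns the id of the snapshot it reads, so an application can at least detect which case it is in.

```c
#include <assert.h>
#include <stddef.h>

/* Toy MVCC model (hypothetical): the environment remembers the id of the
   last committed write txn; a read txn copies it when it begins. */
typedef struct { size_t last_committed; } toy_env;
typedef struct { size_t snapshot; } toy_rdtxn;

static toy_rdtxn toy_begin_read(const toy_env *env) {
    toy_rdtxn t = { env->last_committed };
    return t;
}

/* Another, independent process commits a write txn. */
static void toy_commit_write(toy_env *env) {
    env->last_committed++; /* each commit publishes a new snapshot */
}

/* Reader A begins, a writer may commit, reader B begins. Returns 1 if both
   readers see the same snapshot (what comparing mdb_txn_id() would show). */
static int same_snapshot_after_interleaving(int writer_commits_in_between) {
    toy_env env = { 41 };
    toy_rdtxn a = toy_begin_read(&env);
    if (writer_commits_in_between)
        toy_commit_write(&env);
    toy_rdtxn b = toy_begin_read(&env);
    return a.snapshot == b.snapshot;
}
```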


Gábor Melis

3 Dec 2020

5:31 a.m.

On Wed, 2 Dec 2020 at 22:50, james anderson anderson.james.1955@gmail.com wrote:


> where “active” includes read-only cursors?
>
> does that mean either one constrains the threads such that there can be no parallel access to the database, or each thread must establish its own transaction, in which case there is no guarantee that they operate on the same database state?

Chiming in here, a cleaner api could be to allow starting a transaction with a given txn id. That way one would have separate transaction objects, but consistent state. The client code would need to synchronize threads a bit to guarantee that the txn id is still valid, but this would be more lightweight and easier to reason about.
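Gábor's proposal can be sketched as a toy model. Every name here is hypothetical: LMDB provides no call that begins a transaction at a chosen txn id. The model treats a snapshot id as usable only while it is the newest or at least one open reader still pins it (a simplification), which stands in for the client-side synchronization the proposal mentions for keeping the txn id valid.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_SNAP = 64 };

/* Hypothetical environment: newest snapshot id plus per-snapshot count of
   open readers ("pins"). Snapshot ids are assumed to stay below MAX_SNAP. */
typedef struct {
    size_t last_committed;
    int pins[MAX_SNAP];
} pin_env;

/* Existing behaviour: a new read txn always gets the newest snapshot. */
static size_t pin_begin_latest(pin_env *env) {
    env->pins[env->last_committed]++;
    return env->last_committed;
}

/* The proposed call (hypothetical): join snapshot `id` if it is the newest
   or still pinned by an open reader. Returns 0 on success, -1 otherwise. */
static int pin_begin_at(pin_env *env, size_t id) {
    if (id >= MAX_SNAP)
        return -1;
    if (id != env->last_committed && env->pins[id] == 0)
        return -1; /* no reader holds it any more: treat as reclaimed */
    env->pins[id]++;
    return 0;
}

/* Ends a read txn on snapshot `id`, releasing its pin. */
static void pin_end(pin_env *env, size_t id) {
    if (id < MAX_SNAP && env->pins[id] > 0)
        env->pins[id]--;
}

/* Another process commits a write, publishing a new snapshot. */
static void pin_commit(pin_env *env) {
    if (env->last_committed + 1 < MAX_SNAP)
        env->last_committed++;
}
```

For example, a first thread calls `pin_begin_latest`, passes the returned id to a second thread, and the second thread's `pin_begin_at` succeeds even after a commit, as long as the first thread has not yet ended its transaction.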

Howard Chu

10:17 a.m.

Gábor Melis wrote:


> Chiming in here, a cleaner api could be to allow starting a transaction with a given txn id. That way one would have separate transaction objects, but consistent state. The client code would need to synchronize threads a bit to guarantee that the txn id is still valid, but this would be more lightweight and easier to reason about.

In an actively written database there is no legitimate use case for opening a new transaction on anything but the newest version of the data. Reading or depending on stale data would be an application bug.


james anderson

12:41 p.m.


On 2020-12-03, at 19:17:46, Howard Chu hyc@symas.com wrote:


> In an actively written database there is no legitimate use case for opening a new transaction on anything but the newest version of the data. Reading or depending on stale data would be an application bug.

without considering the relation between that notion and the management of data in a bitemporal store, the question remains: how are two independent threads to ensure that they are reading the same “newest” version when some other, likewise independent, process may commit a write transaction in the interval between the instants at which their respective read transactions begin?

---
james anderson | james@dydra.com | http://dydra.com

Howard Chu

2:40 p.m.

james anderson wrote:


> without considering the relation between that notion and the management of data in a bitemporal store, the question remains: how are two independent threads to ensure that they are reading the same “newest” version when some other, likewise independent, process may commit a write transaction in the interval between the instants at which their respective read transactions begin?

How would you do this in any other database system?


james anderson

3:31 p.m.


On 2020-12-03, at 23:40:18, Howard Chu hyc@symas.com wrote:


> How would you do this in any other database system?

i would expect it to permit one of the alternatives which have been mentioned:

- allow multiple threads to perform read operations in the context of a single transaction
- allow each thread to create a sub-transaction from an initial parent transaction and then operate on its child transaction
- allow a thread to specify the revision identifier of an open transaction as the state for which it opens its own transaction.

i would expect to be able to do either of the first two options in an in-memory database which supports mvcc. i would expect the third option to be available in any database which supports access to revisions/versions.

i would expect the first option to apply to blueprints. i would expect the third option to apply to oracle or db2 “versioning”.

Howard Chu

6:07 p.m.

james anderson wrote:


> i would expect it to permit one of the alternatives which have been mentioned:
>
> - allow multiple threads to perform read operations in the context of a single transaction

No. A transaction is a single unit for concurrency control. Allowing multiple threads to operate within a single transaction means you have no control, and thus invites memory corruption. No transaction system in existence supports this.

> - allow each thread to create a sub-transaction from an initial parent transaction and then operate on its child transaction

Same as above. The docs on child transactions are quite clear - when a child transaction is active, the parent transaction cannot be used again until the child finishes.
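The documented rule Howard refers to (the `mdb_txn_begin` docs say a parent transaction and its cursors may issue no operations other than `mdb_txn_commit` and `mdb_txn_abort` while it has active child transactions) can be modeled with a toy nested transaction. The names are hypothetical and the model is not LMDB code. It also shows why "one child per thread" does not work: a second child cannot be started from the parent while the first is active, and the parent itself is unusable meanwhile.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { TOY_OK = 0, TOY_BUSY = -1, TOY_DONE = -2 };

/* Hypothetical transaction handle with an optional parent. */
typedef struct toy_txn {
    struct toy_txn *parent;
    bool has_active_child;
    bool finished; /* committed or aborted */
} toy_txn;

/* Starting a child is itself an operation on the parent, so it is refused
   while another child is active. */
static int toy_begin_child(toy_txn *parent, toy_txn *child) {
    if (parent->finished)
        return TOY_DONE;
    if (parent->has_active_child)
        return TOY_BUSY;
    parent->has_active_child = true;
    *child = (toy_txn){ .parent = parent, .has_active_child = false, .finished = false };
    return TOY_OK;
}

/* Stand-in for any read or write (e.g. a lookup) on a txn. */
static int toy_get(const toy_txn *txn) {
    if (txn->finished)
        return TOY_DONE;
    if (txn->has_active_child)
        return TOY_BUSY;
    return TOY_OK;
}

/* Commit or abort: ends the txn and releases its parent. */
static void toy_finish(toy_txn *txn) {
    if (txn->finished)
        return;
    txn->finished = true;
    if (txn->parent != NULL)
        txn->parent->has_active_child = false;
}
```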

> - allow a thread to specify the revision identifier of an open transaction as the state for which it opens its own transaction.

> i would expect to be able to do either of the first two options in an in-memory database which supports mvcc.

Clearly that would be a broken design.

> i would expect the third option to be available in any database which supports access to revisions/versions.

That is not what LMDB does. Please read the LMDB design spec. https://openldap.org/pub/


james anderson

4 Dec 2020

12:26 a.m.


On 2020-12-04, at 03:07:19, Howard Chu hyc@symas.com wrote:


>> i would expect it to permit one of the alternatives which have been mentioned:
>>
>> - allow multiple threads to perform read operations in the context of a single transaction
>
> No. A transaction is a single unit for concurrency control. Allowing multiple threads to operate within a single transaction means you have no control, and thus invites memory corruption.

you note in your paper on lmdb that, “because of [mvcc] isolation read accesses ... always have a self-consistent view of the database.” how are operations performed by threads within a transaction to corrupt this view when their access is through read-only memory and they do nothing to change it?

>> i would expect the third option to be available in any database which supports access to revisions/versions.
>
> That is not what LMDB does. Please read the LMDB design spec. https://openldap.org/pub/

i have re-read https://openldap.org/pub/hyc/mdb-paper.pdf with particular attention to whether your intent or the described implementation decisions necessarily prescribe an access pattern in which two autonomous threads are guaranteed read access to the same database state. as it were, if two threads should happen to begin their respective transaction for identical revisions, i would expect this to be the case. what have i overlooked in that exposition which precludes a mechanism which supports this?


Howard Chu

5:51 a.m.

james anderson wrote:


>> No. A transaction is a single unit for concurrency control. Allowing multiple threads to operate within a single transaction means you have no control, and thus invites memory corruption.
>
> you note in your paper on lmdb that, “because of [mvcc] isolation read accesses ... always have a self-consistent view of the database.” how are operations performed by threads within a transaction to corrupt this view when their access is through read-only memory and they do nothing to change it?

You assume that the read-only on-disk state is the only state maintained within a transaction structure, which is not necessarily true. Anyway, it's not your place to make any assumptions about the internal state of the library beyond what the API docs guarantee.

Relying on internal implementation details like that is how you write broken software.
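To illustrate the point that a transaction object may hold more than the immutable snapshot data, here is a toy read-only cursor (hypothetical, not LMDB's `MDB_cursor`, whose internals are deliberately not part of the API). The keys it walks never change, but its position does, so two threads sharing one cursor disturb each other even though neither writes to the database. The turns are serialized here so the result is deterministic; unsynchronized concurrent `pos++` from two threads would be a data race, which is undefined behavior in C.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical read-only cursor: immutable data, mutable position. */
typedef struct {
    const int *keys; /* snapshot data: never modified */
    size_t n;
    size_t pos;      /* cursor state: advanced by every read */
} toy_cursor;

/* Stores the next key and returns 1, or returns 0 at the end. */
static int toy_next(toy_cursor *c, int *key) {
    if (c->pos >= c->n)
        return 0;
    *key = c->keys[c->pos++];
    return 1;
}

/* "Thread" A and "thread" B take turns on ONE shared cursor.
   Returns how many keys A saw; B's turns consume the keys in between. */
static size_t keys_seen_by_a_when_sharing(const int *keys, size_t n) {
    toy_cursor shared = { keys, n, 0 };
    size_t a_count = 0;
    int k;
    for (;;) {
        if (!toy_next(&shared, &k)) /* A's turn */
            break;
        a_count++;
        if (!toy_next(&shared, &k)) /* B's turn */
            break;
    }
    return a_count;
}
```

With a private cursor each thread would see every key; with the shared one, A sees only every other key.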


james anderson

8:30 a.m.


On 2020-12-04, at 14:51:42, Howard Chu hyc@symas.com wrote:


> You assume that the read-only on-disk state is the only state maintained within a transaction structure, which is not necessarily true.

no, i did not. i asked.

> Anyway, it's not your place to make any assumptions about the internal state of the library beyond what the API docs guarantee.
>
> Relying on internal implementation details like that is how you write broken software.

all true, but, your question was

"How would you do this in any other database system?"

in any case, the central question in this thread was

“[how to] guarantee that [a number of autonomous threads] operate on the same database state?”

where “operate” was limited to read operations. your response suggests that one should not expect to be able to do this with the current lmdb api. have i understood you correctly?


openldap-technical@openldap.org
