Re: mdb_dbi_open and threads

Howard Chu

20 May 2017

2:51 p.m.

Muhammed Muneer wrote:

...

Thanks for clarifying. I did get the impression but there was some part of it where I thought the doc was not clear enough.

It is a fundamental principle of ACID transactions. The "I" in ACID stands for Isolation. That means nothing that changes elsewhere, while a transaction is underway, can affect that particular transaction. Since we already document that LMDB is a full ACID DB engine, none of this should require any further explanation.

...

Thanks anyway.

On Sat, May 20, 2017 at 10:46 PM, Howard Chu <hyc@symas.com> wrote:

Muhammed Muneer wrote:

    Is the following valid?

    1) Read transaction 1 is started
    2) Read transaction 2 is started and calls mdb_dbi_open and commits
    3) Read transaction 1 uses the handle from (2)


Please learn how to read. The doc says more than you quoted before, and it
already quite explicitly defines its properties:

http://www.lmdb.tech/doc/group__internal.html#gac08cad5b096925642ca359a6d6f0562a

        The database handle will be private to the current transaction
until the transaction is successfully committed. If the transaction is
aborted the handle will be closed automatically. After a successful commit
the handle will reside in the shared environment, and may be used by other
transactions.

In your question above, transaction 1 is not after transaction 2 therefore
it cannot use the handle that transaction 2 commits.

The handle does not exist in the shared environment until after its
opening transaction commits. If it doesn't exist in the shared environment
when a transaction begins, then it is not visible or valid in that
particular transaction.

Just follow the recommendation to open all handles at the beginning of the
program.



    On Sat, May 20, 2017 at 6:46 PM, Muhammed Muneer <elendilm@gmail.com> wrote:

        Thanks.

        On Sat, May 20, 2017 at 6:29 PM, Klaus Malorny <Klaus.Malorny@knipp.de> wrote:

            On 5/20/17 2:02 PM, Muhammed Muneer wrote:

                So when the doc says

                      * The database handle will be private to the current transaction until
                      * the transaction is successfully committed.

                "the handle being private" only refers to the first mdb_dbi_open.
                Once this transaction is committed, one doesn't have to call this
                again in subsequent concurrent transactions and can use this handle.
                And then this handle won't be private at all.

                Did I get it right?


            Yes. I do this exactly that way as part of my initial setup of my
            application. Once the transaction is committed, I use the returned
            handles setup from whatever thread that needs to access the respective
            database. I never call mdb_dbi_open again after the setup.

            Regards,

            Klaus

--
Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/

Muhammed Muneer

20 May

4 p.m.

New subject: mdb_dbi_open and threads

My bad. Didn't think it that way. Thanks for the info.

Muhammed Muneer

21 May

12:43 p.m.

New subject: mdb_dbi_open and threads

Howard Chu wrote:

"Just follow the recommendation to open all handles at the beginning of the program."

But what if I have lots of named databases, maybe 10000 or more? Wouldn't this be expensive?

I am developing a MongoDB-like database (similar in query and update syntax) around LMDB. I have some enhancements of my own, like the ability to generate update queries from within an ongoing update.

So in a multi-threaded environment, suppose a write transaction (thread 1) generates the name of a named dbi and goes on to mdb_dbi_open it, only to find that a read transaction (thread 2) opened the same named dbi after thread 1's write transaction started. Then thread 1's chance to mdb_dbi_open that named dbi is lost forever.

Hallvard Breien Furuseth

22 May

2:18 a.m.

New subject: mdb_dbi_open and threads

On 21 May 2017 21:43, Muhammed Muneer wrote:

...

Howard Chu wrote

"Just follow the recommendation to open all handles at the beginning of the program."

But what if I have lots of named databases, maybe 10000 or more? Wouldn't this be expensive?

Yes. From lmdb.h:

"Currently a moderate number of slots are cheap but a huge number gets expensive: 7-120 words per transaction, and every #mdb_dbi_open() does a linear search of the opened slots."

...

So in a multi-threaded environment, suppose a write transaction (thread 1) generates the name of a named dbi and goes on to mdb_dbi_open it, only to find that a read transaction (thread 2) opened the same named dbi after thread 1's write transaction started. Then thread 1's chance to mdb_dbi_open that named dbi is lost forever.

With threads 1 and 2 coexisting? When thread 2 called mdb_dbi_open(), thread 1's prospect of using mdb_dbi_open() at all was lost.

-- Hallvard

Klaus Malorny

3:01 a.m.

New subject: mdb_dbi_open and threads

On 5/21/17 9:43 PM, Muhammed Muneer wrote:

...

Howard Chu wrote

"Just follow the recommendation to open all handles at the beginning of the program."

But what if I have lots of named databases, maybe 10000 or more? Wouldn't this be expensive?

I am developing a MongoDB-like database (similar in query and update syntax) around LMDB. I have some enhancements of my own, like the ability to generate update queries from within an ongoing update.

So in a multi-threaded environment, suppose a write transaction (thread 1) generates the name of a named dbi and goes on to mdb_dbi_open it, only to find that a read transaction (thread 2) opened the same named dbi after thread 1's write transaction started. Then thread 1's chance to mdb_dbi_open that named dbi is lost forever.

Please remember that you can have only one writing transaction at once. And first looking in a read transaction whether a database exists and then creating it in a second write transaction is definitely a bad and risky programming style, as it carries an assumption from one transaction to the next, which is typically not valid.

I have no experience with a large number of databases, but if it is a performance problem as Hallvard and the docs describe, then you still have the option to combine all your logical databases into one big database. In this case you would maintain a database ID (e.g. a four-byte integer) that is prepended to the user-provided key for all get and put operations. Some care needs to be taken with range searches and cursor operations, as you might get a key/value pair that belongs to another logical database, but this is not a big deal. I use that approach for composite search keys quite a lot.

The association between database names and their IDs could be maintained in a separate database.

Regards,

Klaus
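
The key-prefix approach described above can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the names DB_ID_LEN, make_prefixed_key and key_belongs_to are hypothetical. It assumes the database uses LMDB's default byte-wise key ordering, so the ID is written in big-endian order to keep all keys of one logical database adjacent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DB_ID_LEN 4  /* four-byte logical database ID, as suggested above */

/* Store the ID big-endian so byte-wise comparison sorts by ID first. */
static void encode_db_id(uint32_t db_id, unsigned char *out)
{
    out[0] = (unsigned char)(db_id >> 24);
    out[1] = (unsigned char)(db_id >> 16);
    out[2] = (unsigned char)(db_id >> 8);
    out[3] = (unsigned char)db_id;
}

/* Build "ID + user key" in out. Returns the total key length,
 * or 0 if the buffer is too small. */
size_t make_prefixed_key(uint32_t db_id, const void *user_key,
                         size_t user_len, unsigned char *out, size_t out_cap)
{
    assert(out != NULL && (user_key != NULL || user_len == 0));
    if (out_cap < DB_ID_LEN || user_len > out_cap - DB_ID_LEN)
        return 0;
    encode_db_id(db_id, out);
    if (user_len > 0)
        memcpy(out + DB_ID_LEN, user_key, user_len);
    return DB_ID_LEN + user_len;
}

/* Range scans and cursor walks should stop as soon as this returns 0:
 * the cursor has moved into a different logical database. */
int key_belongs_to(uint32_t db_id, const unsigned char *key, size_t key_len)
{
    unsigned char prefix[DB_ID_LEN];
    assert(key != NULL || key_len == 0);
    if (key_len < DB_ID_LEN)
        return 0;
    encode_db_id(db_id, prefix);
    return memcmp(key, prefix, DB_ID_LEN) == 0;
}
```

In use, every get and put would pass the buffer filled by make_prefixed_key as the key, and a cursor loop would call key_belongs_to on each returned key before treating it as part of the current logical database.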

Muhammed Muneer

4:36 a.m.

New subject: mdb_dbi_open and threads

Hallvard wrote

"Currently a moderate number of slots are cheap but a huge number gets expensive: 7-120 words per transaction, and every #mdb_dbi_open() does a linear search of the opened slots."

I haven't seen a performance hit with around 10000 named databases. By the way, I was hoping to open those dbis only on demand rather than opening them all at initialization.

"With threads 1 and 2 coexisting? When thread 2 called mdb_dbi_open(), thread 1's prospect of using mdb_dbi_open() at all was lost."

Yeah, with both coexisting. That's what I thought.

@Klaus

Yeah, I know there can be only one write transaction. I was talking about one write and one or more read transactions. It is not that I first try to open the dbi in the read transaction. It is that I can't guarantee that another read transaction won't start and attempt to open the same named dbi while a write is in progress.

"And first looking in a read transaction whether a database exists and then creating it in a second write transaction is definitely a bad and risky programming style, as it carries an assumption from one transaction to the next, which is typically not valid."

That was not what I tried to do.

"you still have the option to combine all your logical databases into a big single database"

It's a workaround that I hadn't thought about before. I was hoping to avoid the extra complexity.

Is there any prospect of implementing mdb_dbi_open or mdb_db_open_immediate to put the dbi into the shared environment without waiting for the txn commit? I learned earlier from Howard Chu that this is not wanted behavior under ACID. But I ask just in case, because otherwise (without opening all the dbis at initialization) in a multi-threaded environment, the chance that opening a dbi on demand ends in failure goes up.

Klaus Malorny

4:44 a.m.

New subject: mdb_dbi_open and threads

On 5/22/17 1:36 PM, Muhammed Muneer wrote:

...

Is there any prospect of implementing mdb_dbi_open or mdb_db_open_immediate to put the dbi into the shared environment without waiting for the txn commit? I learned earlier from Howard Chu that this is not wanted behavior under ACID. But I ask just in case, because otherwise (without opening all the dbis at initialization) in a multi-threaded environment, the chance that opening a dbi on demand ends in failure goes up.

I am still unsure what you are trying to achieve. If you are in a read transaction and discover that your database does not exist, what can you do anyway? You cannot create the database at this point, since it is a write operation. The documentation of mdb_dbi_open states:

MDB_CREATE Create the named database if it doesn't exist. This option is not allowed in a read-only transaction or a read-only environment.

Regards,

Klaus

openldap-technical@openldap.org
