In LMDB, if you attempt to open more than 10 transactions (on different environments) in one process on MacOS, mdb_txn_begin will fail. This can be reproduced by opening 11 different database environments in one process and calling mdb_txn_begin (as a write transaction) on each one. The 11th call will return EINVAL. I believe this is because (on this OS, MacOS 10.13.6) System V semaphores have a limit of 10 SEM_UNDO locked semaphores per process. So when mdb_txn_begin attempts to open an 11th transaction, the mdb_sem_wait/semop call fails, returning EINVAL. I was testing with the latest LMDB from mdb.master.
After debugging this and identifying the cause, I was able to work around it by compiling with MDB_USE_POSIX_SEM, which seems to resolve the issue. But this still leaves a few questions:
Is there any issue with compiling with MDB_USE_POSIX_SEM on MacOS? Would it be better if LMDB defaulted to this shared lock implementation for this OS?
And/or would there be value in LMDB providing a more specific error message in this situation? The documentation doesn't list EINVAL as a possible return value for mdb_txn_begin, and it is a very generic error that gives little indication of the root problem, which I found rather confusing. Or is there an expectation that a process can only open a limited number of database environments (and have open transactions on them)? (Our server typically has about 30 environments open with about 2-16 dbs/env, with many concurrent transactions, without issue on other OSes.)
I would be happy to put together a patch for this, but I am not sure of the reasons for the selection of the different shared lock implementations on different OSes. Anyway, I'd be glad to help contribute a patch if there is a specific way this should work. Thank you!
Kris Zyp wrote:
Is there any issue with compiling with MDB_USE_POSIX_SEM on MacOS? Would it be better if LMDB defaulted to this shared lock implementation for this OS?
When we wrote the MacOS support, there was no confirmation that POSIX semaphores worked on that OS.
This limit on the number of semaphores should be a tunable kernel parameter or ulimit setting. I'd look there first.
Yes, I could set kern.sysv.semmnu to a higher limit on that Mac. This isn't an issue for me, though; I've already addressed it in our software. I was testing this for the sake of other users, and wondering whether it is reasonable to expect users to know to alter kernel settings or build settings when they get invalid-parameter errors. I thought something like an MDB_SEMAPHORES_FULL error might be useful for LMDB itself. But if not, that's fine.
On Sat, Sep 12, 2020 at 8:52 AM Howard Chu hyc@symas.com wrote:
When we wrote the MacOS support, there was no confirmation that POSIX semaphores worked on that OS.
This limit on the number of semaphores should be a tunable kernel parameter or ulimit setting. I'd look there first.
--
Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/
Kris Zyp wrote:
Yes, I could set kern.sysv.semmnu to a higher limit on that Mac. This isn't an issue for me, though; I've already addressed it in our software. I was testing this for the sake of other users, and wondering whether it is reasonable to expect users to know to alter kernel settings or build settings when they get invalid-parameter errors. I thought something like an MDB_SEMAPHORES_FULL error might be useful for LMDB itself. But if not, that's fine.
If you're confident that POSIX semaphores work well on MacOS, then go ahead and submit a patch to change the default selection for MacOS. Thanks.
On Sun, Sep 13, 2020 at 11:19 AM Howard Chu hyc@symas.com wrote:
If you're confident that POSIX semaphores work well on MacOS, then go ahead and submit a patch to change the default selection for MacOS. Thanks.
After reading up on this, I thought that the issue with POSIX semaphores as the default is that they aren't robust, so they might not necessarily be preferred over SysV semaphores (choosing between a higher default semaphore limit and process robustness/cleanup might be an application-specific decision). Or does LMDB have some type of cleanup mechanism for POSIX semaphores? Thanks, Kris
Here are a couple of small possible patches for consideration (based on the mdb.master3 branch). The first is a trivial fix for several compilation errors on Windows, caused by the unknown size of void, on the mdb.master3 branch: https://github.com/kriszyp/lmdb/commit/12f1bdf9be5052728694eb5fa222688f50f84...
And then this patch makes POSIX semaphores the default on MacOS unless robust semaphores are specified (in which case it will go back to SysV semaphores): https://github.com/kriszyp/lmdb/commit/0971512f6a4aa5b05f99e481a6687d4802f70...
Thanks, Kris
Please open an issue in the bug tracker and attach your patches there.