Kris Zyp wrote:
Yes, I could set kern.sysv.semmnu to a higher limit on that Mac. This isn't an issue for me, though; I've already addressed it in our software. I was testing for the sake of other users, and wondering whether it is reasonable to expect users to know to alter kernel settings or build settings when they get invalid-parameter errors. I thought something like an MDB_SEMAPHORES_FULL error might be useful in LMDB itself. But if not, that's fine.
If you're confident that POSIX semaphores work well on MacOS, then go ahead and submit a patch to change the default selection for MacOS. Thanks.
On Sat, Sep 12, 2020 at 8:52 AM Howard Chu <hyc@symas.com> wrote:
Kris Zyp wrote:
In LMDB, if you attempt to open more than 10 transactions (on different environments) in one process on MacOS, mdb_txn_begin will fail. This can be reproduced by opening 11 different database environments in one process and calling mdb_txn_begin (as a write transaction) on each one. The 11th call will return EINVAL. I believe this is because (on this OS, MacOS 10.13.6) System V semaphores have a limit of 10 SEM_UNDO locked semaphores per process. So when mdb_txn_begin attempts to open an 11th transaction, the mdb_sem_wait/semop call fails, returning EINVAL. I was testing with the latest LMDB from mdb.master.
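For readers who want to probe the suspected limit without LMDB, here is a minimal sketch in C. It does not call LMDB at all; it only imitates the pattern described above, creating several private System V semaphore sets and locking each one with SEM_UNDO from a single process. The function name `lock_many_with_undo`, the `sketch_semun` union, and the cap of 64 sets are illustrative assumptions, not anything taken from the LMDB source.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

#define SKETCH_MAX_SETS 64  /* arbitrary cap for this sketch */

/* semctl() expects a caller-supplied union on some platforms. A private
 * name avoids clashing with systems that already define union semun. */
union sketch_semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Hypothetical helper: create `count` single-semaphore sets and acquire
 * each with SEM_UNDO. Returns how many acquisitions succeeded; if one
 * fails, *err receives the errno of the failing call. */
int lock_many_with_undo(int count, int *err)
{
    int ids[SKETCH_MAX_SETS];
    int locked = 0;
    int i;

    assert(err != NULL);
    *err = 0;
    if (count < 0) count = 0;
    if (count > SKETCH_MAX_SETS) count = SKETCH_MAX_SETS;
    for (i = 0; i < count; i++) ids[i] = -1;

    for (i = 0; i < count; i++) {
        union sketch_semun arg;
        struct sembuf op;

        ids[i] = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
        if (ids[i] < 0) { *err = errno; break; }

        arg.val = 1;                 /* start in the unlocked state */
        if (semctl(ids[i], 0, SETVAL, arg) < 0) { *err = errno; break; }

        op.sem_num = 0;
        op.sem_op = -1;              /* acquire the semaphore */
        op.sem_flg = SEM_UNDO;       /* kernel records an undo entry */
        if (semop(ids[i], &op, 1) < 0) { *err = errno; break; }
        locked++;
    }

    /* Remove every set that was created, whether or not it was locked. */
    for (i = 0; i < count; i++)
        if (ids[i] >= 0) semctl(ids[i], 0, IPC_RMID);
    return locked;
}
```

Calling `lock_many_with_undo(11, &err)` on the affected machine and checking whether it stops at 10 with `err == EINVAL` would be one way to test the hypothesis; on a system without such a per-process limit it should reach 11.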
Once I finally debugged this and figured out the issue, I was able to work around it by compiling with MDB_USE_POSIX_SEM, which seems to resolve it. But this still leaves a few questions:
Is there any issue with compiling with MDB_USE_POSIX_SEM on MacOS? Would it be better if LMDB defaulted to this shared lock implementation for this OS?
When we wrote the MacOS support, there was no confirmation that POSIX semaphores worked on that OS.
The limit on the number of semaphores should be a tunable kernel parameter or ulimit setting; I'd look there first.
And/or would there be value in LMDB providing a more specific error message in this situation? The documentation doesn't list EINVAL as a possible return value for mdb_txn_begin, and it is a very generic error that gives little indication of the root problem, which I, at least, found rather confusing. Or is there an expectation that processes can only open a limited number of database environments (and have open transactions on them)? (Our server typically has about 30 environments open with about 2-16 dbs/env, with many concurrent transactions, without issue on other OSes.)
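As a rough illustration of what a more specific message could look like on the application side, here is a small sketch in C. It is not part of LMDB and does not use the LMDB API; `describe_txn_begin_error` is a hypothetical helper that takes the integer return code an application got back from its transaction-begin call and attaches a hint when that code is EINVAL. Treating every EINVAL as a possible semaphore limit is an assumption made for the example, since EINVAL can have other causes.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not an LMDB function): turn a return code from a
 * transaction-begin call into a message written to buf, and return buf. */
const char *describe_txn_begin_error(int rc, char *buf, size_t len)
{
    if (rc == 0) {
        snprintf(buf, len, "ok");
    } else if (rc == EINVAL) {
        /* Assumption: on MacOS builds using System V semaphores, EINVAL
         * may mean the per-process SEM_UNDO limit was reached. */
        snprintf(buf, len,
                 "EINVAL: invalid parameter, or possibly the SEM_UNDO "
                 "semaphore limit was reached (check kern.sysv.semmnu "
                 "or build with MDB_USE_POSIX_SEM)");
    } else if (rc > 0) {
        /* Positive codes are treated as ordinary errno values. */
        snprintf(buf, len, "%s", strerror(rc));
    } else {
        /* Negative codes are library-specific; just report the number. */
        snprintf(buf, len, "library-specific error code %d", rc);
    }
    return buf;
}
```

An application could call this with a caller-owned buffer, for example `char msg[256]; describe_txn_begin_error(rc, msg, sizeof msg);`, and log the result next to the environment path that failed.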
I would be happy to put together a patch for this, but I am not sure of the reasons for the selection of the different shared lock implementations on different OSes. Anyway, I'd be glad to help contribute a patch if there is a specific way this should work. Thank you!
--
Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/