Hello List,
The LMDB documentation says the following in its section on caveats:
- Use an MDB_env* in the process which opened it, without fork()ing.
- Do not have open an LMDB database twice in the same process at the same time. Not even from a plain open() call - close()ing it breaks flock() advisory locking.
This seems contrary to an earlier thread on this list (1), which suggests that fork/execing a process using LMDB is OK so long as the MDB_env is not used in the forked process. Looking at the flock man pages on FreeBSD and Linux tells me that this indeed should be ok: an flock is released only when all fds pointing to the open file table entry are closed (ignoring explicit unlock). Exec with FD_CLOEXEC set should therefore be OK.
Is my interpretation correct? I want to use this to implement graceful restarts in a daemon which uses LMDB:
* mdb_env_open() in old process * fork() -> exec() the daemon itself * mdb_env_open() in new process * mdb_env_close() in old process
If this works, I'd like to contribute the changes necessary to not leak fds on exec, which are mentioned in the other thread.
* Are there contribution guidelines somewhere? How do I submit a patch? * Seems like there is currently no call to SetHandleInformation with HANDLE_FLAG_INHERIT=0 for Winows, should that be added?
Best, Lorenz
1: http://www.openldap.org/lists/openldap-technical/201403/msg00149.html
On 22/08/16 12:47, Lorenz Bauer wrote:
The LMDB documentation says the following in its section on caveats:
- Use an MDB_env* in the process which opened it, without fork()ing.
- Do not have open an LMDB database twice in the same process at the same time. Not even from a plain open() call - close()ing it breaks flock() advisory locking.
This seems contrary to an earlier thread on this list (1), which suggests that fork/execing a process using LMDB is OK so long as the MDB_env is not used in the forked process.
It's just poorly worded. The point of the first sentence is, "do not use the MDB_env after forking". It's OK to fork if you do not use it afterwards. A patch with a better wording might be in order:-)
Also I'm guessing that restriction is only relevant on Unix: Hopefully Windows locking doesn't have flock's insane semantics. I don't know Windows though.
(...)
- Seems like there is currently no call to SetHandleInformation with
HANDLE_FLAG_INHERIT=0 for Winows, should that be added?
MDB was originally written with Unix in mind. I'm guessing the "no multiple handles" restriction is only relevant on Unix: Hopefully Windows locking doesn't have flock's insane semantics. But I don't know Windows.
OTOH it won't hurt to add close-on-fork/exec flags for everything, not just the Unix lockfile descriptor. Would need to factor the opens out to a separate function to avoid excessive code and #ifdef duplication, I never got around to it. (Or rather, I drafted something but it got complicated and I didn't want to spend time testing it on Windows.)
Regarding your next message:
The unix version only uses O_DIRECT if psize >= OS psize because O_DIRECT typically requires alignment on OS page boundaries, or something like that. Should be commented. Didn't find anything similar in the Windows doc, but again, I don't know Windows. Maybe Howard knows more.
On 24 August 2016 at 06:27, Hallvard Breien Furuseth < h.b.furuseth@usit.uio.no> wrote:
On 22/08/16 12:47, Lorenz Bauer wrote:
The LMDB documentation says the following in its section on caveats:
- Use an MDB_env* in the process which opened it, without fork()ing.
- Do not have open an LMDB database twice in the same process at the
same time. Not even from a plain open() call - close()ing it breaks flock() advisory locking.
This seems contrary to an earlier thread on this list (1), which suggests that fork/execing a process using LMDB is OK so long as the MDB_env is not used in the forked process.
It's just poorly worded. The point of the first sentence is, "do not use the MDB_env after forking". It's OK to fork if you do not use it afterwards. A patch with a better wording might be in order:-)
Happy to do that.
MDB was originally written with Unix in mind. I'm guessing the "no multiple handles" restriction is only relevant on Unix: Hopefully Windows locking doesn't have flock's insane semantics. But I don't know Windows.
According to the CreateFileW docs, calling the it without a security descriptor (which is what LMDB does) makes it behave like O_CLOEXEC. So it's only the POSIX code that is affected.
OTOH it won't hurt to add close-on-fork/exec flags for everything, not
just the Unix lockfile descriptor. Would need to factor the opens out to a separate function to avoid excessive code and #ifdef duplication,
I've got a patch for this which I'll submit once I get approval.
Regarding your next message:
The unix version only uses O_DIRECT if psize >= OS psize because O_DIRECT typically requires alignment on OS page boundaries, or something like that. Should be commented. Didn't find anything similar in the Windows doc, but again, I don't know Windows. Maybe Howard knows more.
According to man 2 open O_DIRECT alignment is file system specific from 2.6 onwards, but "usually" 512bytes FWIW. Not sure how that would affect this code.
On 24. aug. 2016 11:02, Lorenz Bauer wrote:
On 24 August 2016 at 06:27, Hallvard Breien Furuseth <h.b.furuseth@usit.uio.no mailto:h.b.furuseth@usit.uio.no> wrote: The unix version only uses O_DIRECT if psize >= OS psize because O_DIRECT typically requires alignment on OS page boundaries, or something like that. Should be commented. Didn't find anything similar in the Windows doc, but again, I don't know Windows. Maybe Howard knows more.
According to man 2 open O_DIRECT alignment is file system specific from 2.6 onwards, but "usually" 512bytes FWIW. Not sure how that would affect this code.
I gather you mean Linux 2.6. But O_DIRECT is not Linux-specific, I don't know of a portable way to ask "is this alignment big enough?", and the code breaks if we use O_DIRECT when the OS doesn't like it. So we simply assume the OS page size is OK.
On 24 August 2016 at 10:30, Hallvard Breien Furuseth < h.b.furuseth@usit.uio.no> wrote:
On 24. aug. 2016 11:02, Lorenz Bauer wrote:
According to man 2 open O_DIRECT alignment is file system specific from 2.6 onwards, but "usually" 512bytes FWIW. Not sure how that would affect this code.
I gather you mean Linux 2.6. But O_DIRECT is not Linux-specific, I don't know of a portable way to ask "is this alignment big enough?", and the code breaks if we use O_DIRECT when the OS doesn't like it. So we simply assume the OS page size is OK.
Yes, that makes sense of course. I googled around for documentation on some of the more obscure Unixes, but it is hard to come by.
I don't have legal sign-off yet and am abroad until the beginning of September. I'll pick up submitting the patch then.
I've cleaned up some old tweaks of mine which, er, I still haven't checked properly on non-Linux. Might be easier to start there: Branch "mdb/fopen" in repo git://git.uio.no/u/hbf/openldap.git.
The main point is "[DRAFT] Move opening files out to mdb_fopen()". "Factoring out" the file handling doesn't mean less code, but it does collect all the #ifdef ugliness in one place. Using mdb_fname_init() saves a few mallocs on Windows, and some malloced bytes on Unix.
Howard, do we want that, or did I get too fancy? :-)
On 31. aug. 2016 10:06, Hallvard Breien Furuseth wrote:
I've cleaned up some old tweaks of mine which, er, I still haven't checked properly on non-Linux. Might be easier to start there: Branch "mdb/fopen" in repo git://git.uio.no/u/hbf/openldap.git.
Updated the branch.
On 7 September 2016 at 10:49, Hallvard Breien Furuseth < h.b.furuseth@usit.uio.no> wrote:
On 31. aug. 2016 10:06, Hallvard Breien Furuseth wrote:
I've cleaned up some old tweaks of mine which, er, I still haven't checked properly on non-Linux. Might be easier to start there: Branch "mdb/fopen" in repo git://git.uio.no/u/hbf/openldap.git.
Updated the branch.
Sorry for the long silence, I have the sign-off that I can contribute the patches, but now I'm waiting on legal to approve the legalese.
That said, I had a look at your branch yestderday, it's pretty much what I have as well. Main difference is that I do not touch the utf8 conversion stuff, and the enum I added describes the desired properties of the file (R / W / DIRECT) rather than the use case. I think I like your approach with MDB_O_COPY, etc. better.
Would it make sense to split the main commit into two for fname and fopen changes? Also, I find the flags = which == MDB_O_* rather hard to follow. Maybe that is better as a switch?
Lorenz
On 07. sep. 2016 12:22, Lorenz Bauer wrote:
Would it make sense to split the main commit into two for fname and fopen changes?
Go ahead, or I'll do it later. Will need to add a function mdb_fname_suffix_set() which disappears again in the next commit (merged into mdb_fopen).
Also, I find the flags = which == MDB_O_* rather hard to follow. Maybe that is better as a switch?
Well. It was supposed to be formatted to be easy to read. But note that lmdb (and openldap) code uses tab-width = 4, not 8. Same as indentation.
Anyway, I've repaced one tab with 4 spaces so it lines up either way, and replaced the Windows if-else with switch while I was at it. git filter-branch is nice:-)
Duh - the mdb_env_copy() FD should have FD_CLOEXEC, since the point was the user can't get at it. Fixed after an off-list discussion. OTOH MDB_env.me_fd should not: mdb_env_get_fd() exposes it, so CLOEXEC would break existing programs.
Another issue is with fork() without exec(). This breaks LMDB: if (fork() == 0) pthread_exit(NULL); because the child's me_txkey destructor releases the reader slot.
This can happen if a single-threaded LMDB process forks a multi- threaded child, which is one way to deal with threads vs. fork().
pthread_atfork() is the official way to deal with that. I'm a bit wary of it since there is no way to unregister an atfork handler. I wonder what happens if the LMDB module was dynamically loaded and then gets unloaded. A simpler way is in any case to check if (reader->mr_pid == getpid()) in mdb_env_reader_dest().
I suppose there could be an mdb_env_fork_cleanup() function which at least closes the FDs, and takes an arg to say whether to free memory too (safe if parent was single-threaded).
openldap-technical@openldap.org