https://bugs.openldap.org/show_bug.cgi?id=9397
--- Comment #10 from github@nicwatson.org ---
(In reply to Howard Chu from comment #8)
> (In reply to tina from comment #7)
> > Hi,
> > I am the person who reported the original problem to Nic, after getting my DB corrupted the minute I started doing multiprocessing.
> > I wanted to comment on this:
> > (In reply to Howard Chu from comment #3)
> > > 1st: "Don't do that." The docs for mdb_env_mapsize() make it clear that it's up to the caller to ensure that no other txns are active at the time it is called. We can first expand this statement in the docs and say that the caller is responsible to ensure that no other *users* are active - processes, threads, txns, whatever.
> > I think that is far from clear in the documentation, and there seem to be many projects out there (including mine) doing dynamic resizing assuming that no other txns are active *in the current process*! I would urge you to update the documentation then, as this might be a DB corruption waiting to happen in many places.
> There's no danger of corruption if you only resize to grow the DB, which is the only functionality you should need in an active application. Shrinking the DB should only be an administrative action, not a live runtime operation.
"only resize to grow the DB" is impossible without external synchronization. Without an external lock, there's always a window between checking the file size and truncating the file when another process might have done the same.
Imagine two processes A and B that both call mdb_env_set_mapsize at the same time on an environment pointing at the same file. Process A calls the function with a size of 20 MiB, process B calls it with a size of 10 MiB. Before the call, the file size was 5 MiB.
I believe the following sequence is possible.
Process A:
  0. mdb_env_set_mapsize begins
Process B:
  0. mdb_env_set_mapsize begins
Process A:
  1. ftruncate(20MiB) (inside mdb_env_set_mapsize)
  2. mmap(20MiB) (inside mdb_env_set_mapsize)
  3. mdb_env_set_mapsize completes
  4. mdb_txn_begin
  5. mdb_cursor_put
Process B:
  6. ftruncate(10MiB)
Process A:
  7. mdb_txn_commit (bus error due to access past end of file)
If step 7 instead happens before step 6, the commit itself succeeds, but process B's ftruncate(10MiB) then cuts the file back to 10 MiB. If process A's transaction wrote data between the 10 MiB and 20 MiB offsets in the file, that is potential data loss and a corrupted DB.
In other words, you can have two separate processes increasing the map size and still truncate real data from the file and/or bus error.
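
As a rough sketch of the kind of external synchronization meant above, the size check and the ftruncate can be done under an exclusive advisory lock so that a smaller request never cuts back a larger one. The helper name grow_file_locked is hypothetical and not part of LMDB; it works on a plain file descriptor, assumes every process touching the file goes through the same helper, and does not address the ftruncate that happens inside mdb_env_set_mapsize itself.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not an LMDB function): make the file behind `fd`
 * at least `wanted` bytes long, and never shrink it.
 *
 * The fstat() size check and the ftruncate() both run while holding an
 * exclusive flock(), so two cooperating processes cannot interleave a
 * "check size" in one with a "truncate" in the other.
 *
 * Returns 0 on success, -1 on failure.
 */
static int grow_file_locked(int fd, off_t wanted)
{
    struct stat st;
    int rc = -1;

    assert(wanted >= 0);

    /* Blocks until no other process holds the lock on this file. */
    if (flock(fd, LOCK_EX) != 0)
        return -1;

    if (fstat(fd, &st) == 0) {
        if (st.st_size >= wanted)
            rc = 0;                      /* already large enough: do nothing */
        else
            rc = ftruncate(fd, wanted);  /* grow only, never shrink */
    }

    flock(fd, LOCK_UN);
    return rc;
}
```

With the numbers from the example above, process A's request for 20 MiB grows the file, and process B's later request for 10 MiB becomes a no-op instead of truncating A's data.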