https://bugs.openldap.org/show_bug.cgi?id=9619
Issue ID: 9619
Summary: mdb_env_copy2 with MDB_CP_COMPACT in mdb.master3 produces corrupt mdb file
Product: LMDB
Version: 0.9.29
Hardware: All
OS: Windows
Status: UNCONFIRMED
Severity: normal
Priority: ---
Component: liblmdb
Assignee: bugs(a)openldap.org
Reporter: kriszyp(a)gmail.com
Target Milestone: ---
When I copy an LMDB database using mdb_env_copy2 with the MDB_CP_COMPACT flag on
mdb.master3, the resulting mdb file seems to be corrupt, and using it in LMDB
produces segmentation faults. Copying without the compacting flag seems to
work fine. I apologize, I know this is not a very good issue report, as I
haven't had a chance to narrow this down to a more reproducible/isolated case
or to look into how to patch it. I thought I would report it in case there are
any ideas on what could cause this. The segmentation faults always seem to be
memory write faults (as opposed to faults on trying to read). Or perhaps the
current backup/copying functionality is eventually going to be replaced by
incremental backup/copying anyway
(https://twitter.com/hyc_symas/status/1315651814096875520). I'll try to update
this if I get a chance to investigate more, but otherwise feel free to
ignore it or consider it low-priority, since the workaround is easy.
--
You are receiving this mail because:
You are on the CC list for the issue.
https://bugs.openldap.org/show_bug.cgi?id=10346
Issue ID: 10346
Summary: mdb_env_copy2 on a database with a value larger than (2GB-16) results in a corrupt copy
Product: LMDB
Version: 0.9.31
Hardware: x86_64
OS: Linux
Status: UNCONFIRMED
Keywords: needs_review
Severity: normal
Priority: ---
Component: liblmdb
Assignee: bugs(a)openldap.org
Reporter: mike.moritz(a)vertex.link
Target Milestone: ---
Created attachment 1072
--> https://bugs.openldap.org/attachment.cgi?id=1072&action=edit
reproduction source code
Running mdb_env_copy2 with compaction on a database with a value larger than
(2GB-16) bytes appears to complete successfully in that there are no errors, but
the copied database cannot be opened and throws an MDB_CORRUPTED error. Looking
at the copied database size, it appears that the value is either being skipped
or significantly truncated. Running mdb_env_copy2 without compaction also
completes successfully, and the copied database can be opened.
I initially encountered this while using py-lmdb with v0.9.31 of LMDB, but was
able to write up a simple script that uses the library directly. The source for
the script is attached, and the results below are from running it with the
latest from master.
Without compaction:
$ ./lmdb_repro test.lmdb $((2 * 1024 * 1024 * 1024 - 16 + 1)) testbak.lmdb
LMDB Version: LMDB 0.9.70: (December 19, 2015)
Set LMDB map size to 21474836330 bytes
Successfully inserted key with 2147483633 bytes of zero-filled data
Retrieved 2147483633 bytes of data
First 16 bytes (hex): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...
Copying database to testbak.lmdb...
Database copy completed successfully.
Opening copied database and reading value...
Retrieved 2147483633 bytes of data from copied database
First 16 bytes from copy (hex): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...
Data size matches between original and copy
With compaction:
$ ./lmdb_repro -c test.lmdb $((2 * 1024 * 1024 * 1024 - 16 + 1)) testbak.lmdb
LMDB Version: LMDB 0.9.70: (December 19, 2015)
Set LMDB map size to 21474836330 bytes
Successfully inserted key with 2147483633 bytes of zero-filled data
Retrieved 2147483633 bytes of data
First 16 bytes (hex): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...
Copying database to testbak.lmdb (with compaction)...
Database copy completed successfully.
Opening copied database and reading value...
mdb_get (copy) failed: MDB_CORRUPTED: Located page was wrong type
Size difference on corrupt DB:
$ du -sh ./*
312K ./lmdb_repro
24K ./testbak.lmdb
2.1G ./test.lmdb
With compaction at the perceived max size:
$ ./lmdb_repro -c test.lmdb $((2 * 1024 * 1024 * 1024 - 16)) testbak.lmdb
LMDB Version: LMDB 0.9.70: (December 19, 2015)
Set LMDB map size to 21474836320 bytes
Successfully inserted key with 2147483632 bytes of zero-filled data
Retrieved 2147483632 bytes of data
First 16 bytes (hex): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...
Copying database to testbak.lmdb (with compaction)...
Database copy completed successfully.
Opening copied database and reading value...
Retrieved 2147483632 bytes of data from copied database
First 16 bytes from copy (hex): 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ...
Data size matches between original and copy
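The boundary seen in these runs sits exactly where a page-count calculation crosses 2^31. The sketch below only illustrates that arithmetic. It rests on assumptions that are not stated in the report: a 4096-byte page, a 16-byte page header, an overflow-page formula recalled from LMDB's source, and a byte count held in a signed 32-bit integer somewhere in the compacting copy path. The helper names are hypothetical, and none of this has been checked against the attached reproduction.

```c
#include <stdint.h>

/* Assumed values, not taken from the report. */
#define ASSUMED_PAGE_SIZE   4096u
#define ASSUMED_PAGE_HEADER 16u

/* Overflow pages needed for a value of value_len bytes plus one page header. */
uint64_t overflow_pages(uint64_t value_len)
{
    return (ASSUMED_PAGE_HEADER - 1 + value_len) / ASSUMED_PAGE_SIZE + 1;
}

/* Bytes occupied by every overflow page after the first one. */
uint64_t trailing_bytes(uint64_t value_len)
{
    return (uint64_t)ASSUMED_PAGE_SIZE * (overflow_pages(value_len) - 1);
}

/* Nonzero if n can be stored in a signed 32-bit integer. */
int fits_in_int32(uint64_t n)
{
    return n <= (uint64_t)INT32_MAX;
}
```

Under these assumptions a value of 2147483632 bytes needs 524288 overflow pages, and trailing_bytes() gives 2147479552, which still fits. One more byte needs 524289 pages and gives 2147483648, one past INT32_MAX, which would be consistent with the copy succeeding at (2GB-16) and failing at (2GB-16)+1.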
https://bugs.openldap.org/show_bug.cgi?id=10296
Issue ID: 10296
Summary: Force a Mac OS full flush
Product: LMDB
Version: unspecified
Hardware: All
OS: Mac OS
Status: UNCONFIRMED
Keywords: needs_review
Severity: normal
Priority: ---
Component: liblmdb
Assignee: bugs(a)openldap.org
Reporter: renault.cle(a)gmail.com
Target Milestone: ---
Created attachment 1045
--> https://bugs.openldap.org/attachment.cgi?id=1045&action=edit
Use a F_FULLFSYNC fcntl when committing on Mac OS
Hello Howard and Happy New Year,
As discussed in this issue [1], the LMDB durability is incorrect when
committing. I propose the following patch that uses fcntl with the F_FULLFSYNC
flag. The fcntl documentation is on this page [2].
Note that I kept the calls to msync/fsync for simplicity and because they don't
cost much but feel free to skip them on Mac OS.
Have a nice day,
kero
[1]: https://github.com/cberner/redb/pull/928#issuecomment-2567032808
[2]: https://developer.apple.com/library/archive/documentation/System/Conceptual…
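For readers unfamiliar with the flag, here is a minimal sketch of a full flush. It is not the attached patch: durable_sync is a hypothetical helper name, and falling back to fsync() when the F_FULLFSYNC request is rejected is an assumption of this sketch, whereas the patch described above keeps the existing msync/fsync calls and adds the fcntl.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Flush fd to stable storage. Returns 0 on success, -1 on error.
 * Where F_FULLFSYNC is defined (Mac OS), try it first; if the request
 * is rejected, or the flag does not exist on this platform, use fsync().
 */
int durable_sync(int fd)
{
#ifdef F_FULLFSYNC
    if (fcntl(fd, F_FULLFSYNC) != -1)
        return 0;
#endif
    return fsync(fd);
}
```

On platforms without F_FULLFSYNC the preprocessor removes the fcntl call, so the helper behaves exactly like fsync() there.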
https://bugs.openldap.org/show_bug.cgi?id=10454
Issue ID: 10454
Summary: O_DSYNC is busted on macos
Product: LMDB
Version: unspecified
Hardware: All
OS: Mac OS
Status: UNCONFIRMED
Keywords: needs_review
Severity: normal
Priority: ---
Component: liblmdb
Assignee: bugs(a)openldap.org
Reporter: pyry.kovanen(a)gmail.com
Target Milestone: ---
LMDB relies on O_DSYNC for writing the meta page; unfortunately, it doesn't work
on macOS. Previous discovery by the TigerBeetle guys:
https://github.com/tigerbeetle/viewstamped-replication-made-famous#leaderbo…,
some more context at https://x.com/jorandirkgreef/status/1532314169604726784.
I discovered this during benchmarking, when I was wondering why LMDB writes were
twice as fast on macOS as on Linux.
https://bugs.openldap.org/show_bug.cgi?id=10402
Issue ID: 10402
Summary: Feature request: parameter for mdb_env_copy to exclude databases
Product: LMDB
Version: unspecified
Hardware: All
OS: All
Status: UNCONFIRMED
Keywords: needs_review
Severity: normal
Priority: ---
Component: liblmdb
Assignee: bugs(a)openldap.org
Reporter: jeffro256(a)tutanota.com
Target Milestone: ---
## Desired behavior
New overload or API modification to `mdb_env_copy{...}()` which allows passing
a list of names of databases to exclude from the environment copy.
## Example use-case
The Monero blockchain database [1] has the option to derive a "pruned"
database for space-saving purposes. This removes ~60% of the data, which some
users may find non-essential. Currently, the pruning code [2] copies each
non-pruned table manually, specifying key ordering functions, DB flags, etc. It
then drops some entries from the relevant "prunable tables". This, however,
adds technical maintenance debt when databases are added/updated. A preferable
alternative would be to write high-level modification code once and use an
overload of `mdb_env_copy` which excludes databases that we know in
advance we don't want to copy. Then our pruning utility would work agnostic to
database changes.
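As a rough illustration of the check such an overload would need for each named database, here is a hypothetical helper. `db_is_excluded` is not part of LMDB, and the sketch assumes names are NUL-terminated strings, as they are when passed to `mdb_dbi_open`.

```c
#include <stddef.h>
#include <string.h>

/* Return 1 if name appears in the excluded list, 0 otherwise. */
int db_is_excluded(const char *name, const char *const *excluded, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(name, excluded[i]) == 0)
            return 1;
    }
    return 0;
}
```

A linear scan is chosen because the exclusion list is expected to be short; a copy routine would call it once per named database and skip the matching subtrees.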
## Why
Adding this functionality ourselves would involve re-writing large portions of
`mdb_env_copyfd{0,1}` and `mdb_env_cwalk`, which requires either A) vendoring
LMDB, or B) possibly breaking in future updates.
You may understandably be of the opinion that the maintenance burden is an "us
problem", and not deem pursuing this feature request worth it, but hopefully
you see the value in this utility. Thanks for y'all's hard work on LMDB.
## Links
[1] https://github.com/monero-project/monero/blob/d32b5bfe18e2f5b979fa8dc3a8966…
[2] https://github.com/monero-project/monero/blob/master/src/blockchain_utiliti…
https://bugs.openldap.org/show_bug.cgi?id=10108
Issue ID: 10108
Summary: "mdb_dump -a" does not dump the main database
Product: LMDB
Version: 0.9.29
Hardware: All
OS: All
Status: UNCONFIRMED
Keywords: needs_review
Severity: normal
Priority: ---
Component: liblmdb
Assignee: bugs(a)openldap.org
Reporter: tuukka.pensala(a)gmail.com
Target Milestone: ---
In mdb_dump.c we have these instructions:
/* -a: dump main DB and all subDBs
 * -s: dump only the named subDB
 * -n: use NOSUBDIR flag on env_open
 * -p: use printable characters
 * -f: write to file instead of stdout
 * -V: print version and exit
 * (default) dump only the main DB
 */
However, contrary to the description, the option -a does not dump the main DB.
With argument -a "dumpit(..)" is called for the named databases, but not for
the unnamed one.
With the current behavior, if the data store contains subDBs and has user-added
data in the main DB, there seems to be no way to dump all of it at once using
mdb_dump.
https://bugs.openldap.org/show_bug.cgi?id=10236
Issue ID: 10236
Summary: fragmentation makes mdb_page_alloc slow
Product: LMDB
Version: 0.9.31
Hardware: All
OS: All
Status: UNCONFIRMED
Keywords: needs_review
Severity: normal
Priority: ---
Component: liblmdb
Assignee: bugs(a)openldap.org
Reporter: aalekseyev(a)janestreet.com
Target Milestone: ---
Created attachment 1022
--> https://bugs.openldap.org/attachment.cgi?id=1022&action=edit
patch is relative to LMDB_0.9.31
It's a known problem that mdb_page_alloc can be slow
when the free list is large and fragmented. [1] [2] [3]
I'm not sure it's known *how* slow it can be.
In our workload we saw a fragmented freelist leading
to a pathological O(n^2) behavior.
To handle a multi-page allocation we iterate loading chunks of the
free list one by one, and at every iteration we do O(n) work to check
if the allocation can succeed.
Even small-ish allocations (tens of pages) are repeatedly hitting
this edge case, with free list growing to ~1000000, and the outer loop
taking ~2000 iterations (10^9 worth of work in total, just to allocate a
few pages).
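The cost described above can be sketched as follows. This is a simplified model, not LMDB's code: the array is ascending for readability (LMDB's own ID-list layout differs), the function names are hypothetical, and splitting ~1000000 entries evenly into ~2000 chunks of 500 is an assumption made only to reproduce the order of magnitude.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Scan an ascending array of free page numbers for `want` consecutive
 * pages. Returns the index where the run starts, or -1 if there is none.
 * One call is O(len).
 */
long find_run(const uint64_t *pages, size_t len, size_t want)
{
    size_t run = 1;

    if (want == 0 || len == 0)
        return -1;
    if (want == 1)
        return 0;
    for (size_t i = 1; i < len; i++) {
        run = (pages[i] == pages[i - 1] + 1) ? run + 1 : 1;
        if (run == want)
            return (long)(i + 1 - want);
    }
    return -1;
}

/*
 * Entries examined when the whole list is rescanned after each of
 * `chunks` loads of `chunk_len` entries:
 * chunk_len * (1 + 2 + ... + chunks).
 */
uint64_t rescan_cost(uint64_t chunks, uint64_t chunk_len)
{
    return chunk_len * chunks * (chunks + 1) / 2;
}
```

With 2000 chunks of 500 entries, rescan_cost() returns 1000500000, i.e. about 10^9 entries examined, which matches the estimate in the paragraph above.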
Even though I'm sure there are ways to avoid hitting this pathological
scenario so much (avoid values larger than 4k, or fix whatever causes
fragmentation), it seems unacceptable to have a performance cliff this bad.
I made a patch to make the allocation take ~O(n*log(n)), by loading
and merging multiple chunks at once instead of doing it one-by-one.
I'd appreciate it if someone could review the patch (attached), improve it,
and/or come up with an alternative fix.
The code in `midl.c` is kinda meme-y, including a contribution from GPT-4o, but
it performs well enough to speed up our pathological workload by ~20x (which is
still ~3x away from the non-fragmented case).
Anyway, the main thing that warrants scrutiny is the change in `mdb.c`:
I understand very little about lmdb internals and I worry that loading
multiple pages at once instead of one-by-one might break something.
[1] issue #8664
[2] https://lists.openldap.org/hyperkitty/list/openldap-bugs@openldap.org/threa…
[3] https://lists.openldap.org/hyperkitty/list/openldap-technical@openldap.org/…
https://bugs.openldap.org/show_bug.cgi?id=9434
Issue ID: 9434
Summary: Abysmal write performance with certain data patterns
Product: LMDB
Version: 0.9.24
Hardware: x86_64
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: ---
Component: liblmdb
Assignee: bugs(a)openldap.org
Reporter: tina(a)tina.pm
Target Milestone: ---
Created attachment 784
--> https://bugs.openldap.org/attachment.cgi?id=784&action=edit
Monitoring graph of disk usage
Hi,
I have recently written a project for a customer which relies heavily on LMDB,
in which performance is critical. Sadly, after completing the project I started
having all kinds of problems when the DB started to grow. This has gotten so
bad the project release had to be postponed, and I have been asked to rewrite
the DB layer using a different engine, unless I can find some solution quickly.
I have so far found 4 serious issues, which I suspect are related either to the
size of the database or to the patterns of the data:
* Writing a value in some of the subdatabases has become increasingly slow,
and commits are taking way too long to complete. This is running on a powerful
computer with SSDs, and the 95th percentile of commit times is around 400ms. The
single-writer limitation means that I have run out of optimisations to try.
* For some reason I cannot understand, the disk usage has grown to over 2x the
size of the actual data stored, and the free space does not seem to be
reclaimed. The file takes up 348 GB, while the used pages amount to only 162
GB.
* A couple of days ago it had a sudden spike in disk usage (not correlated to
increases in actual data stored, or even to the last pageno used) that filled
the disk in a couple of hours. You can see this in the attached captures of the
monitoring graphs, which show actual disk usage (bottom) and counts of pages as
reported by LMDB (top). The bottom graph is total disk usage; the partition
holds almost exclusively the database, so ignore the few dips in size,
which are from removing other stuff.
* Running `mdb_dump` for backups takes up to 7 hours for the database; restores
are totally useless: I tried to re-create the database after the weird space
spike and had to stop after 24h when not even 30% of the data had been
restored! This alone is a deal-breaker, as we have no usable way to back up and
restore the database.
For context, this is the mdb_stat output with descriptions of each subdatabase.
I have no explanation for the ridiculous amount of free pages, and even running
mdb_stat takes a few seconds:
Environment Info
  Map address: (nil)
  Map size: 397166026752
  Page size: 4096
  Max pages: 96964362
  Number of pages used: 90991042
  Last transaction ID: 14647267
  Max readers: 126
  Number of readers used: 4
Freelist Status
  Tree depth: 3
  Branch pages: 26
  Leaf pages: 5168
  Overflow pages: 74319
  Entries: 111981
  Free pages: 36352392
Status of Main DB
  Tree depth: 1
  Branch pages: 0
  Leaf pages: 1
  Overflow pages: 0
  Entries: 8
Status of audit_log
  Tree depth: 4
  Branch pages: 309
  Leaf pages: 69154
  Overflow pages: 6082343
  Entries: 2061655
* Audit log: MDB_INTEGERKEY, big values (12kb avg). Append only, few reads.
Status of audit_idx
  Tree depth: 4
  Branch pages: 261
  Leaf pages: 27310
  Overflow pages: 0
  Entries: 2006963
* Audit index 1: 40 byte keys, 8 byte values. Append only; it has fewer records
because I disabled it yesterday due to its impact on performance.
Status of time_idx
  Tree depth: 3
  Branch pages: 22
  Leaf pages: 4611
  Overflow pages: 0
  Entries: 2061655
* Audit index 2: MDB_INTEGERKEY, MDB_DUPSORT, MDB_DUPFIXED; 40 byte values.
Append only.
Status of item_db
  Tree depth: 4
  Branch pages: 132
  Leaf pages: 10040
  Overflow pages: 0
  Entries: 186291
* Main data store: 40 byte keys, small values (220b avg). Lots of reads and new
records, very few deletes and no updates.
Status of user_state_db
  Tree depth: 5
  Branch pages: 83283
  Leaf pages: 9289578
  Overflow pages: 32
  Entries: 207894432
* User state: 20-40 byte keys, small values (180b avg), *many* entries. Lots
of reads and updates.
Status of item_users_idx
  Tree depth: 4
  Branch pages: 203
  Leaf pages: 16532
  Overflow pages: 0
  Entries: 1035586217
* User / data matrix index: MDB_DUPSORT; 40 byte keys, 20-40 byte values,
*really big*. Lots of writes, very few deletes and no updates.
Status of user_log
  Tree depth: 5
  Branch pages: 361275
  Leaf pages: 26570347
  Overflow pages: 0
  Entries: 1035586217
* User log: 30-50 byte keys, small values (100b avg), 1e9 records. Append only,
very few reads. I had to stop the restore operation while this was being
recreated, because after 24h only 50% of the entries had been restored. Thanks to
monitoring, I measured this maxing out at 7000 entries per second; the other
databases showed way slower rates than this!
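The page counts in the Environment Info and Freelist Status sections of the mdb_stat output above can be converted to bytes with simple arithmetic. The helpers below are a hypothetical sketch for that conversion (the names are not part of LMDB), using the reported page size of 4096.

```c
#include <stdint.h>

/* Convert a page count from mdb_stat into bytes. */
uint64_t pages_to_bytes(uint64_t pages, uint64_t page_size)
{
    return pages * page_size;
}

/* Whole percent of the used pages that are on the freelist (rounded down). */
unsigned free_percent(uint64_t free_pages, uint64_t used_pages)
{
    return used_pages ? (unsigned)(free_pages * 100 / used_pages) : 0;
}
```

With the figures above, 90991042 used pages come to 372699308032 bytes (about 347 GiB), and the 36352392 free pages come to 148899397632 bytes (about 139 GiB), or 39% of the pages in use, rounded down.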
Any help would be really appreciated!
Thanks. Tina.
https://bugs.openldap.org/show_bug.cgi?id=9316
Issue ID: 9316
Summary: performance issue when writing a high number of large objects
Product: LMDB
Version: 0.9.24
Hardware: x86_64
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: ---
Component: liblmdb
Assignee: bugs(a)openldap.org
Reporter: JGabler(a)univa.com
Target Milestone: ---
Created attachment 755
--> https://bugs.openldap.org/attachment.cgi?id=755&action=edit
lmdb performance test reproducing the issue
When writing a high number of big objects we see an extreme variation in
performance, from very fast to extremely slow.
In the test scenario we write 10 chunks of 10,000 "jobs" (some 10kB) and their
corresponding "job script" (some 40kB), 200,000 objects in total.
Then we delete all objects.
We do 10 iterations of this scenario.
When running this scenario as part of Univa Grid Engine with LMDB as the database
backend we get the following performance values (rows are the iteration,
columns the chunk of jobs):
Iteration         0         1         2         3         4         5         6         7         8         9
0            21.525    21.250    21.574    21.722    22.693    21.992    22.438    22.650    21.972    22.017
1            22.262    21.656    22.339    22.914    21.549    24.906    23.862  1531.189  1695.041  1491.255
2            36.071    21.619    22.074    22.927    23.455    27.239    22.640    22.802   633.956  1882.008
3            52.163    21.651    21.571    22.686    22.727    22.024    40.980    22.156    22.429   595.362
4            64.977    21.511    22.519    22.148    22.354    23.292    57.740    20.835    37.680   250.594
5            54.724    21.074    21.200    23.744    22.109    21.351    62.225    21.447    91.292   375.260
6            49.065    21.573    22.309    26.084    21.226    21.248    68.580    22.531    59.338   249.936
7            44.666    21.830    21.009    28.760    21.533    21.611    72.291    23.144    86.281   118.326
8            35.486    21.720    21.840    24.729    22.045    20.877    76.473    21.193   120.387   136.836
9            41.159    23.365    21.721    23.024    21.835    20.972    77.409    21.784   193.885   306.158
So writing 10,000 "jobs"+"job_script" usually takes some 22 seconds, but
after some time performance collapses.
With other database backends we do not see this behaviour; see the following
performance data from the same test run with a PostgreSQL backend, which is slower
(as expected when going over the network) but provides constant throughput:
Iteration         0         1         2         3         4         5         6         7         8         9
0            36.937    37.110    36.952    37.279    37.580    37.364    37.950    37.390    37.682    37.439
1            37.464    38.110    37.679    38.366    37.576    37.624    37.476    37.412    37.265    37.727
2            36.394    37.635    37.347    37.603    37.402    37.515    37.802    37.898    37.355    37.939
3            37.213    37.539    36.771    37.706    37.055    37.780    37.283    37.488    36.955    37.460
4            36.554    37.557    37.368    37.960    37.070    37.892    37.459    37.857    37.228    37.833
5            37.047    38.164    37.167    37.885    37.268    37.676    37.355    37.572    37.347    37.569
6            37.118    37.735    36.857    37.602    36.717    37.716    37.444    37.685    37.085    38.151
7            36.787    37.647    36.844    37.601    36.934    37.440    37.632    37.291    37.174    37.926
8            36.884    37.560    37.117    37.239    37.034    37.748    37.289    37.635    36.822    37.693
9            37.178    37.496    36.849    37.799    37.289    37.644    37.461    37.622    37.022    37.670
We can reproduce the issue with a small C program (see attachment) which does
essentially the same database operations as our database layer in Univa Grid
Engine but depends only on liblmdb.
It simulates the scenario described above and gives us the following
performance data showing the extreme performance variation:
Iteration         0         1         2         3         4         5         6         7         8         9
0             0.686     0.625     0.660     0.637     0.631     0.741     0.757     0.658     0.651     0.614
1             0.705     0.838     0.690     0.772     0.663     3.248     0.605   542.762  1114.374   898.477
2            13.336     1.299     0.659     0.637     0.626     0.712    11.172     0.663    29.833  1161.884
3            26.774     0.647     0.607     0.586     0.583     0.639    24.893     0.629     3.837   423.248
4            32.802     0.629     0.616     0.560     0.550     0.605    31.133     0.625     6.606   195.150
5            34.819     0.623     0.628     0.582     0.564     0.609    32.275     0.607     7.599   134.106
6            26.319     0.622     0.582     0.548     0.551     0.590    28.536     0.611    36.429   160.781
7            21.878     0.814     0.668     0.736     0.614     0.543    24.355     0.626    36.583   148.337
8             4.129     0.654     5.674     0.596     0.566     0.554     7.158     0.633     0.599    48.799
9            30.278     0.608     0.608     0.560     0.549     0.587    29.253     0.606     9.593   128.339
It can be compiled on Linux 64bit with
gcc -I <path to lmdb>/include -L <path to lmdb>/lib -o test_lmdb_perf test_lmdb_perf.c -llmdb
To run the given scenario call it with the following parameters:
./test_lmdb_perf <path to database directory> 10 10 10000
We built and ran it on
- CentOS Linux release 7.7.1908 (Core)
- Linux biber 3.10.0-1062.9.1.el7.x86_64 #1 SMP Fri Dec 6 15:49:49 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
- it was built with gcc (GCC) 7.2.1 20170829 (Red Hat 7.2.1-1) from devtoolset-7
https://bugs.openldap.org/show_bug.cgi?id=9278
Issue ID: 9278
Summary: liblmdb: robust mutexes should not be unmapped
Product: LMDB
Version: unspecified
Hardware: All
OS: FreeBSD
Status: UNCONFIRMED
Severity: normal
Priority: ---
Component: liblmdb
Assignee: bugs(a)openldap.org
Reporter: delphij(a)freebsd.org
Target Milestone: ---
Created attachment 736
--> https://bugs.openldap.org/attachment.cgi?id=736&action=edit
A possible workaround
We recently noticed that lmdb would have the memory region containing the
robust mutex unmapped on mdb_env_close0():
munmap((void *)env->me_txns,
(env->me_maxreaders-1)*sizeof(MDB_reader)+sizeof(MDB_txninfo));
Note that if this is the last unmap for a robust mutex, the FreeBSD
implementation garbage-collects the mutex, making it no longer visible to
other processes. As a result, a second instance of the attached test.c (from
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=244493 with minor changes)
triggers the assertion at mdb_txn_begin(): acquiring the mutex returns 22
(EINVAL) because the mutex appears to be a robust mutex but is invalid.
The attached lmdb.diff is a possible workaround for this (it would skip
unmapping when setting up the robust mutex for the first time).