https://bugs.openldap.org/show_bug.cgi?id=9619
Issue ID: 9619
Summary: mdb_env_copy2 with MDB_CP_COMPACT in mdb.master3
produces corrupt mdb file
Product: LMDB
Version: 0.9.29
Hardware: All
OS: Windows
Status: UNCONFIRMED
Severity: normal
Priority: ---
Component: liblmdb
Assignee: bugs(a)openldap.org
Reporter: kriszyp(a)gmail.com
Target Milestone: ---
When copying an LMDB database with mdb_env_copy2 and the MDB_CP_COMPACT flag on
mdb.master3, the resulting mdb file seems to be corrupt: when using it in LMDB,
I get segmentation faults. Copying without the compacting flag seems to work
fine. I apologize, I know this is not a very good issue report, as I haven't
had a chance to narrow this down to a more reproducible/isolated case, or to
look for how to patch it. I thought I would report it in case there are any
ideas on what could cause this. The segmentation faults always seem to be
memory write faults (as opposed to faults on trying to read). Or perhaps the
current backup/copying functionality is eventually going to be replaced by
incremental backup/copying anyway
(https://twitter.com/hyc_symas/status/1315651814096875520). I'll try to update
this if I get a chance to investigate more, but otherwise feel free to
ignore/consider this low-priority since the workaround is easy.
--
You are receiving this mail because:
You are on the CC list for the issue.
https://bugs.openldap.org/show_bug.cgi?id=10108
Issue ID: 10108
Summary: "mdb_dump -a" does not dump the main database
Product: LMDB
Version: 0.9.29
Hardware: All
OS: All
Status: UNCONFIRMED
Keywords: needs_review
Severity: normal
Priority: ---
Component: liblmdb
Assignee: bugs(a)openldap.org
Reporter: tuukka.pensala(a)gmail.com
Target Milestone: ---
In mdb_dump.c we have these instructions:
/* -a: dump main DB and all subDBs
* -s: dump only the named subDB
* -n: use NOSUBDIR flag on env_open
* -p: use printable characters
* -f: write to file instead of stdout
* -V: print version and exit
* (default) dump only the main DB
*/
However, contrary to the description, the -a option does not dump the main DB.
With -a, dumpit() is called for each of the named databases, but not for the
unnamed one.
With the current behavior, if the data store contains subDBs and also has
user-added data in the main DB, there seems to be no way to dump all of it at
once using mdb_dump.
https://bugs.openldap.org/show_bug.cgi?id=9434
Issue ID: 9434
Summary: Abysmal write performance with certain data patterns
Product: LMDB
Version: 0.9.24
Hardware: x86_64
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: ---
Component: liblmdb
Assignee: bugs(a)openldap.org
Reporter: tina(a)tina.pm
Target Milestone: ---
Created attachment 784
--> https://bugs.openldap.org/attachment.cgi?id=784&action=edit
Monitoring graph of disk usage
Hi,
I have recently written a project for a customer which relies heavily on LMDB,
in which performance is critical. Sadly, after completing the project I started
having all kinds of problems when the DB started to grow. This has gotten so
bad that the project release had to be postponed, and I have been asked to
rewrite the DB layer using a different engine unless I can find a solution
quickly.
I have so far found 4 serious issues, which I suspect are related either to the
size of the database or to the patterns of the data:
* Writing a value in some of the subdatabases has become increasingly slower,
and commits are taking far too long to complete. This is running on a powerful
computer with SSDs, and the 95th percentile of commit times is around 400ms.
Given the single-writer limitation, I have run out of optimisations to try.
* For some reason I cannot understand, the disk usage has grown to over 2x the
size of the actual data stored, and the free space does not seem to be
reclaimed. The file takes up 348 GB, while the used pages amount to only 162
GB.
* A couple of days ago there was a sudden spike in disk usage (not correlated
with increases in actual data stored, or even with the last pageno used) that
filled the disk in a couple of hours. You can see this in the attached captures
of the monitoring graphs, which show actual disk usage (bottom) and counts of
pages as reported by LMDB (top). The bottom graph is total disk usage for the
partition, which is almost exclusively the database; ignore the few dips in
size, which are from removing other files.
* Running `mdb_dump` for backups takes up to 7 hours for the database; restores
are totally unusable: I tried to re-create the database after the weird space
spike and had to stop after 24h, when not even 30% of the data had been
restored! This alone is a deal-breaker, as we have no usable way to back up and
restore the database.
For context, this is the mdb_stat output with descriptions of each subdatabase.
I have no explanation for the ridiculous amount of free pages, and even running
mdb_stat takes a few seconds:
Environment Info
Map address: (nil)
Map size: 397166026752
Page size: 4096
Max pages: 96964362
Number of pages used: 90991042
Last transaction ID: 14647267
Max readers: 126
Number of readers used: 4
Freelist Status
Tree depth: 3
Branch pages: 26
Leaf pages: 5168
Overflow pages: 74319
Entries: 111981
Free pages: 36352392
Status of Main DB
Tree depth: 1
Branch pages: 0
Leaf pages: 1
Overflow pages: 0
Entries: 8
Status of audit_log
Tree depth: 4
Branch pages: 309
Leaf pages: 69154
Overflow pages: 6082343
Entries: 2061655
* Audit log: MDB_INTEGERKEY, big values (12 kB avg). Append only, few reads.
Status of audit_idx
Tree depth: 4
Branch pages: 261
Leaf pages: 27310
Overflow pages: 0
Entries: 2006963
* Audit index 1: 40 byte keys, 8 byte values. Append only; it has fewer records
because I disabled it yesterday due to its impact on performance.
Status of time_idx
Tree depth: 3
Branch pages: 22
Leaf pages: 4611
Overflow pages: 0
Entries: 2061655
* Audit index 2: MDB_INTEGERKEY, MDB_DUPSORT, MDB_DUPFIXED; 40 byte values.
Append only.
Status of item_db
Tree depth: 4
Branch pages: 132
Leaf pages: 10040
Overflow pages: 0
Entries: 186291
* Main data store: 40 byte keys, small values (220b avg). Lots of reads and new
records, very few deletes and no updates.
Status of user_state_db
Tree depth: 5
Branch pages: 83283
Leaf pages: 9289578
Overflow pages: 32
Entries: 207894432
* User state: 20-40 byte keys, small values (180b avg), *many* entries. Lots
of reads and updates.
Status of item_users_idx
Tree depth: 4
Branch pages: 203
Leaf pages: 16532
Overflow pages: 0
Entries: 1035586217
* User / data matrix index: MDB_DUPSORT; 40 byte keys, 20-40 byte values,
*really big*. Lots of writes, very few deletes and no updates.
Status of user_log
Tree depth: 5
Branch pages: 361275
Leaf pages: 26570347
Overflow pages: 0
Entries: 1035586217
* User log: 30-50 byte keys, small values (100b avg), 1e9 records. Append only,
very few reads. I had to stop the restore operation while this was being
recreated, because after 24h only 50% of the entries had been restored. Thanks
to monitoring, I measured this maxing out at 7000 entries per second; the other
databases showed far slower rates than this!
Any help would be really appreciated!
Thanks. Tina.
https://bugs.openldap.org/show_bug.cgi?id=9316
Issue ID: 9316
Summary: performance issue when writing a high number of large
objects
Product: LMDB
Version: 0.9.24
Hardware: x86_64
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: ---
Component: liblmdb
Assignee: bugs(a)openldap.org
Reporter: JGabler(a)univa.com
Target Milestone: ---
Created attachment 755
--> https://bugs.openldap.org/attachment.cgi?id=755&action=edit
lmdb performance test reproducing the issue
When writing a high number of big objects we see an extreme variation in
performance from very fast to extremely slow.
In the test scenario we write 10 chunks of 10,000 "jobs" (some 10 kB each) and
their corresponding "job scripts" (some 40 kB each), 200,000 objects in total.
Then we delete all objects.
We do 10 iterations of this scenario.
When running this scenario as part of Univa Grid Engine with LMDB as database
backend we get the following performance values (rows are the iteration,
columns the chunk of jobs):
Iteration      0        1        2        3        4        5        6        7        8        9
0         21.525   21.250   21.574   21.722   22.693   21.992   22.438   22.650   21.972   22.017
1         22.262   21.656   22.339   22.914   21.549   24.906   23.862 1531.189 1695.041 1491.255
2         36.071   21.619   22.074   22.927   23.455   27.239   22.640   22.802  633.956 1882.008
3         52.163   21.651   21.571   22.686   22.727   22.024   40.980   22.156   22.429  595.362
4         64.977   21.511   22.519   22.148   22.354   23.292   57.740   20.835   37.680  250.594
5         54.724   21.074   21.200   23.744   22.109   21.351   62.225   21.447   91.292  375.260
6         49.065   21.573   22.309   26.084   21.226   21.248   68.580   22.531   59.338  249.936
7         44.666   21.830   21.009   28.760   21.533   21.611   72.291   23.144   86.281  118.326
8         35.486   21.720   21.840   24.729   22.045   20.877   76.473   21.193  120.387  136.836
9         41.159   23.365   21.721   23.024   21.835   20.972   77.409   21.784  193.885  306.158
So writing 10,000 "jobs" + "job scripts" usually takes some 22 seconds, but
after some time performance collapses.
With other database backends we do not see this behaviour; see the following
performance data for the same test with a PostgreSQL backend, which is slower
(as expected, going over the network) but provides constant throughput:
Iteration      0        1        2        3        4        5        6        7        8        9
0         36.937   37.110   36.952   37.279   37.580   37.364   37.950   37.390   37.682   37.439
1         37.464   38.110   37.679   38.366   37.576   37.624   37.476   37.412   37.265   37.727
2         36.394   37.635   37.347   37.603   37.402   37.515   37.802   37.898   37.355   37.939
3         37.213   37.539   36.771   37.706   37.055   37.780   37.283   37.488   36.955   37.460
4         36.554   37.557   37.368   37.960   37.070   37.892   37.459   37.857   37.228   37.833
5         37.047   38.164   37.167   37.885   37.268   37.676   37.355   37.572   37.347   37.569
6         37.118   37.735   36.857   37.602   36.717   37.716   37.444   37.685   37.085   38.151
7         36.787   37.647   36.844   37.601   36.934   37.440   37.632   37.291   37.174   37.926
8         36.884   37.560   37.117   37.239   37.034   37.748   37.289   37.635   36.822   37.693
9         37.178   37.496   36.849   37.799   37.289   37.644   37.461   37.622   37.022   37.670
We can reproduce the issue with a small C program (see attachment) which does
essentially the same database operations as our database layer in Univa Grid
Engine but depends only on liblmdb.
It simulates the scenario described above and yields the following performance
data, showing the same extreme variation:
Iteration      0        1        2        3        4        5        6        7        8        9
0          0.686    0.625    0.660    0.637    0.631    0.741    0.757    0.658    0.651    0.614
1          0.705    0.838    0.690    0.772    0.663    3.248    0.605  542.762 1114.374  898.477
2         13.336    1.299    0.659    0.637    0.626    0.712   11.172    0.663   29.833 1161.884
3         26.774    0.647    0.607    0.586    0.583    0.639   24.893    0.629    3.837  423.248
4         32.802    0.629    0.616    0.560    0.550    0.605   31.133    0.625    6.606  195.150
5         34.819    0.623    0.628    0.582    0.564    0.609   32.275    0.607    7.599  134.106
6         26.319    0.622    0.582    0.548    0.551    0.590   28.536    0.611   36.429  160.781
7         21.878    0.814    0.668    0.736    0.614    0.543   24.355    0.626   36.583  148.337
8          4.129    0.654    5.674    0.596    0.566    0.554    7.158    0.633    0.599   48.799
9         30.278    0.608    0.608    0.560    0.549    0.587   29.253    0.606    9.593  128.339
It can be compiled on 64-bit Linux with
gcc -I <path to lmdb>/include -L <path to lmdb>/lib -o test_lmdb_perf test_lmdb_perf.c -llmdb
To run the given scenario, call it with the following parameters:
./test_lmdb_perf <path to database directory> 10 10 10000
We built and ran it on
- CentOS Linux release 7.7.1908 (Core)
- Linux biber 3.10.0-1062.9.1.el7.x86_64 #1 SMP Fri Dec 6 15:49:49 UTC 2019
x86_64 x86_64 x86_64 GNU/Linux
- it was built with gcc (GCC) 7.2.1 20170829 (Red Hat 7.2.1-1) from
devtoolset-7
https://bugs.openldap.org/show_bug.cgi?id=9278
Issue ID: 9278
Summary: liblmdb: robust mutexes should not be unmapped
Product: LMDB
Version: unspecified
Hardware: All
OS: FreeBSD
Status: UNCONFIRMED
Severity: normal
Priority: ---
Component: liblmdb
Assignee: bugs(a)openldap.org
Reporter: delphij(a)freebsd.org
Target Milestone: ---
Created attachment 736
--> https://bugs.openldap.org/attachment.cgi?id=736&action=edit
A possible workaround
We recently noticed that lmdb unmaps the memory region containing the robust
mutex in mdb_env_close0():
munmap((void *)env->me_txns,
(env->me_maxreaders-1)*sizeof(MDB_reader)+sizeof(MDB_txninfo));
Note that if this is the last unmap for a robust mutex, the FreeBSD
implementation garbage-collects the mutex, making it no longer visible to
other processes. As a result, a second instance of the attached test.c (from
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=244493, with minor changes)
triggers the assertion in mdb_txn_begin(): acquiring the mutex returns 22
(EINVAL) because the mutex appears to be a robust mutex but is invalid.
The attached lmdb.diff is a possible workaround (it skips the unmap when the
robust mutex has just been set up for the first time).
https://bugs.openldap.org/show_bug.cgi?id=9528
Issue ID: 9528
Summary: liblmdb throws away environment CFLAGS
Product: LMDB
Version: 0.9.29
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: normal
Priority: ---
Component: liblmdb
Assignee: bugs(a)openldap.org
Reporter: quanah(a)openldap.org
Target Milestone: ---
When building LMDB, specified hardening flags are thrown away due to the
Makefile overriding CFLAGS:
Makefile:CFLAGS = $(THREADS) $(OPT) $(W) $(XCFLAGS)
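One possible fix (a sketch against the line above, not a tested patch) is to fold externally supplied CFLAGS back in rather than overwriting them:

```make
# Sketch: preserve flags from the environment instead of discarding them.
CFLAGS := $(THREADS) $(OPT) $(W) $(XCFLAGS) $(CFLAGS)
```

Until then, passing hardening flags via `make XCFLAGS="..."` should work with the current Makefile, since XCFLAGS is already folded into CFLAGS.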
https://bugs.openldap.org/show_bug.cgi?id=9378
Issue ID: 9378
Summary: Crash in mdb_put() / mdb_page_dirty()
Product: LMDB
Version: 0.9.26
Hardware: All
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: ---
Component: liblmdb
Assignee: bugs(a)openldap.org
Reporter: nate(a)kde.org
Target Milestone: ---
The KDE Baloo file indexer uses lmdb as its database (source code available at
https://invent.kde.org/frameworks/baloo). Our most common crash, with over 100
duplicate bug reports, is in lmdb. Here's the bug report tracking it:
https://bugs.kde.org/show_bug.cgi?id=389848.
The version of lmdb does not seem to matter much: we have bug reports from Arch
users with lmdb 0.9.26 as well as from people using many earlier versions.
Here's an example backtrace, taken from
https://bugs.kde.org/show_bug.cgi?id=426195:
#6 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#7 0x00007f3c0bbb9859 in __GI_abort () at abort.c:79
#8 0x00007f3c0b23ba83 in mdb_assert_fail (env=0x55e2ad710600,
expr_txt=expr_txt@entry=0x7f3c0b23e02f "rc == 0",
func=func@entry=0x7f3c0b23e978 <__func__.7221> "mdb_page_dirty",
line=line@entry=2127, file=0x7f3c0b23e010 "mdb.c") at mdb.c:1542
#9 0x00007f3c0b2306d5 in mdb_page_dirty (mp=<optimized out>,
txn=0x55e2ad7109f0) at mdb.c:2114
#10 mdb_page_dirty (txn=0x55e2ad7109f0, mp=<optimized out>) at mdb.c:2114
#11 0x00007f3c0b231966 in mdb_page_alloc (num=num@entry=1,
mp=mp@entry=0x7f3c0727aee8, mc=<optimized out>) at mdb.c:2308
#12 0x00007f3c0b231ba3 in mdb_page_touch (mc=mc@entry=0x7f3c0727b420) at
mdb.c:2495
#13 0x00007f3c0b2337c7 in mdb_cursor_touch (mc=mc@entry=0x7f3c0727b420) at
mdb.c:6523
#14 0x00007f3c0b2368f9 in mdb_cursor_put (mc=mc@entry=0x7f3c0727b420,
key=key@entry=0x7f3c0727b810, data=data@entry=0x7f3c0727b820,
flags=flags@entry=0) at mdb.c:6657
#15 0x00007f3c0b23976b in mdb_put (txn=0x55e2ad7109f0, dbi=5,
key=key@entry=0x7f3c0727b810, data=data@entry=0x7f3c0727b820,
flags=flags@entry=0) at mdb.c:9022
#16 0x00007f3c0c7124c5 in Baloo::DocumentDB::put
(this=this@entry=0x7f3c0727b960, docId=<optimized out>,
docId@entry=27041423333263366, list=...) at ./src/engine/documentdb.cpp:79
#17 0x00007f3c0c743da7 in Baloo::WriteTransaction::replaceDocument
(this=0x55e2ad7ea340, doc=..., operations=operations@entry=...) at
./src/engine/writetransaction.cpp:232
#18 0x00007f3c0c736b16 in Baloo::Transaction::replaceDocument
(this=this@entry=0x7f3c0727bc10, doc=..., operations=operations@entry=...) at
./src/engine/transaction.cpp:295
#19 0x000055e2ac5d6cbc in Baloo::UnindexedFileIndexer::run
(this=0x55e2ad79ca20) at
/usr/include/x86_64-linux-gnu/qt5/QtCore/qrefcount.h:60
#20 0x00007f3c0c177f82 in QThreadPoolThread::run (this=0x55e2ad717f20) at
thread/qthreadpool.cpp:99
#21 0x00007f3c0c1749d2 in QThreadPrivate::start (arg=0x55e2ad717f20) at
thread/qthread_unix.cpp:361
#22 0x00007f3c0b29d609 in start_thread (arg=<optimized out>) at
pthread_create.c:477
#23 0x00007f3c0bcb6103 in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:95
https://bugs.openldap.org/show_bug.cgi?id=9736
Issue ID: 9736
Summary: pwrite bug in OSX breaking LMDB promise about the
maximum value size
Product: LMDB
Version: unspecified
Hardware: All
OS: Mac OS
Status: UNCONFIRMED
Keywords: needs_review
Severity: normal
Priority: ---
Component: liblmdb
Assignee: bugs(a)openldap.org
Reporter: renault.cle(a)gmail.com
Target Milestone: ---
Hi,
I was working with LMDB and hit an issue when trying to write a value of
approximately 3.3 GiB to the database. I dove into the LMDB source code of the
mdb_put path with the lldb debugger and found that it was not an issue in LMDB
itself but rather a bug in the pwrite function of the Mac OS libc
implementation.
The pwrite function takes four parameters: the file descriptor, the buffer,
the count of bytes to write from the buffer, and the offset at which to write
in the file. On Mac OS the count is a size_t, i.e. a 64-bit unsigned integer,
but when you call pwrite with a count greater than or equal to 2^31 it fails
with error 22 (EINVAL). LMDB surfaced this 22 from the mdb_put call without it
being a documented LMDB error, because it was caused by an internal write and
not something catchable by LMDB.
I am not sure what we can do. Could we implement this single pwrite [1] as
multiple pwrites with counts smaller than 2^31 in a loop, just for Mac OS,
like the specific handling that already exists for Windows?
I also found an issue on the RocksDB repository [2] about a similar problem
they have with pwrite and write on Mac OS. I understand that this is not a
hard promise that LMDB makes but rather an "in theory" limit [3].
Thank you for your time,
kero
[1]:
https://github.com/LMDB/lmdb/blob/01b1b7dc204abdf3849536979205dc9e3a0e3ece/…
[2]: https://github.com/facebook/rocksdb/issues/5169
[3]: http://www.lmdb.tech/doc/group__mdb.html#structMDB__val
https://bugs.openldap.org/show_bug.cgi?id=10054
Issue ID: 10054
Summary: Value size limited to 2,147,479,552 bytes
Product: LMDB
Version: unspecified
Hardware: x86_64
OS: Linux
Status: UNCONFIRMED
Keywords: needs_review
Severity: normal
Priority: ---
Component: liblmdb
Assignee: bugs(a)openldap.org
Reporter: louis(a)meilisearch.com
Target Milestone: ---
Hello,
According to the documentation[0], a database that is not using `MDB_DUPSORT`
can store values up to `0xffffffff` bytes (around 4GB).
In practice, under Linux, the actual limit is `0x7ffff000` though (2^31 - 4096,
so around 2GB).
This is due to the write loop in `mdb_page_flush`. The `wsize` value
determining how many bytes will be written can be as big as
`4096*dp->mp_pages`[1], and the number of overflow pages grows with the size of
the value put inside the DB.
The `wsize` is not split into smaller chunks when there are many overflow
pages to write, and as a result the call to `pwrite`[2] does not perform a
full write, but only a "short" write of 2147479552 bytes (the maximum allowed
for a single `pwrite` call on Linux[3]).
This would be OK if the short-write condition were handled by looping and
performing another `pwrite` with the rest of the data, but instead `EIO` is
returned[4].
There seems to be a related but different issue on macOS when trying to
`pwrite` more than 2^31 bytes, which was already reported[5].
This issue was reported to me by a Meilisearch user because it causes their
database indexing to fail[6]. I had to investigate a bit because their setup
was peculiar (high number of documents in their database) and the `EIO` error
code is not very descriptive of the underlying issue.
I attach a C reproducer that attempts to add a 2147479553 byte value to the DB
and fails with `EIO` (decreasing `nb_items` to a smaller value such as
`2107479552` does succeed)[7].
Thank you for making LMDB!
Louis Dureuil.
[0]:
https://github.com/LMDB/lmdb/blob/mdb.master/libraries/liblmdb/lmdb.h#LL284…
[1]:
https://github.com/LMDB/lmdb/blob/mdb.master/libraries/liblmdb/mdb.c#LL3770…
[2]: https://github.com/LMDB/lmdb/blob/mdb.master/libraries/liblmdb/mdb.c#L3820
[3]:
https://stackoverflow.com/questions/70368651/why-cant-linux-write-more-than…
[4]: https://github.com/LMDB/lmdb/blob/mdb.master/libraries/liblmdb/mdb.c#L3840
[5]: https://bugs.openldap.org/show_bug.cgi?id=9736
[6]: https://github.com/meilisearch/meilisearch/issues/3654
[7]: https://github.com/dureuill/lmdb_3654/tree/main