https://bugs.openldap.org/show_bug.cgi?id=9434
Issue ID: 9434
  Summary: Abysmal write performance with certain data patterns
  Product: LMDB
  Version: 0.9.24
  Hardware: x86_64
  OS: Linux
  Status: UNCONFIRMED
  Severity: normal
  Priority: ---
  Component: liblmdb
  Assignee: bugs@openldap.org
  Reporter: tina@tina.pm
  Target Milestone: ---
Created attachment 784 --> https://bugs.openldap.org/attachment.cgi?id=784&action=edit Monitoring graph of disk usage
Hi,
I have recently written a project for a customer that relies heavily on LMDB and in which performance is critical. Sadly, after completing it I started having all kinds of problems once the DB started to grow. This has gotten so bad that the project release had to be postponed, and I have been asked to rewrite the DB layer using a different engine unless I can find a solution quickly.
I have so far found 4 serious issues, which I suspect are related either to the size of the database or to the patterns of the data:
* Writing a value in some of the subdatabases has become increasingly slow, and commits are taking far too long to complete. This is running on a powerful machine with SSDs, yet the 95th percentile of commit time is around 400 ms (a simplified sketch of the timed write path is below this list). Because of the single-writer limitation, I have run out of optimisations to try.
* For some reason I cannot understand, the disk usage has grown to over 2x the size of the actual data stored, and the free space does not seem to be reclaimed. The file takes up 348 GB, while the used pages amount to only 162 GB.
* A couple of days ago there was a sudden spike in disk usage (not correlated with increases in actual data stored, or even with the last pageno used) that filled the disk in a couple of hours. You can see this in the attached capture of the monitoring graphs, which show actual disk usage (bottom) and counts of pages as reported by LMDB (top). The bottom graph is total disk usage for the partition, which holds almost nothing but the database; the few dips in size are from removing other files and can be ignored.
* Running mdb_dump for backups takes up to 7 hours for the database, and restores are totally unusable: I tried to re-create the database after the weird space spike and had to stop after 24 hours, when not even 30% of the data had been restored! This alone is a deal-breaker, as we have no usable way to back up and restore the database (the only other backup mechanism LMDB provides, a compacting copy, is sketched below).
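For reference, the write path is essentially the standard single-writer pattern. The following is only a simplified sketch (not the actual application code; function names and batching are illustrative) of how a batch is written and where the commit time is measured:

/* Simplified sketch, not the real code: one write transaction per batch,
 * timed around mdb_txn_commit(). */
#include <lmdb.h>
#include <stdio.h>
#include <time.h>

static double ms_between(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) * 1e3 + (b.tv_nsec - a.tv_nsec) / 1e6;
}

int write_batch(MDB_env *env, MDB_dbi dbi, MDB_val *keys, MDB_val *vals, int n)
{
    MDB_txn *txn;
    struct timespec t0, t1;
    int rc = mdb_txn_begin(env, NULL, 0, &txn);
    if (rc) return rc;

    for (int i = 0; i < n && rc == 0; i++)
        rc = mdb_put(txn, dbi, &keys[i], &vals[i], 0);
    if (rc) { mdb_txn_abort(txn); return rc; }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    rc = mdb_txn_commit(txn);   /* p95 of this call is around 400 ms */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    fprintf(stderr, "commit: %.1f ms\n", ms_between(t0, t1));
    return rc;
}

On backups: besides mdb_dump/mdb_load, the other mechanism LMDB offers is a compacting copy, i.e. mdb_env_copy2() with MDB_CP_COMPACT (the programmatic equivalent of mdb_copy -c), which omits free pages while copying. A minimal sketch, with an illustrative destination path:

/* Minimal sketch: compacting backup of the whole environment.
 * The destination path is illustrative. */
#include <lmdb.h>

int backup_compact(MDB_env *env)
{
    return mdb_env_copy2(env, "/backup/data-compact", MDB_CP_COMPACT);
}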
For context, this is the mdb_stat output with descriptions of each subdatabase. I have no explanation for the ridiculous amount of free pages, and even running mdb_stat takes a few seconds:
Environment Info
  Map address: (nil)
  Map size: 397166026752
  Page size: 4096
  Max pages: 96964362
  Number of pages used: 90991042
  Last transaction ID: 14647267
  Max readers: 126
  Number of readers used: 4
Freelist Status
  Tree depth: 3
  Branch pages: 26
  Leaf pages: 5168
  Overflow pages: 74319
  Entries: 111981
  Free pages: 36352392
Status of Main DB
  Tree depth: 1
  Branch pages: 0
  Leaf pages: 1
  Overflow pages: 0
  Entries: 8
Status of audit_log
  Tree depth: 4
  Branch pages: 309
  Leaf pages: 69154
  Overflow pages: 6082343
  Entries: 2061655
* Audit log: MDB_INTEGERKEY, big values (12 KB avg). Append only, few reads.
Status of audit_idx
  Tree depth: 4
  Branch pages: 261
  Leaf pages: 27310
  Overflow pages: 0
  Entries: 2006963
* Audit index 1: 40-byte keys, 8-byte values. Append only; it has fewer records because I disabled it yesterday due to its impact on performance.
Status of time_idx
  Tree depth: 3
  Branch pages: 22
  Leaf pages: 4611
  Overflow pages: 0
  Entries: 2061655
* Audit index 2: MDB_INTEGERKEY, MDB_DUPSORT, MDB_DUPFIXED; 40 byte values. Append only.
Status of item_db
  Tree depth: 4
  Branch pages: 132
  Leaf pages: 10040
  Overflow pages: 0
  Entries: 186291
* Main data store: 40 byte keys, small values (220b avg). Lots of reads and new records, very few deletes and no updates.
Status of user_state_db
  Tree depth: 5
  Branch pages: 83283
  Leaf pages: 9289578
  Overflow pages: 32
  Entries: 207894432
* User state: 20-40 byte keys, small values (180b avg), *many* entries. Lots of reads and updates.
Status of item_users_idx
  Tree depth: 4
  Branch pages: 203
  Leaf pages: 16532
  Overflow pages: 0
  Entries: 1035586217
* User / data matrix index: MDB_DUPSORT; 40 byte keys, 20-40 byte values, *really big*. Lots of writes, very few deletes and no updates.
Status of user_log
  Tree depth: 5
  Branch pages: 361275
  Leaf pages: 26570347
  Overflow pages: 0
  Entries: 1035586217
* User log: 30-50 byte keys, small values (100b avg), 1e9 records. Append only, very few reads. I had to stop the restore operation while this subdatabase was being recreated, because after 24 hours only 50% of its entries had been restored. Thanks to monitoring, I measured the restore maxing out at 7000 entries per second; the other databases showed much lower rates than this! (An illustrative sketch of how these subdatabases are opened follows this list.)
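To make the layout above concrete, here is an illustrative sketch of how subdatabases with these flag combinations are opened and written. Only the subdatabase names match the stat output; everything else is made up for illustration, and it assumes mdb_env_set_maxdbs() was called before mdb_env_open():

/* Illustrative only: flag combinations for the subdatabases described
 * above; not the actual application code. */
#include <lmdb.h>

static int open_dbs(MDB_env *env, MDB_dbi *audit_log, MDB_dbi *time_idx,
                    MDB_dbi *item_users_idx, MDB_dbi *user_log)
{
    MDB_txn *txn;
    int rc = mdb_txn_begin(env, NULL, 0, &txn);
    if (rc) return rc;

    /* Audit log: integer keys, large values stored on overflow pages. */
    rc = mdb_dbi_open(txn, "audit_log", MDB_CREATE | MDB_INTEGERKEY, audit_log);
    /* Audit index 2: integer keys with fixed-size 40-byte duplicates. */
    if (!rc) rc = mdb_dbi_open(txn, "time_idx",
                      MDB_CREATE | MDB_INTEGERKEY | MDB_DUPSORT | MDB_DUPFIXED,
                      time_idx);
    /* User / data matrix index: many duplicate values per key. */
    if (!rc) rc = mdb_dbi_open(txn, "item_users_idx",
                      MDB_CREATE | MDB_DUPSORT, item_users_idx);
    /* User log: plain key/value store, ~1e9 entries. */
    if (!rc) rc = mdb_dbi_open(txn, "user_log", MDB_CREATE, user_log);

    if (rc) { mdb_txn_abort(txn); return rc; }
    return mdb_txn_commit(txn);
}

/* "Append only" above means records are only ever added; MDB_APPEND can
 * only be passed to mdb_put() when keys arrive in sorted order, otherwise
 * the flags argument stays 0. */
static int put_record(MDB_txn *txn, MDB_dbi dbi,
                      void *key, size_t klen, void *val, size_t vlen,
                      int keys_are_sorted)
{
    MDB_val k = { klen, key };
    MDB_val v = { vlen, val };
    return mdb_put(txn, dbi, &k, &v, keys_are_sorted ? MDB_APPEND : 0);
}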
Any help would be really appreciated!
Thanks. Tina.