Dear Howard Chu,
Your responses about LMDB scaling were extremely helpful to us a couple of years ago. We are now stuck on LMDB write scaling in our open-source project until we can either eliminate this hockey-stick problem or at least get more clarity about LMDB/mmap write performance.
Your guidance would be very much appreciated.
Problem:
Our application first writes data into LMDB. After the write cursor is closed, the LMDB is used with a read-only cursor. Our problem is that write commit time hockey-sticks. Here is the graph from a machine with 128 GB of RAM: https://i.imgur.com/py83dRV.jpeg (the jump happens at file 3200).
We used a virtual machine with 16 GB of RAM so the hockey-stick appears sooner: https://i.imgur.com/CYohM3O.jpeg (the red curve is the commit time; the jump happens at file 3080; the LMDB env stats are on the bottom graph; Size on the top graph is the total program size from /proc/[pid]/statm).
We tried various combinations of environment flags: MDB_WRITEMAP, MDB_MAPASYNC, MDB_NOSYNC and MDB_NOMETASYNC. We also tried increasing the mapsize. Nothing seems to matter (e.g. https://i.imgur.com/MkZJkxp.jpeg with a 64 GB mapsize and MDB_NOSYNC); the best combination seems to be no flags at all. Smaller mapsizes are worse, but increasing the mapsize above 4 GB on a 16 GB RAM machine doesn't seem to gain anything.
Can you recommend a way to prevent or delay this hockey-stick while we are in the write regime and reads are minimal? We just want to write the data to disk fast; we don't even need mmap() read performance at this stage. We will only need it later, in read-only mode, after the write cursor finishes and a read-only cursor is used.
Details:
The environment contains two databases: one with no flags (keys are 32 bytes, values are 8-byte integers), and a second with MDB_INTEGERKEY (keys are 8-byte integers, values are 32 bytes, written with MDB_APPEND). All commits are similarly sized, 6000-8000 pages (page size is 4096). Each commit is therefore roughly 360k keys total for both dbs (7000 * 4096 / ((32+8) * 2) ≈ 358,400).
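For reference, the per-commit estimate above works out like this (a quick sketch; the 7000-page figure is just the midpoint of our 6000-8000 range, and per-page B-tree overhead is ignored):

```rust
// Back-of-the-envelope check of the keys-per-commit estimate.
// Assumes ~7000 dirty pages per commit, 4096-byte pages, and 40 bytes
// of key+value payload per entry in each of the two databases.
fn main() {
    let pages_per_commit: u64 = 7000; // midpoint of the 6000-8000 range
    let page_size: u64 = 4096;
    let entry_bytes: u64 = 32 + 8; // key + value; same total in both dbs
    let dbs: u64 = 2;

    let keys = pages_per_commit * page_size / (entry_bytes * dbs);
    println!("~{} keys per commit", keys); // ~358400 keys per commit
}
```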
The VM has no disk swap; from the perf output here https://i.imgur.com/zuTGawz.jpeg we can see that the machine is under heavy paging load. On the VM with 16 GB of RAM, the hockey-stick happens when the LMDB file is only 1.5 GB, 12% of the available RAM.
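In case it helps, this is roughly how we sample the major-fault count around each commit to confirm commit time tracks paging (a minimal Linux-only sketch; the majflt() helper is illustrative, not our actual importer code; field positions follow proc(5)):

```rust
use std::fs;

// Read this process's major page fault count from /proc/self/stat
// (field 12, 1-indexed: majflt).
fn majflt() -> u64 {
    let stat = fs::read_to_string("/proc/self/stat").unwrap();
    // comm (field 2) may contain spaces, so split after the closing ')'.
    let after_comm = stat.rsplit(')').next().unwrap();
    // after_comm starts at field 3 (state), so majflt is index 9 here.
    after_comm.split_whitespace().nth(9).unwrap().parse().unwrap()
}

fn main() {
    let before = majflt();
    // ... commit would happen here ...
    let after = majflt();
    println!("major faults during commit: {}", after - before);
}
```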
Detailed stats:
Below is an excerpt from the application log for each commit; the same data is plotted in the linked graphs above. Memory stats are from /proc/[pid]/statm; LMDB stats are from the environment. “Difference” is the time the commit takes; it degrades from 45 seconds to 3+ minutes. The final LMDB size in this log is pgs_in_use 448993 * 4096 = 1.84 GB.
Start of log before write degradation:
2024-08-09 02:26:19 [INFO][src/importer.rs:198] File 3054: 132932384 bytes. Memory stats: Size: 27540910 pages, Resident: 286072 pages, Shared: 99484 pages, Text: 981 pages, Data: 191849 pages, lmdb: dict_env: branch_pgs 0 leaf_pgs 1 of_pgs 0 last_pgno 108959 last_txnid 18 max_readers 126 num_readers 1 free_pgs 56171 pgs_in_use 52788
Difference: 0 min 44 sec
2024-08-09 02:27:03 [INFO][src/importer.rs:198] File 3055: 133951354 bytes. Memory stats: Size: 27541730 pages, Resident: 298463 pages, Shared: 111020 pages, Text: 981 pages, Data: 192669 pages, lmdb: dict_env: branch_pgs 0 leaf_pgs 1 of_pgs 0 last_pgno 121928 last_txnid 19 max_readers 126 num_readers 1 free_pgs 62881 pgs_in_use 59047
Difference: 0 min 48 sec
2024-08-09 02:27:51 [INFO][src/importer.rs:198] File 3056: 133218473 bytes. Memory stats: Size: 27543735 pages, Resident: 313441 pages, Shared: 123980 pages, Text: 981 pages, Data: 194674 pages, lmdb: dict_env: branch_pgs 0 leaf_pgs 1 of_pgs 0 last_pgno 133243 last_txnid 20 max_readers 126 num_readers 1 free_pgs 69672 pgs_in_use 63571
Difference: 0 min 40 sec
……
End of log after write degradation:
2024-08-09 04:39:33 [INFO][src/importer.rs:198] File 3127: 132694310 bytes. Memory stats: Size: 27654397 pages, Resident: 1028693 pages, Shared: 727992 pages, Text: 981 pages, Data: 305336 pages, lmdb: dict_env: branch_pgs 0 leaf_pgs 1 of_pgs 0 last_pgno 745440 last_txnid 91 max_readers 126 num_readers 1 free_pgs 308699 pgs_in_use 436741
Difference: 2 min 36 sec
2024-08-09 04:42:09 [INFO][src/importer.rs:198] File 3128: 133418687 bytes. Memory stats: Size: 27654397 pages, Resident: 1033516 pages, Shared: 732815 pages, Text: 981 pages, Data: 305336 pages, lmdb: dict_env: branch_pgs 0 leaf_pgs 1 of_pgs 0 last_pgno 745440 last_txnid 92 max_readers 126 num_readers 1 free_pgs 304043 pgs_in_use 441397
Difference: 3 min 18 sec
2024-08-09 04:45:27 [INFO][src/importer.rs:198] File 3129: 133118782 bytes. Memory stats: Size: 27654397 pages, Resident: 1043876 pages, Shared: 743175 pages, Text: 981 pages, Data: 305336 pages, lmdb: dict_env: branch_pgs 0 leaf_pgs 1 of_pgs 0 last_pgno 762922 last_txnid 93 max_readers 126 num_readers 1 free_pgs 313929 pgs_in_use 448993