Problem:
Our application first writes data into LMDB. Once the data is written, the LMDB environment is used in read-only mode. Our problem is that the commit time during the write phase hockey-sticks: it stays roughly flat and then climbs sharply. Here is the graph from a machine with 128GB of RAM: https://i.imgur.com/py83dRV.jpeg (happens at 3200)
To wait less for this hockey-stick, we used a virtual machine with 16GB of RAM: https://i.imgur.com/CYohM3O.jpeg (the red curve is the commit time, the hockey-stick happens at 3080; the LMDB env stats are on the bottom graph. Size on the top graph is the total program size from /proc/[pid]/statm).
We tried various combinations of environment flags: MDB_WRITEMAP, MDB_MAPASYNC, MDB_NOSYNC and MDB_NOMETASYNC. We also tried increasing mapsize. Nothing seems to matter (e.g. https://i.imgur.com/MkZJkxp.jpeg with 64G mapsize and MDB_NOSYNC), the best combination seems to be without any flags at all. Smaller mapsizes are worse, but increasing mapsize above 4GB in a 16GB RAM machine doesn't seem to gain anything.
Can you recommend a way to prevent or delay this hockey-stick when we are in the write-only, append-only regime? We just want to write the data to disk fast, we don’t even need mmap() here, we will need it in the read-only mode later.
Details:
The environment contains two databases: one with no flags (keys are 32 bytes, values are 8-byte integers), and a second one with MDB_INTEGERKEY (keys are 8-byte integers, values are 32 bytes; it is written with MDB_APPEND). All commits are similarly sized, 6000-8000 pages (page size is 4096). Each commit is therefore ~400k keys total for both dbs (7000*4096/((32+8)*2)). The VM has no disk swap; from the perf output here https://i.imgur.com/zuTGawz.jpeg we can see that the machine is under page swap load. On a VM with 16GB RAM, the hockey-stick happens when the LMDB size is only 1.5GB, 12% of the available RAM.
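For reference, the keys-per-commit estimate above can be written as a small Rust sketch. The function name `keys_per_commit` is made up for illustration, and the estimate ignores LMDB page headers, node headers and branch pages, so it is only a rough upper bound rather than what LMDB actually stores per page.

```rust
/// Rough estimate of keys written per commit, following the formula
/// pages * page_size / ((key_bytes + value_bytes) * dbs).
/// Hypothetical helper: it ignores page and node headers and branch
/// pages, so the real count per commit is somewhat lower.
fn keys_per_commit(
    pages: u64,
    page_size: u64,
    key_bytes: u64,
    value_bytes: u64,
    dbs: u64,
) -> u64 {
    pages * page_size / ((key_bytes + value_bytes) * dbs)
}
```

With 4096-byte pages, 32+8 byte entries and two databases, `keys_per_commit(7000, 4096, 32, 8, 2)` gives 358400, and the 6000-8000 page range gives 307200 to 409600, which is where the ~400k figure comes from.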
Detailed stats:
Below is an excerpt from the application log, one entry per commit; the same data is plotted in the graphs linked above. Memory stats are from /proc/[pid]/statm, LMDB stats are from the environment. “Difference” is the time the commit takes; it degrades from about 45 seconds to 3+ minutes. The final LMDB size in this log is pgs_in_use 448993*4096 = 1.84GB.
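The size figure above can be reproduced from a log entry with a short Rust sketch. The helper names `counter` and `pages_to_gb` are illustrative only and are not taken from our importer; the sketch assumes the lmdb counters appear as whitespace-separated `name value` pairs, as they do in the log below, and uses decimal gigabytes (10^9 bytes).

```rust
/// Return the integer that follows `name` in a whitespace-separated
/// log line, or None if the name is absent or the next token is not
/// a non-negative integer. Hypothetical helper for illustration.
fn counter(line: &str, name: &str) -> Option<u64> {
    let mut tokens = line.split_whitespace();
    while let Some(tok) = tokens.next() {
        if tok == name {
            return tokens.next()?.parse().ok();
        }
    }
    None
}

/// Convert a page count to decimal gigabytes (1 GB = 10^9 bytes).
fn pages_to_gb(pages: u64, page_size: u64) -> f64 {
    (pages * page_size) as f64 / 1e9
}
```

For the last entry in the log, `counter(line, "pgs_in_use")` yields 448993, and `pages_to_gb(448993, 4096)` is about 1.84. In every entry shown, free_pgs plus pgs_in_use adds up to last_pgno.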
Start of log before write degradation:
2024-08-09 02:26:19 [INFO][src/importer.rs:198] File 3054: 132932384 bytes. Memory stats: Size: 27540910 pages, Resident: 286072 pages, Shared: 99484 pages, Text: 981 pages, Data: 191849 pages, lmdb: dict_env: branch_pgs 0 leaf_pgs 1 of_pgs 0 last_pgno 108959 last_txnid 18 max_readers 126 num_readers 1 free_pgs 56171 pgs_in_use 52788
Difference: 0 min 44 sec
2024-08-09 02:27:03 [INFO][src/importer.rs:198] File 3055: 133951354 bytes. Memory stats: Size: 27541730 pages, Resident: 298463 pages, Shared: 111020 pages, Text: 981 pages, Data: 192669 pages, lmdb: dict_env: branch_pgs 0 leaf_pgs 1 of_pgs 0 last_pgno 121928 last_txnid 19 max_readers 126 num_readers 1 free_pgs 62881 pgs_in_use 59047
Difference: 0 min 48 sec
2024-08-09 02:27:51 [INFO][src/importer.rs:198] File 3056: 133218473 bytes. Memory stats: Size: 27543735 pages, Resident: 313441 pages, Shared: 123980 pages, Text: 981 pages, Data: 194674 pages, lmdb: dict_env: branch_pgs 0 leaf_pgs 1 of_pgs 0 last_pgno 133243 last_txnid 20 max_readers 126 num_readers 1 free_pgs 69672 pgs_in_use 63571
Difference: 0 min 40 sec
……
End of log after write degradation:
2024-08-09 04:39:33 [INFO][src/importer.rs:198] File 3127: 132694310 bytes. Memory stats: Size: 27654397 pages, Resident: 1028693 pages, Shared: 727992 pages, Text: 981 pages, Data: 305336 pages, lmdb: dict_env: branch_pgs 0 leaf_pgs 1 of_pgs 0 last_pgno 745440 last_txnid 91 max_readers 126 num_readers 1 free_pgs 308699 pgs_in_use 436741
Difference: 2 min 36 sec
2024-08-09 04:42:09 [INFO][src/importer.rs:198] File 3128: 133418687 bytes. Memory stats: Size: 27654397 pages, Resident: 1033516 pages, Shared: 732815 pages, Text: 981 pages, Data: 305336 pages, lmdb: dict_env: branch_pgs 0 leaf_pgs 1 of_pgs 0 last_pgno 745440 last_txnid 92 max_readers 126 num_readers 1 free_pgs 304043 pgs_in_use 441397
Difference: 3 min 18 sec
2024-08-09 04:45:27 [INFO][src/importer.rs:198] File 3129: 133118782 bytes. Memory stats: Size: 27654397 pages, Resident: 1043876 pages, Shared: 743175 pages, Text: 981 pages, Data: 305336 pages, lmdb: dict_env: branch_pgs 0 leaf_pgs 1 of_pgs 0 last_pgno 762922 last_txnid 93 max_readers 126 num_readers 1 free_pgs 313929 pgs_in_use 448993