Hello all.
We use LMDB in our own storage service and recently ran into a write performance problem: batch writes are very slow, and a single write transaction can take several minutes. For example, a transaction that writes 100,000 key-value pairs with an average value size of 100 bytes takes 5 minutes. The LMDB data file is 460 GB.
Profiling with perf gives the following:
53.14%  liblgraph.so        [.] mdb_page_alloc.isra.21
46.81%  liblgraph.so        [.] mdb_midl_xmerge
 0.01%  [kernel]            [k] __check_object_size
 0.01%  [kernel]            [k] __do_page_fault
 0.01%  [kernel]            [k] __fput
 0.01%  [kernel]            [k] get_futex_value_locked
 0.01%  [kernel]            [k] radix_tree_descend
 0.01%  libpthread-2.17.so  [.] __errno_location
Almost all of the CPU time is spent in two functions, mdb_page_alloc and mdb_midl_xmerge.
After adding timing instrumentation, I found that the time is spent in the mdb_freelist_save function, inside mdb_txn_commit. I'm not familiar with the LMDB source code. Can anyone explain why mdb_freelist_save takes so long? Is this the expected behavior as the LMDB data file grows? Is there any way to restore write performance once it has degraded? What would you suggest to improve LMDB write performance?
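For anyone who wants to reproduce the measurement, the kind of timing described above can be sketched roughly as follows. This is only a minimal illustration, not our actual code: it wraps one step in a monotonic-clock timer, and `fake_commit`, `timed_step` and `now_seconds` are hypothetical names invented for this example. In a real program the wrapped call would be the commit step (for example a small function that calls mdb_txn_commit).

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime / CLOCK_MONOTONIC */
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Current value of the monotonic clock, in seconds. */
static double now_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Signature of a step to be timed; returns 0 on success. */
typedef int (*step_fn)(void *ctx);

/* Hypothetical stand-in for the slow step (assumption: in the real
 * service this would call mdb_txn_commit on the write transaction).
 * Here it only burns a little CPU so the sketch runs without LMDB. */
static int fake_commit(void *ctx) {
    volatile unsigned long sink = 0;
    for (unsigned long i = 0; i < 1000000UL; i++) {
        sink += i;
    }
    (void)ctx;
    return 0;
}

/* Run one step, log its return code and wall-clock duration to stderr,
 * and optionally hand the duration back to the caller. */
static int timed_step(const char *label, step_fn fn, void *ctx,
                      double *elapsed_out) {
    assert(fn != NULL);
    double start = now_seconds();
    int rc = fn(ctx);
    double elapsed = now_seconds() - start;
    if (elapsed_out != NULL) {
        *elapsed_out = elapsed;
    }
    fprintf(stderr, "%s: rc=%d, %.6f s\n", label, rc, elapsed);
    return rc;
}
```

Calling `timed_step("commit", fake_commit, NULL, &elapsed)` logs one line per step. Wrapping the put loop and the commit separately in this way is what shows whether the time goes into the writes themselves or into the commit.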