Hello all.
We use LMDB in our own storage service and recently ran into a write performance problem: batch writes are very slow, and a single write transaction can take several minutes. For example, a transaction that writes 100,000 key-value pairs with an average value size of 100 bytes takes 5 minutes. The LMDB data file is 460 GB.
The analysis using perf is as follows:
53.14%  liblgraph.so        [.] mdb_page_alloc.isra.21
46.81%  liblgraph.so        [.] mdb_midl_xmerge
 0.01%  [kernel]            [k] __check_object_size
 0.01%  [kernel]            [k] __do_page_fault
 0.01%  [kernel]            [k] __fput
 0.01%  [kernel]            [k] get_futex_value_locked
 0.01%  [kernel]            [k] radix_tree_descend
 0.01%  libpthread-2.17.so  [.] __errno_location
Nearly all of the CPU time is spent in two functions, mdb_page_alloc and mdb_midl_xmerge.
By adding timing statistics, I found that the time is spent in the mdb_freelist_save function called from mdb_txn_commit. I'm not familiar with the LMDB source code. Can anyone explain why mdb_freelist_save takes so long? Is this expected as the LMDB data file grows? Is there any way to restore write performance once it has degraded? What do you suggest to improve LMDB write performance?
Machine configuration: 16-core CPU, 128 GB RAM, 3.5 TB NVMe SSD.
The SSD has not reached a read/write I/O bottleneck.
Wang Zhiyong wrote:
Hello all.
We use LMDB in our own storage service and recently ran into a write performance problem: batch writes are very slow, and a single write transaction can take several minutes. For example, a transaction that writes 100,000 key-value pairs with an average value size of 100 bytes takes 5 minutes.
Sounds like you should use smaller batches.
I found that the time is spent in the `mdb_cursor_put` call on line 3570 of https://github.com/LMDB/lmdb/blob/mdb.master/libraries/liblmdb/mdb.c
When the data size is several hundred KB, `mdb_cursor_put` may block for several tens of seconds.
That is for putting just a single key.
In our experience, this kind of write performance problem is usually caused by a fragmented freelist. In that case `mdb_page_alloc` searches a large freelist for a run of consecutive free pages (the run length is given by the `num` argument), but the freelist contains only small consecutive blocks of free space. You can check the state of the freelist with `mdb_stat -ff`.
regards,
Steffen Michels
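To illustrate the fragmented-freelist explanation above, here is a minimal sketch of a search for `num` consecutive free pages. It is not LMDB's code: `find_contiguous_run` is a hypothetical helper, and the ascending array of page numbers is an assumed layout chosen for readability.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of LMDB: scan an ascending list of free
 * page numbers for the first run of `num` consecutive pages.
 * Returns the index where the run starts, or -1 if there is no such run. */
long find_contiguous_run(const unsigned long *free_pages, size_t len, size_t num)
{
    assert(num > 0);
    size_t run = 1; /* length of the consecutive run ending at index i */
    for (size_t i = 0; i < len; i++) {
        if (i > 0 && free_pages[i] == free_pages[i - 1] + 1)
            run++;      /* page i directly follows page i - 1 */
        else
            run = 1;    /* gap found: start a new run at page i */
        if (run >= num)
            return (long)(i + 1 - num);
    }
    return -1; /* the whole list was scanned without success */
}
```

With a free list such as {2, 4, 6, 8, 10}, a request for 2 consecutive pages fails only after the whole list has been scanned. If the real freelist behaves similarly, a very long list made of small fragments would make every multi-page allocation pay for a full scan, which would match the profile reported earlier in the thread.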
On 24/04/2023 14:37, Wang Zhiyong wrote:
I found that the time is spent in the `mdb_cursor_put` call on line 3570 of https://github.com/LMDB/lmdb/blob/mdb.master/libraries/liblmdb/mdb.c
When the data size is several hundred KB, `mdb_cursor_put` may block for several tens of seconds.
That is for putting just a single key.
Thanks for your advice. There were indeed too many free pages. We switched to smaller write batches, which solved the problem we encountered.
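For readers who want to try the same fix, the following is a rough sketch of splitting one large write into smaller batches. The names `write_in_batches`, `batch_writer`, `record_batch` and `batch_stats` are hypothetical and do not come from LMDB or from this thread. In a real program the callback would wrap `mdb_txn_begin`, one `mdb_put` per item, and `mdb_txn_commit`. The thread does not say which batch size was used, so any value shown here is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: write items [first, first + count) in ONE write
 * transaction. Returns 0 on success, non-zero on failure. */
typedef int (*batch_writer)(size_t first, size_t count, void *ctx);

/* Split `total` items into batches of at most `batch_size` items and pass
 * each batch to `write_batch`. Stops at the first failure and returns its
 * error code; returns 0 if every batch succeeded. */
int write_in_batches(size_t total, size_t batch_size,
                     batch_writer write_batch, void *ctx)
{
    assert(batch_size > 0);
    size_t first = 0;
    while (first < total) {
        size_t remaining = total - first;
        size_t count = remaining < batch_size ? remaining : batch_size;
        int rc = write_batch(first, count, ctx);
        if (rc != 0)
            return rc;
        first += count;
    }
    return 0;
}

/* Stand-in writer for illustration only: it counts batches and items
 * instead of touching a database. */
struct batch_stats { size_t batches; size_t items; };

int record_batch(size_t first, size_t count, void *ctx)
{
    struct batch_stats *stats = ctx;
    (void)first; /* a real writer would use this as the start offset */
    stats->batches++;
    stats->items += count;
    return 0;
}
```

Calling `write_in_batches(100000, 1000, record_batch, &stats)` splits the 100,000 writes from the original report into 100 transactions of 1,000 items each. The right batch size depends on the workload and has to be measured.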
openldap-technical@openldap.org