Hi,
I have a question about LMDB (I hope this is the right mailing list for
such a question).
I'm running a benchmark (similar to my intended use case) that does not
behave as I hoped. I store 1 billion key/value pairs in a single LMDB
database. _In a single transaction._ The keys are MD5 hashes of random
data (16 bytes) and the value is the string "test". I'm using lmdbjava,
which currently uses LMDB 0.9.19.
The benchmark runs on Linux (Ubuntu 17.04 with a 4.10 kernel and an
ext4 filesystem).
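To make the setup concrete, here is a simplified sketch of what the
benchmark does (the map size, path, database name and buffer handling
below are illustrative placeholders, not my exact code):

import java.io.File;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.util.Random;

import org.lmdbjava.Dbi;
import org.lmdbjava.DbiFlags;
import org.lmdbjava.Env;
import org.lmdbjava.Txn;

public class LmdbInsertBenchmark {

  public static void main(String[] args) throws Exception {
    // Map size chosen generously so the map never fills up; the exact
    // value is arbitrary here.
    Env<ByteBuffer> env = Env.create()
        .setMapSize(100L * 1024 * 1024 * 1024)
        .setMaxDbs(1)
        .open(new File("/tmp/lmdb-bench"));
    Dbi<ByteBuffer> db = env.openDbi("bench", DbiFlags.MDB_CREATE);

    ByteBuffer key = ByteBuffer.allocateDirect(16);
    ByteBuffer val = ByteBuffer.allocateDirect(4);
    val.put("test".getBytes()).flip();

    MessageDigest md5 = MessageDigest.getInstance("MD5");
    Random random = new Random();
    byte[] data = new byte[16];

    long start = System.currentTimeMillis();

    // All 1000M puts happen inside one write transaction, committed at
    // the very end.
    try (Txn<ByteBuffer> txn = env.txnWrite()) {
      for (long i = 1; i <= 1_000_000_000L; i++) {
        random.nextBytes(data);
        key.clear();
        key.put(md5.digest(data)).flip();   // 16-byte MD5 key
        db.put(txn, key, val);              // value is always "test"

        if (i % 1_000_000 == 0) {
          long took = System.currentTimeMillis() - start;
          System.out.println((i / 1_000_000) + "M/1000M took:" + took + " ms");
          start = System.currentTimeMillis();
        }
      }
      txn.commit();
    }
    env.close();
  }
}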
In the beginning the data is inserted relatively fast:
1M/1000M took:3317 ms (3s 317ms)
Then the insert performance deteriorates gradually. After 642M entries,
inserting the next 1M entries takes more than 5 minutes:
642M/1000M took:305734 ms (5m 5s 734ms)
At this point the database file (data.mdb) is about 27 GiB. The
filesystem buffer cache is about the same size, so I assume most pages
are cached. Linux still reports 28 GiB of free memory.
A short analysis with perf seems to indicate that most of the time is
spent in mdb_page_spill:
  Children     Self
    96,45%    0,00%  lmdbjava-native-library.so            [.] mdb_cursor_put
    96,45%    0,00%  lmdbjava-native-library.so            [.] mdb_put
    96,45%    0,00%  jffi8421248145368054745.so (deleted)  [.] 0xffff80428d388b3f
    60,43%    2,61%  lmdbjava-native-library.so            [.] mdb_page_spill.isra.16
    47,39%   47,39%  lmdbjava-native-library.so            [.] mdb_midl_sort
    26,07%    0,24%  lmdbjava-native-library.so            [.] mdb_page_touch
    26,07%    0,00%  lmdbjava-native-library.so            [.] mdb_cursor_touch
    25,83%    0,00%  lmdbjava-native-library.so            [.] mdb_page_unspill
    23,22%    0,24%  lmdbjava-native-library.so            [.] mdb_page_dirty
    22,99%   22,27%  lmdbjava-native-library.so            [.] mdb_mid2l_insert
    11,14%    0,24%  [kernel.kallsyms]                     [k] entry_SYSCALL_64_fastpath
    10,43%    0,47%  lmdbjava-native-library.so            [.] mdb_page_flush
     9,95%    0,00%  libpthread-2.24.so                    [.] __GI___libc_pwrite
     9,72%    0,00%  [kernel.kallsyms]                     [k] vfs_write
     9,72%    0,00%  [kernel.kallsyms]                     [k] sys_pwrite64
     9,48%    0,00%  [kernel.kallsyms]                     [k] generic_perform_write
     9,48%    0,00%  [kernel.kallsyms]                     [k] __generic_file_write_iter
     9,48%    0,00%  [kernel.kallsyms]                     [k] ext4_file_write_iter
     9,48%    0,00%  [kernel.kallsyms]                     [k] new_sync_write
     9,48%    0,00%  [kernel.kallsyms]                     [k] __vfs_write
     9,24%    0,00%  lmdbjava-native-library.so            [.] mdb_cursor_set
     8,06%    0,47%  lmdbjava-native-library.so            [.] mdb_page_search
     7,35%    0,95%  lmdbjava-native-library.so            [.] mdb_page_search_root
     4,98%    0,24%  lmdbjava-native-library.so            [.] mdb_page_get.isra.13
The documentation for mdb_page_spill says (as far as I understand it)
that this function is called to prevent MDB_TXN_FULL situations. Does
this mean that my transaction is simply too large to be handled
efficiently by LMDB?
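If that is the case, I assume the workaround would be to commit in
smaller batches, roughly like this (untested sketch; the 1M batch size
and the nextKey() helper are just placeholders):

// Untested sketch: commit every BATCH puts instead of one huge transaction.
final long BATCH = 1_000_000L;        // placeholder batch size
Txn<ByteBuffer> txn = env.txnWrite();
try {
  for (long i = 1; i <= 1_000_000_000L; i++) {
    nextKey(key);                     // hypothetical helper filling the MD5 key buffer
    db.put(txn, key, val);
    if (i % BATCH == 0) {             // end the current transaction, start a fresh one
      txn.commit();
      txn.close();
      txn = env.txnWrite();
    }
  }
  txn.commit();
} finally {
  txn.close();
}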
Note that a similar benchmark with 4-byte integer keys took only 2h 34m
for 1000M entries (the integer keys were inserted in sorted order, but
I did not use MDB_APPEND).
I understand that LMDB is not write-optimized, and maybe my
transactions are simply too large. However, I hope I'm just doing
something wrong and that I can still use LMDB for my use case.
Any ideas?
Thank you,
Juergen
--
Juergen Baier