Hi,
I have a question about LMDB (I hope this is the right mailing list for such a question).
I'm running a benchmark, similar to my intended use case, that does not perform as I had hoped. I store 1 billion key/value pairs in a single LMDB database. _In a single transaction._ The keys are MD5 hashes of random data (16 bytes each) and the value is always the string "test".
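For reference, the key/value generation looks roughly like the following. This is a minimal sketch, not the actual benchmark code: the class name KeyGen, the 32-byte random input size and the fixed seed are assumptions made for illustration, and the LMDB put itself is left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Random;

public class KeyGen {
    // The same 4-byte value is stored for every key.
    static final byte[] VALUE = "test".getBytes(StandardCharsets.UTF_8);

    // Returns the MD5 digest (always 16 bytes) of freshly generated random data.
    // The 32-byte input size is an assumption; any random input gives a 16-byte key.
    static byte[] nextKey(Random rnd, MessageDigest md5) {
        byte[] input = new byte[32];
        rnd.nextBytes(input);
        return md5.digest(input); // digest() also resets md5 for the next call
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        Random rnd = new Random(42); // fixed seed only to make the sketch repeatable
        byte[] key = nextKey(rnd, md5);
        System.out.println("key bytes: " + key.length + ", value bytes: " + VALUE.length);
    }
}
```

The point is that consecutive keys are effectively uniformly distributed over the key space, so they arrive in random order relative to LMDB's sorted key order.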
I'm using lmdbjava which currently uses LMDB 0.9.19.
The benchmark is executed on Linux (Ubuntu 17.04 with a 4.10 kernel and an ext4 filesystem).
In the beginning the data is inserted relatively fast:
1M/1000M took:3317 ms (3s 317ms)
Then the insert performance deteriorates gradually. After 642M entries have been inserted, inserting another 1M entries takes more than 5 minutes:
642M/1000M took:305734 ms (5m 5s 734ms)
At this point the database file (data.mdb) is about 27GiB. The filesystem buffer cache is about the same size, so I assume most pages are cached. Linux still reports 28G free memory.
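To put the numbers above in perspective, here is the arithmetic as a small sketch (the class and method names are made up for illustration; the inputs are the timings and file size quoted above, with 27GiB being approximate):

```java
public class InsertStats {
    // Inserts per second for a batch of `entries` that took `millis` milliseconds.
    static double insertsPerSecond(long entries, long millis) {
        return entries * 1000.0 / millis;
    }

    // Average on-disk bytes per entry for a file of `gib` GiB holding `entries` entries.
    static double bytesPerEntry(double gib, long entries) {
        return gib * 1024 * 1024 * 1024 / entries;
    }

    public static void main(String[] args) {
        double first = insertsPerSecond(1_000_000L, 3_317L);   // first 1M entries
        double late = insertsPerSecond(1_000_000L, 305_734L);  // 1M entries at the 642M mark
        System.out.printf("first batch: %.0f inserts/s%n", first);
        System.out.printf("late batch:  %.0f inserts/s%n", late);
        System.out.printf("slowdown:    %.1fx%n", first / late);
        System.out.printf("bytes/entry: %.1f%n", bytesPerEntry(27.0, 642_000_000L));
    }
}
```

That works out to roughly 300,000 inserts/s at the start versus roughly 3,300 inserts/s at the 642M mark, a slowdown of about 92x, with about 45 bytes on disk per 20-byte key/value pair.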
A short analysis with perf seems to indicate that most of the time is spent in mdb_page_spill:
Children Self
96,45% 0,00% lmdbjava-native-library.so [.] mdb_cursor_put
96,45% 0,00% lmdbjava-native-library.so [.] mdb_put
96,45% 0,00% jffi8421248145368054745.so (deleted) [.] 0xffff80428d388b3f
60,43% 2,61% lmdbjava-native-library.so [.] mdb_page_spill.isra.16
47,39% 47,39% lmdbjava-native-library.so [.] mdb_midl_sort
26,07% 0,24% lmdbjava-native-library.so [.] mdb_page_touch
26,07% 0,00% lmdbjava-native-library.so [.] mdb_cursor_touch
25,83% 0,00% lmdbjava-native-library.so [.] mdb_page_unspill
23,22% 0,24% lmdbjava-native-library.so [.] mdb_page_dirty
22,99% 22,27% lmdbjava-native-library.so [.] mdb_mid2l_insert
11,14% 0,24% [kernel.kallsyms] [k] entry_SYSCALL_64_fastpath
10,43% 0,47% lmdbjava-native-library.so [.] mdb_page_flush
9,95% 0,00% libpthread-2.24.so [.] __GI___libc_pwrite
9,72% 0,00% [kernel.kallsyms] [k] vfs_write
9,72% 0,00% [kernel.kallsyms] [k] sys_pwrite64
9,48% 0,00% [kernel.kallsyms] [k] generic_perform_write
9,48% 0,00% [kernel.kallsyms] [k] __generic_file_write_iter
9,48% 0,00% [kernel.kallsyms] [k] ext4_file_write_iter
9,48% 0,00% [kernel.kallsyms] [k] new_sync_write
9,48% 0,00% [kernel.kallsyms] [k] __vfs_write
9,24% 0,00% lmdbjava-native-library.so [.] mdb_cursor_set
8,06% 0,47% lmdbjava-native-library.so [.] mdb_page_search
7,35% 0,95% lmdbjava-native-library.so [.] mdb_page_search_root
4,98% 0,24% lmdbjava-native-library.so [.] mdb_page_get.isra.13
The documentation about mdb_page_spill says (as far as I understand) that this function is called to prevent MDB_TXN_FULL situations. Does this mean that my transaction is simply too large to be handled efficiently by LMDB?
Note that a similar benchmark with 4-byte integer keys took only 2h34m for all 1000M entries (the integer keys were inserted in sorted order, but I did not use MDB_APPEND).
I understand that LMDB is not write-optimized, and maybe my transaction is simply too large. However, I hope I'm just doing something wrong and can still use LMDB for my use case.
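If the transaction size is indeed the problem, the variation I would try next is committing every N entries instead of once at the end. A minimal sketch of that loop follows. Note that the Store interface and CountingStore class are hypothetical stand-ins invented for this sketch, not lmdbjava API; with lmdbjava, begin/put/commit would presumably map onto opening a write transaction, putting into it, and committing it. The batch size is an arbitrary assumption as well.

```java
public class BatchedLoad {
    // Hypothetical stand-in for the key/value store (NOT the lmdbjava API).
    interface Store {
        void begin();      // start a write transaction
        void put(long i);  // insert the i-th entry
        void commit();     // commit the current transaction
    }

    // Inserts `total` entries, committing after every `batchSize` puts.
    // Returns the number of commits performed.
    static long load(Store store, long total, long batchSize) {
        long commits = 0;
        long inBatch = 0;
        for (long i = 0; i < total; i++) {
            if (inBatch == 0) {
                store.begin();
            }
            store.put(i);
            inBatch++;
            if (inBatch == batchSize) {
                store.commit();
                commits++;
                inBatch = 0;
            }
        }
        if (inBatch > 0) { // commit the final, partially filled batch
            store.commit();
            commits++;
        }
        return commits;
    }

    // In-memory implementation that only counts calls, to exercise the loop.
    static class CountingStore implements Store {
        long begins, puts, commits;
        public void begin() { begins++; }
        public void put(long i) { puts++; }
        public void commit() { commits++; }
    }

    public static void main(String[] args) {
        CountingStore store = new CountingStore();
        long commits = load(store, 10, 4); // batches of 4, 4 and 2 entries
        System.out.println("puts=" + store.puts + " commits=" + commits);
    }
}
```

Whether smaller transactions would actually avoid the mdb_page_spill overhead is exactly what I am unsure about.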
Any ideas?
Thank you,
Juergen