Hi Sir/Madam,
Recently I have been trying to use LMDB to store and randomly access a large number of features. Each feature blob is 16 kB. Before trying LMDB, I simply stacked all the features into one huge binary file and used seek in C++ to access each feature. Since the feature size is fixed, I can easily compute the offset of each feature in the file.
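For reference, the raw-file path is essentially the following (a minimal sketch; the stream and feature-size constant are placeholders for my actual setup):

#include <cstddef>
#include <fstream>
#include <vector>

// Read feature i from the packed binary file (fixed-size records).
std::vector<char> read_feature(std::ifstream& file, std::size_t index) {
    constexpr std::size_t kFeatureBytes = 16 * 1024;  // 16 kB per feature
    std::vector<char> buf(kFeatureBytes);
    // Offset is simply index * record size, since all records have the same size.
    file.seekg(static_cast<std::streamoff>(index) * kFeatureBytes);
    file.read(buf.data(), kFeatureBytes);
    return buf;
}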
Then I tried LMDB. The value is the raw feature blob; the keys are "1", "2", "3", .... Since 16 kB is exactly 4 x page_size, adding the key and page headers, each feature occupies about 5 x page_size, so the database file on disk is about 1.25 times the size of the raw binary file. This is already a disadvantage for LMDB, but I still hoped there would be some efficiency trade-off. I use the LMDB++ C++ wrapper to access the features.
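The LMDB read path looks roughly like this (a sketch assuming the lmdb++ 0.9-style API; the environment path is a placeholder, and keys are the decimal index written as a string):

#include <lmdb++.h>
#include <string>

int main() {
    // Open the existing environment read-only ("./features_lmdb" is a placeholder path).
    auto env = lmdb::env::create();
    env.open("./features_lmdb", MDB_RDONLY, 0664);

    auto rtxn = lmdb::txn::begin(env, nullptr, MDB_RDONLY);
    auto dbi  = lmdb::dbi::open(rtxn, nullptr);

    // Look up one feature by its decimal key, e.g. "12345".
    std::string key = "12345";
    lmdb::val k{key.data(), key.size()}, v;
    if (dbi.get(rtxn, k, v)) {
        // v.data() points at the 16 kB blob inside the memory map.
    }

    rtxn.abort();  // read-only, nothing to commit
    return 0;
}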
Next, I compared the two approaches by accessing the same random 1% of the ~300k features. Before each test, I used vmtouch to evict both files from the page cache. The result is surprising: the LMDB version is about 1.5 times slower than the raw binary file (30 s vs. 20 s).
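For context, the benchmark driver is essentially the following sketch; the index sampling and timing helpers are illustrative, and both files were evicted beforehand with vmtouch -e:

#include <algorithm>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Pick ~1% of the ~300k feature indices at random, without replacement.
std::vector<std::size_t> sample_indices(std::size_t total, std::size_t count) {
    std::vector<std::size_t> all(total);
    for (std::size_t i = 0; i < total; ++i) all[i] = i;
    std::mt19937_64 rng(42);  // fixed seed so both runs touch the same features
    std::shuffle(all.begin(), all.end(), rng);
    all.resize(count);
    return all;
}

// Time one full run of random reads with the given lookup function.
template <typename ReadFn>
double time_reads(const std::vector<std::size_t>& indices, ReadFn read_one) {
    auto start = std::chrono::steady_clock::now();
    for (std::size_t idx : indices) read_one(idx);
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(end - start).count();  // seconds
}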
Is this because of the feature size (exactly 4 pages)? Am I misunderstanding how LMDB should be used? Thank you for your time!
Best Regards,
Tao Chen