Question from an LMDB user

3 Nov 2015


Hi Sir/Madam,
Recently I have been trying to use LMDB to store and randomly access a large
number of features. Each feature blob is 16 kB.
Before trying LMDB, I simply stacked all the features together into one huge
binary file and used the seek function in C++ to access each feature. Since
the feature size is fixed, I can easily compute the address of each
feature in the file.
Then I tried LMDB. The value is the feature as it is. The keys are "1",
"2", "3", .... Since 16 kB is exactly 4 x page_size, adding the key and
header means each feature occupies 5 x page_size, so the db file on disk
is about 1.25 times the size of the previous binary file. This is already a
disadvantage for LMDB, but I still hoped there would be some efficiency
trade-off. I use the LMDB++ C++ wrapper to access features.
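
The page arithmetic above can be checked with a small sketch. The helper `pages_for_value` is hypothetical, and the 4096-byte page size and 16-byte per-record header are assumptions made for illustration, not values confirmed in this post.

```cpp
#include <cstddef>

// Number of whole pages needed to hold one value plus a fixed header.
// page_size and header_bytes are assumed defaults, not measured values.
constexpr std::size_t pages_for_value(std::size_t value_bytes,
                                      std::size_t page_size = 4096,
                                      std::size_t header_bytes = 16) {
    // Round (value + header) up to the next multiple of page_size.
    return (value_bytes + header_bytes + page_size - 1) / page_size;
}

// A 16 kB value alone fills exactly 4 pages, so any header pushes it to 5.
static_assert(pages_for_value(16 * 1024) == 5, "16 kB value plus header");

// Under the same assumptions, a value 16 bytes shorter still fits in 4 pages.
static_assert(pages_for_value(16 * 1024 - 16) == 4, "slightly smaller value");
```

Five pages per feature against four pages of raw data gives the 5 / 4 = 1.25 size ratio mentioned above.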
Next, I compared the two approaches by accessing the same random 1% of
features out of about 300k features. Before the test, I used vmtouch to evict
both files from the memory cache. The result is surprising: the one using
LMDB is 1.5 times slower than the raw binary file (30s vs 20s).
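
One way to make sure both approaches read the same random subset is to draw the indices once from a fixed seed. The following is a minimal sketch under that assumption. `sample_indices` is a hypothetical helper, it requires C++17 for `std::sample`, and it is not the code used for the timings above.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <numeric>
#include <random>
#include <vector>

// Picks `count` distinct indices from [0, total) in random order.
// The same seed always yields the same sequence on a given implementation,
// so both storage back ends can be driven with an identical access pattern.
std::vector<std::uint32_t> sample_indices(std::uint32_t total,
                                          std::size_t count,
                                          std::uint32_t seed) {
    std::vector<std::uint32_t> all(total);
    std::iota(all.begin(), all.end(), 0u);  // 0, 1, ..., total - 1

    std::vector<std::uint32_t> picked;
    picked.reserve(count);
    std::mt19937 rng(seed);
    // std::sample selects without replacement but keeps the input order...
    std::sample(all.begin(), all.end(), std::back_inserter(picked), count, rng);
    // ...so shuffle afterwards to get a random access order.
    std::shuffle(picked.begin(), picked.end(), rng);
    return picked;
}
```

For the test described here, `sample_indices(300000, 3000, 42)` would give 3000 indices (1% of 300k) that can be replayed against both the raw file and the LMDB database.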
Is this because of the size of each feature (exactly 4 pages)? Am I
misunderstanding how LMDB should be used?
Thank you for your time!
Best Regards,
Tao Chen
