Tao Chen wrote:
Recently I've been trying to use LMDB to store and randomly access a large
number of features. Each feature blob is 16 kB.
Before trying LMDB, I just stacked all the features together into one huge
binary file and used the seek function in C++ to access each feature. Since
the feature size is fixed, I can easily compute the offset of each feature in
the file.
Then I tried LMDB. The value is the feature as-is; the keys are "1",
"3", .... Since 16 kB is exactly 4 x page_size, adding the key and header
means each feature occupies 5 x page_size, so the db file on disk is about
1.25 times the size of the previous binary file. This is already a
disadvantage for LMDB, but I still hoped there would be some efficiency
trade-off. I used the lmdb++ C++ wrapper.
Next, I compared the two approaches by accessing the same random 1% of
features out of about 300k features. Before each test, I used vmtouch to
evict both files from the page cache. The result is surprising: the LMDB
version is 1.5 times slower than the raw binary file (30 s vs 20 s).
Is this because of the feature size (exactly 4 pages)?
That certainly doesn't help, given the 16-byte page header. We expect to
remove this page header on overflow pages in LMDB 1.0.
Do I understand the use
of LMDB incorrectly?
You are comparing a B+tree, which has complexity O(log N), to direct access,
which has complexity O(1). The result you got is exactly as expected.
There are only 2 reasons to use a tree structure:
1) you will have frequent inserts/deletes from the data set.
2) your data sizes are variable or unknown.
Your experiment uses a constant array, so reason 1 is invalid. And all of
your records are identical in size, so reason 2 is invalid.
This is basic computer science, nothing special about LMDB.
Thank you for your time!
-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/