Am 21.03.2013 21:58, schrieb Howard Chu:
Tobias Oberstein wrote:
Hello,
I have read the - very interesting - performance comparison http://symas.com/mdb/microbench/
I'd like to ask if someone did benchmark LMDB (and/or the others) against http://www.garret.ru/fastdb.html
FastDB is an in-memory ACID database that works via shadow paging, and without a transaction log.
OK, like LMDB it uses shadow root pages. I think the similarity ends there.
Ah. Ok.
It is a relational database with an ASCII query language, while LMDB is strictly a key/value store. That automatically means for simple get/put operations LMDB will be orders of magnitude faster (just as it is so much faster than SQLite3 and SQLite4).
Mmh. The overhead of parsing a "SELECT value FROM kv WHERE key = ?" or executing a prepared version of the former versus a direct "kv.get(key)" is there, sure, but _orders_ of magnitude larger?
It makes the flawed assumption that many main-memory databases do, that all RAM is the same speed. It uses T-trees as its underlying data structure. LMDB uses B+trees because time and again research shows that B+trees are more efficient, even when all of the data fits into main memory. This is the same mistake the MemSQL guys make, as well as MySQL NDBCluster.
See these papers, for example: http://resnet.uoregon.edu/~gurney_j/jmpc/skiplist.html http://www.vldb.org/dblp/db/conf/vldb/RaoR99.html
The reality is that computer architectures are hierarchical in nature. A CPU has L1, L2, and possibly L3 cache, each of which is progressively slower than the previous. Then it gets to local RAM, which is slower still. Then in a multiprocessor machine, there is NUMA, memory residing on remote nodes, which is again slower still. B+trees are inherently hierarchical and naturally suited to the reality of system design. All other approaches, including the recently popular LSM trees (as used in e.g. Google LevelDB, SQLite4, and various other recent databases) are all inherently inferior because of this fundamental fact of computer architecture.
Makes sense .. main-memory access patterns optimized for locality will have an advantage vs patterns that assume O(1) for any (random) pattern.
Btw: could LMDB be used as a backend of sqlite4? I.e. does LMDB support ordered access?
From the sqlite4 docs:
"SQLite4 needs to be able to seek to the nearest key for a given probe key, then step through keys in lexicographical order in either the ascending or the descending direction. "
.. where ordering is:
"Keys sort in lexicographical order (as if sorted using the memcmp() library function). When one key is a prefix of another, the shorter key occurs first."
FastDB also appears to use locking, while LMDB is MVCC and readers require no locks, so even with all of the other disadvantages out of the way, LMDB will scale better across multiple CPUs.
Ah, cool. This is definitely a very good thing: no locks, and writers don't block readers.
FastDB is also written in C++ which is another inherent disadvantage compared to LMDB which is pure C.
Yes, though I'm a C++ fan, I agree with this point. E.g., here is a nice Python wrapper
https://github.com/dw/py-lmdb/
which interfaces using Cython and wouldn't be possible if LMDB would be C++.
You could adapt the LevelDB microbenchmarks to test it but ultimately I believe it would be a waste of time.
Thanks for your detailed answer and sharing information! It seems LMDB deserves much more visibility in the community.
- Tobias