Hi
I'm doing some experiments with LMDB trying to emulate a columnar storage database using roaring bitmaps and other tricks.
The initial results are promising, but I wonder: is a row-based store like LMDB appropriate for implementing a columnar database, or are there better, more efficient approaches or formats?
Cheers, -Kristoffer
Kristoffer Sjögren wrote:
Hi
I'm doing some experiments with LMDB trying to emulate a columnar storage database using roaring bitmaps and other tricks.
The initial results are promising, but I wonder: is a row-based store like LMDB appropriate for implementing a columnar database, or are there better, more efficient approaches or formats?
What makes you think LMDB is either row- or column-based? It has no concept of either, that's purely an abstraction created by higher level code.
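To illustrate the point, here is a minimal sketch in plain Python (not LMDB code) that models an ordered key-value store as a sorted list of byte-string pairs. The key formats `row/<id>/<column>` and `col/<column>/<id>` and the helper `prefix_scan` are hypothetical; they only show that "row" or "column" orientation comes from how the application builds its keys.

```python
import bisect

def prefix_scan(pairs, prefix):
    """Return all (key, value) pairs whose key starts with prefix.

    pairs must be sorted by key, as in any ordered key-value store.
    """
    start = bisect.bisect_left(pairs, (prefix,))
    out = []
    for key, value in pairs[start:]:
        if not key.startswith(prefix):
            break
        out.append((key, value))
    return out

# A tiny two-record table used only for illustration.
table = {1: {"city": b"Oslo", "year": b"2015"},
         2: {"city": b"Rome", "year": b"2014"}}

# Row-oriented layout: all fields of one record are adjacent in key order.
row_store = sorted((b"row/%d/%s" % (rid, col.encode()), val)
                   for rid, rec in table.items() for col, val in rec.items())

# Column-oriented layout: all values of one column are adjacent in key order.
col_store = sorted((b"col/%s/%d" % (col.encode(), rid), val)
                   for rid, rec in table.items() for col, val in rec.items())

record_1 = prefix_scan(row_store, b"row/1/")       # one whole record
city_column = prefix_scan(col_store, b"col/city/")  # one whole column
```

The store itself is identical in both cases; only the key scheme decides which range scan is cheap.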
My idea is to have one bit index per logical column+value, stored in a value, where the key also has some means of partitioning the data, maybe over time. So high-cardinality columns will generate lots of keys+values.
I was thinking of storing multiple indexes in each key+value, making values bigger but fewer. Say around a few hundred kilobytes each.
Are there any trade-offs between many smaller key+values and fewer larger ones? I'm more concerned about read performance.
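One possible key layout for the scheme described above can be sketched as follows. This is only an illustration under assumptions: the NUL-separated format, the 8-byte big-endian time bucket, and the names `make_key` and `parse_key` are hypothetical and not something the thread specifies. The value stored under each key would be the serialized bitmap, which is left out here. The sketch also assumes column names and values contain no NUL bytes and that keys stay within LMDB's key size limit (511 bytes in a default build).

```python
import struct

def make_key(column, value, bucket_start):
    """Build a key: column, NUL, value, NUL, 8-byte big-endian time bucket.

    Big-endian is used so that byte-wise key order matches numeric
    order of the bucket (hypothetical layout, for illustration only).
    """
    return (column.encode() + b"\x00" + value.encode() + b"\x00"
            + struct.pack(">Q", bucket_start))

def parse_key(key):
    """Inverse of make_key; returns (column, value, bucket_start)."""
    body, raw_bucket = key[:-9], key[-8:]
    column, value = body.split(b"\x00", 1)
    (bucket_start,) = struct.unpack(">Q", raw_bucket)
    return column.decode(), value.decode(), bucket_start

# Keys for two values of one column, with hourly buckets given in seconds.
keys = [
    make_key("country", "SE", 7200),
    make_key("country", "NO", 0),
    make_key("country", "SE", 0),
    make_key("country", "SE", 3600),
]

# A byte-wise sort groups each column+value together, ordered by time.
parsed = [parse_key(k) for k in sorted(keys)]
```

With this layout, all time buckets for one column+value pair are contiguous, so a single range scan over the key prefix reads them in time order.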
On Sun, Jul 12, 2015 at 12:17 PM, Howard Chu hyc@symas.com wrote:
--
  -- Howard Chu
  CTO, Symas Corp. http://www.symas.com
  Director, Highland Sun http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP http://www.openldap.org/project/
Kristoffer Sjögren wrote:
My idea is to have one bit index per logical column+value, stored in a value, where the key also has some means of partitioning the data, maybe over time. So high-cardinality columns will generate lots of keys+values.
I was thinking of storing multiple indexes in each key+value, making values bigger but fewer. Say around a few hundred kilobytes each.
Are there any trade-offs between many smaller key+values and fewer larger ones? I'm more concerned about read performance.
There would be a speed advantage to using fewer keys with larger values. Search performance is O(log N), where N is the number of keys...
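As a rough worked example of the O(log N) point, the sketch below estimates how many tree levels a lookup descends for different key counts. It is a simplified model, not a measurement of LMDB: the fanout of 100 keys per page is an assumed figure (the real fanout depends on key size and page size), and `btree_depth` is a hypothetical helper.

```python
def btree_depth(num_keys, fanout):
    """Smallest number of levels d (at least 1) with fanout**d >= num_keys."""
    depth, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout
        depth += 1
    return depth

FANOUT = 100  # assumed keys per page, for illustration only

# Many small values: 100 million keys.
many_small = btree_depth(100_000_000, FANOUT)

# Packing about 1000 indexes into each value leaves 100 thousand keys.
fewer_large = btree_depth(100_000, FANOUT)
```

Under these assumptions, cutting the key count by a factor of 1000 removes only about one level from each lookup, so the gain is real but logarithmic; the cost of reading and decoding each larger value would need to be measured separately.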
openldap-technical@openldap.org