Hello!
I have been reading the LMDB source code to learn how a database can be
implemented. However, I have been struggling for quite some time to answer
the questions below, partly because of my limited knowledge of C:
- How does LMDB store keys and values in a page? I have learned that a
page consists of a header, slots, and key-value pairs, but how does the
database handle keys and values that are too large to fit within a page?
- I found that LMDB does not keep a fixed number of keys per page, so the
number depends on the sizes of the key-value pairs already inserted into
the page. Is this correct? Does it mean that B+tree pages (branch or leaf)
can hold a different number of keys depending on the key size? How does
that affect the performance or the implementation of the B+tree?
- Are there any limitations on the size of a key? Can it be of an arbitrary
length?
- What is the difference between the IS_LEAF and IS_LEAF2 checks on the
page header (the P_LEAF and P_LEAF2 flags, if I read the code correctly)?
How do these two kinds of pages differ?
- How do overflow pages work in LMDB? From what I could understand, if a
key or a value does not fit in the page, it is stored in an overflow page
(an entire page is allocated for that specific key or value). Is this
correct? What happens when the value is several times larger than the
page size, e.g. a 1 MB value with 4 KB pages?
- What is a sub-page in LMDB (F_SUBDATA)? How does it work?
I would greatly appreciate links to documentation that covers the
internals of the database, online videos, research papers, mailing list
threads, or any notes that could help me understand the above. Thank you
very much!
Cheers,
Ivan