Hello!
I have been reading the LMDB source code to learn how a database can be implemented. However, I have been struggling with the questions below for quite some time, partly because of my limited knowledge of C:
- How does LMDB store keys and values in a page? I have learned that a page consists of a header, slots, and key-value pairs, but how does the database handle keys and values that are too large to fit within a page?
- I found that LMDB does not keep a fixed number of keys per page, so the number of keys a page can hold depends on the sizes of the key-value pairs already inserted into it. Is this correct? Does it mean that B+tree pages (branch or leaf) can hold different numbers of keys depending on the key size? How does this affect the performance or the implementation of the B+tree?
- Are there any limitations on the size of a key? Can it be of an arbitrary length?
- What is the difference between the P_LEAF and P_LEAF2 flags in the page header (the ones tested by the IS_LEAF and IS_LEAF2 macros)? How do these two kinds of pages differ?
- How do overflow pages work in LMDB? From what I could understand, if a key or a value does not fit in the page, it is stored in an overflow page (an entire page is allocated for that specific key or value). Is this correct? What happens when the value is many times larger than the page size, e.g. a 1MB value with 4KB pages?
- What is a sub-page in LMDB, and how does it relate to the F_SUBDATA node flag? How does it work?
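To make the page-capacity and overflow questions concrete, here is how I currently picture the arithmetic. This is only a sketch of my mental model, not LMDB's actual code: the 16-byte page header, the 2-byte slot, the 8-byte per-node header, and both function names are my own assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed sizes in bytes; hypothetical values, not taken from LMDB. */
#define SKETCH_PAGE_HEADER_SIZE 16u /* per-page header */
#define SKETCH_SLOT_SIZE         2u /* one slot (offset) per node */
#define SKETCH_NODE_HEADER_SIZE  8u /* per-node metadata before the key */

/*
 * Number of contiguous pages needed to hold one large value, assuming
 * only the first page of the run carries a page header.
 */
size_t sketch_overflow_pages(size_t value_size, size_t page_size)
{
    assert(page_size > SKETCH_PAGE_HEADER_SIZE);
    size_t total = SKETCH_PAGE_HEADER_SIZE + value_size;
    return (total + page_size - 1) / page_size; /* ceiling division */
}

/*
 * Rough count of equally sized key-value pairs that fit in one slotted
 * page, assuming each pair costs a slot, a node header, the key, and
 * the value, with no alignment padding.
 */
size_t sketch_pairs_per_page(size_t page_size, size_t key_size,
                             size_t value_size)
{
    assert(page_size > SKETCH_PAGE_HEADER_SIZE);
    size_t usable = page_size - SKETCH_PAGE_HEADER_SIZE;
    size_t per_pair = SKETCH_SLOT_SIZE + SKETCH_NODE_HEADER_SIZE
                    + key_size + value_size;
    return usable / per_pair;
}
```

Under these assumptions, `sketch_overflow_pages(1048576, 4096)` gives 257 pages for the 1MB value, and `sketch_pairs_per_page(4096, 16, 32)` gives 70 pairs per page versus 4 for `sketch_pairs_per_page(4096, 200, 800)`. Is this roughly how LMDB behaves, or does it work differently?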
I would greatly appreciate it if someone could share links to documentation that covers the internals of the database, online videos, research papers, mailing-list threads, or any notes that could help me understand the above. Thank you very much!
Cheers, Ivan