Ivan Unknown wrote:
Hello!
I have been looking at the LMDB project code trying to learn and understand how a database could be implemented; however, I have been struggling to answer the questions below for quite some time, partially due to my limited knowledge of C:
- How does LMDB store keys and values in the page? I have learned that a page consists of the header, slots, and key-value pairs, but how does the database handle keys and values that are too large to fit within a page?
That is already documented in the code and Doxygen.
- I found that LMDB does not keep a fixed number of keys per page, so the number depends on the sizes of the key-value pairs already inserted into the page. Is this correct?
Correct. This is a major difference from textbook Btree or B+tree implementations, but it is essential for good storage utilization.
Does it mean that B+tree pages (branch or leaf) could have a different number of keys depending on the key size?
Yes.
How does it affect performance or implementation of the B+tree?
No particular impact.
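To illustrate why the number of keys per page varies, here is a toy model of a slotted page. It is a sketch, not LMDB's code: the class SlottedPage, the helper fill, and the sizes (4096-byte page, 16-byte page header, 2-byte slots, 8-byte node header, even-sized nodes) are all assumptions chosen for the example.

```python
PAGE_SIZE = 4096   # assumed page size in bytes
PAGE_HEADER = 16   # assumed page header size
SLOT_SIZE = 2      # assumed size of one slot (an offset into the page)
NODE_HEADER = 8    # assumed per-node header (sizes and flags)


class SlottedPage:
    """Toy page: slots grow up from the header, nodes grow down from the end."""

    def __init__(self, page_size=PAGE_SIZE):
        self.lower = PAGE_HEADER   # end of the slot array
        self.upper = page_size     # start of the node area
        self.nodes = []            # (key, value) pairs, kept sorted by key

    def free_space(self):
        return self.upper - self.lower

    def insert(self, key: bytes, value: bytes) -> bool:
        node_size = NODE_HEADER + len(key) + len(value)
        node_size += node_size & 1          # round up to an even size
        if SLOT_SIZE + node_size > self.free_space():
            return False                    # full; a real B+tree would split here
        self.lower += SLOT_SIZE
        self.upper -= node_size
        self.nodes.append((key, value))
        self.nodes.sort()
        return True


def fill(key_len, value_len):
    """Insert distinct fixed-size entries until the page is full; return the count."""
    page = SlottedPage()
    count = 0
    while page.insert(count.to_bytes(4, "big").rjust(key_len, b"\0"),
                      b"v" * value_len):
        count += 1
    return count
```

With these assumed sizes, fill(8, 8) packs many more entries into one page than fill(100, 100). The only limit is free space between the slot array and the node area, not a fixed key count.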
- Are there any limitations on the size of a key?
Yes.
Can it be of an arbitrary length?
No. There is work underway to remove length limits on keys in LMDB 1.0 but that feature isn't working yet.
- What is the difference between IS_LEAF and IS_LEAF2 flags in the page header? What is the difference between these pages?
That is already documented in the code and Doxygen.
- How do overflow pages work in LMDB? From what I could understand, if a key or a value does not fit in the page, it will be stored in the overflow page (the entire page is allocated for that specific key or value). Is this correct?
Yes for values. Not for keys since they have a max length smaller than a page.
What happens when the value size is several times larger than the page size, e.g. a 1MB value with 4KB pages?
That is already documented in the code and Doxygen.
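As a rough sketch of the arithmetic for the 1MB example: the value is stored in a run of contiguous overflow pages, and the page count is the value size plus one page header, rounded up to whole pages. The function name overflow_pages and the 16-byte header are assumptions for this example; check the OVPAGES macro in mdb.c for the real definition.

```python
PAGE_HEADER = 16  # assumed page header size in bytes


def overflow_pages(value_size, page_size=4096):
    """Pages needed for one value stored in a contiguous overflow run.

    Assumes only the first page of the run carries a header, so the
    result is ceil((PAGE_HEADER + value_size) / page_size).
    """
    return (PAGE_HEADER - 1 + value_size) // page_size + 1


def slack_bytes(value_size, page_size=4096):
    """Unused bytes at the end of the run under the same assumptions."""
    return overflow_pages(value_size, page_size) * page_size - PAGE_HEADER - value_size
```

Under these assumptions a 1MB (1048576-byte) value does not fit in 256 pages because of the header, so it takes one more page, most of which is unused.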
- What is a sub-page in LMDB (F_SUBDATA)? How does it work?
That is already documented in the code and Doxygen.
I would greatly appreciate it if someone could share links to the documentation that covers internals of the database, online videos, research papers, mailing lists, or any notes you could share to help me understand the above. Thank you very much!
Doxygen docs are embedded in the source code already. You can format them using the doxygen tool.
Other info is linked at https://www.symas.com/symas-lmdb-tech-info
Cheers, Ivan