LMDB set range to return less than or equal key - openldap-technical

21 Feb 2015


      Hi!
In my use case I have found that MDB_SET_RANGE and MDB_GET_BOTH_RANGE
queries are much more useful, in 90+% cases, when they return less than or
equal key, not greater or equal.
I have tested a quick implementation of "get less or equal", and it is
c.50% slower in memory. I must say that if someone told me, before I
learned about LMDB, that even with 50% less than original performance such
numbers are possible, I would not believe - both are millions ops per
seconds in my isolated in-memory microbenchmark. But still, such queries
are building blocks of N operations that could require non-linear (N^x, x >
1) number of queries and I am interested if I could easily speed this up
closer to the built-in MDB_SET_RANGE. The C code is in this gist
https://gist.github.com/buybackoff/59dabc70fa5ec8351bbe#file-mdb_cursor_get_le.
I call it from C# via P/Invoke, the test code is here
https://gist.github.com/buybackoff/59dabc70fa5ec8351bbe#file-c-test-via-pinvoke.
I made sure that only C methods are different, all other code is the same
in C#. But even with the P/Invoke overhead the difference is visible.
My naive approach is to use MDB_SET_RANGE as the first step and, if the key
from this query is not equal to the requested key, to call MOVE_PREV. Other
than an additional call to MOVE_PREV, it requires allocation of a copy of
the original key, because MDB_SET_RANGE overwrites it and I haven't found a
better way to compare the requested key with the key returned after
MDB_SET_RANGE. My experience with C (without # or ++) is 4 days less than 2
month, LMDB was the major reason I started, so my code could be really
stupid...
If it is not, are there other options? E.g. to recompile LMDB for "less or
equal" as default behavior. I have tried to change this code inside
`mdb_node_search`:
if (rc *>* 0) { /* Found entry is *less than* the key. */
i*++*; /* Skip to get the *smallest entry larger than* key. */
if (!IS_LEAF2(mp))
node = NODEPTR(mp, i);
}
to this code:
if (rc *<* 0) { /* Found entry is greater than the key. */
i*--*; /* Skip to get the *largest entry smaller than* key. */
if (!IS_LEAF2(mp))
node = NODEPTR(mp, i);
}
but it fails badly. Is "greater or equal" approach deeply rooted in the
design of LMDB, or I have missed just a small number of other places and it
is feasible to change the behavior globally? If that is possible, I would
like to do so and to use my current approach for "greater or equal" instead.
I cannot test this quickly on a larger-that-RAM data set (because I
struggle with my 128Gb laptop drive). But as far as I understand, in most
cases MDB_SET_RANGE will load the page with a previous key in RAM - usually
this is the same page or one that has been touched during the search? So in
the on-disk scenario the incremental MOVE_PREV will be negligible and I
should not worry about that?
Is there another, completely different and better, way to perform "less
than or equal" query?
Thanks!
Victor