Hello.
I have some non-OpenLDAP-related LMDB questions and suggestions. I hope this is the correct list for them. If not, please point to a better venue.
It seems that MDB_cmp_func is a plain binary function without a context parameter. This makes it impossible to tune a comparison operation at runtime (without libffi magic, anyway). In particular, it makes it difficult to create bindings for the set_compare operation in other languages.
Funnily enough MDB_rel_func _does_ have a context pointer, even though it is not used yet. So would it be feasible to add a similarly contextful version of the comparison function? (Arguably these two could share the context pointer, it could just be mdb_dbi_set_userctx.)
Although the documentation in lmdb.h is generally good, it's a bit wishy-washy when it comes to lifetimes of data. The mdb_get documentation only says: "The memory pointed to by the returned values is owned by the database". My understanding is that the pointers to the key/data pair retrieved from the database are valid for the duration of the current transaction. After the transaction is over, the page in which the key/data pair was located may be reclaimed for other uses. If this is correct, could it be reflected in the documentation a bit more explicitly?
The current behavior is a bit inconsistent, though, in that depending on the operation, mdb_cursor_get may or may not update the `key` argument to point to the key in the database. It would be much clearer if after the call it _always_ pointed to the key in the database, even for operations (like MDB_SET_KEY) where the value of the key would be the same as the input argument.
Finally, what is the recommended way of dealing with composite keys? Suppose I want to look up a value associated to a pair of keys (say, a directory id and a filename). There are several ways of doing this:
* Encode the pair (key1, key2) into a single key, use that normally.
* Use MDB_DUPSORT, encode each (key2, value) as data for a duplicate key key1. Make sure the encoding is such that the data are sorted by key2. Then use MDB_GET_BOTH_RANGE to look up the nearest entry for a given (key1, key2) pair and check it actually has the correct key2.
* Create a named database for each key1, store its name as the data for key1. Store normal (key2, value) pairs in the named database,
Each of these has its problems. The first is somewhat wasteful, especially if, say, key1 is long and often repeated. The second is what I use, but it feels a bit kludgy: I'm supposed to use a key-value store yet I still have to manually decode and encode key-value pairs to and from their constituents. The third has lots of overhead if there are very few entries for a given key1, and it requires us to manually manage the lifetimes of the named databases.
My understanding is that MDB_DUPSORT internally does something close to the third option (but with optimizations to avoid wasting pages for just a few entries), but just using empty data items in the key-specific subdatabase. Would it then be feasible to add direct support for composite keys by allowing non-empty data there?
Thanks,
Lauri Alanko la@iki.fi
Lauri Alanko wrote:
Hello.
I have some non-OpenLDAP-related LMDB questions and suggestions. I hope this is the correct list for them. If not, please point to a better venue.
This is explicitly documented as the correct list. https://gitorious.org/mdb/mdb/source/3368d1f5e243225cba4d730fba19ff600798ebe...:
It seems that MDB_cmp_func is a plain binary function without a context parameter. This makes it impossible to tune a comparison operation at runtime (without libffi magic, anyway). In particular, it makes it difficult to create bindings for the set_compare operation in other languages.
It would be foolish to implement the comparator in a non-native compiled language.
Funnily enough MDB_rel_func _does_ have a context pointer, even though it is not used yet. So would it be feasible to add a similarly contextful version of the comparison function? (Arguably these two could share the context pointer, it could just be mdb_dbi_set_userctx.)
Anything is feasible but this will never be done. The comparator is called a huge number of times in any DB operation. Adding an additional parameter to the calling sequence causes a significant, measurable slowdown.
Adding an optional API that other languages can use will bloat the code and cause the C code to slow down. That would be unacceptable.
The comparator is perhaps the single most performance-sensitive code in the entire LMDB library. Anything that slows it down will be rejected outright. (As your sugggestion has been, a few times already.)
Although the documentation in lmdb.h is generally good, it's a bit wishy-washy when it comes to lifetimes of data. The mdb_get documentation only says: "The memory pointed to by the returned values is owned by the database". My understanding is that the pointers to the key/data pair retrieved from the database are valid for the duration of the current transaction. After the transaction is over, the page in which the key/data pair was located may be reclaimed for other uses. If this is correct, could it be reflected in the documentation a bit more explicitly?
That is basically correct for read-only transactions. It is not necessarily true for read-write transactions, since you can contrive a sequence of ops in a single transaction that causes a memory page to be allocated and subequently deleted (using a particular sequences of puts/deletes).
Areas of the documentation that are under-specified are intentionally so. API-level documentation does not and should not expose such details of the underlying implementation. API users should not rely on such details. Overall the doc tells you to avoid long-lived transactions. That also means use the data as soon as you retrieve it and then move on.
The current behavior is a bit inconsistent, though, in that depending on the operation, mdb_cursor_get may or may not update the `key` argument to point to the key in the database. It would be much clearer if after the call it _always_ pointed to the key in the database, even for operations (like MDB_SET_KEY) where the value of the key would be the same as the input argument.
The current behavior is precisely documented. MDB_SET does not return the key, MDB_SET_KEY does. If you want the key to always be returned, use MDB_SET_KEY and forget about MDB_SET.
Finally, what is the recommended way of dealing with composite keys? Suppose I want to look up a value associated to a pair of keys (say, a directory id and a filename). There are several ways of doing this:
Encode the pair (key1, key2) into a single key, use that normally.
Use MDB_DUPSORT, encode each (key2, value) as data for a duplicate key
key1. Make sure the encoding is such that the data are sorted by key2. Then use MDB_GET_BOTH_RANGE to look up the nearest entry for a given (key1, key2) pair and check it actually has the correct key2.
- Create a named database for each key1, store its name as the data for
key1. Store normal (key2, value) pairs in the named database,
Each of these has its problems. The first is somewhat wasteful, especially if, say, key1 is long and often repeated. The second is what I use, but it feels a bit kludgy: I'm supposed to use a key-value store yet I still have to manually decode and encode key-value pairs to and from their constituents. The third has lots of overhead if there are very few entries for a given key1, and it requires us to manually manage the lifetimes of the named databases.
My understanding is that MDB_DUPSORT internally does something close to the third option (but with optimizations to avoid wasting pages for just a few entries), but just using empty data items in the key-specific subdatabase. Would it then be feasible to add direct support for composite keys by allowing non-empty data there?
That was planned in the very beginning, but later dropped because we didn't have a use case for it.
You should just use MDB_DUPSORT and a custom dup comparator that ignores the value part that you appended to the key.
Quoting Howard Chu hyc@symas.com:
This is explicitly documented as the correct list. https://gitorious.org/mdb/mdb/source/3368d1f5e243225cba4d730fba19ff600798ebe...:
Ah, great. The page at symas.com/mdb was a bit more ambiguous.
It would be foolish to implement the comparator in a non-native compiled language.
Certainly when performance is paramount. But that is not the only reason someone might prefer to use LMDB.
Whatever the language, sometimes a comparison operation might depend on run-time data: for instance, case-insensitive string comparison is locale-dependent, so if you don't want to hard-code a fixed list of supported locales into the program, you need some way of passing the locale to the comparator function.
Anything is feasible but this will never be done. The comparator is called a huge number of times in any DB operation. Adding an additional parameter to the calling sequence causes a significant, measurable slowdown.
Seriously? That strikes against both my intuition and some quick benchmarks.
The intuition: if the binary search loop in mdb_node_search is so tight that function call overhead for the comparison is significant, then I would expect the indirect branch to be a bigger part of that than argument passing. After all, in the absence of register pressure, passing an argument is pretty much just a single move instruction. So if this was performance-critical, I would expect to see a fast-path specialization of the search loop for mdb_cmp_memn. That would allow the most common comparison to be inlined. But if that hasn't been worthwhile, then skimping with the parameters (at a great cost to flexibility) seems even less so.
The benchmark: I tried out both modifications (both fastpath memcmp and a third parameter to the comparison function) with db_bench_mdb. I admittedly know nothing about benchmarking databases, but I was unable to see any meaningful differences with any of the modifications. So if there is some evidence that the third parameter would make a difference, I'd be interested in learning of it.
Although the documentation in lmdb.h is generally good, it's a bit wishy-washy when it comes to lifetimes of data.
Sorry, I take this back. This was a pet peeve of mine when I first learned LMDB, but after that a note was added to mdb_get:
* @note Values returned from the database are valid only until a * subsequent update operation, or the end of the transaction.
This is perfectly sufficient (although it should probably be attached to mdb_cursor_get as well), but somehow I managed to miss it ever since. My bad.
Lauri
openldap-technical@openldap.org