Here are some notes about how slapd implements a syntax/matching rule. I'm moving this to openldap-devel, it seems to belong there by now.
A slapd search first uses any indexes to reduce the number of entries to examine, and then it checks these entries against the filter.
An index entry contains the indexed value translated to a string which can be compared with simple memcmp(), and a list of entry IDs for the entries that contain that value. So e.g. caseIgnoreMatch translates upper- and lowercase characters to the same character in the index.
If (x < y) implies (memcmp(indexed(x), indexed(y)) < 0), then ORDERING match can use indexing - the 'eq' index. Examples include the 'integer' indexing format since OpenLDAP 2.4.7, and 'generalizedTime'.
Index keys should not be large, so string indexing stores a hash of the indexed value while integer and generalizedTime use binary formats. The integer index format is quite different from normal binary integer formats, to make ORDERING (i.e. memcmp()) work.
You may want to normalize or prettify values received over the protocol before storing them, e.g. store incoming "nan" as "NaN".
For syntax/matching rule examples see schema_init.c - for syntax 'foo': syntax_defs[] DESC 'Foo', mrule_defs[] NAME 'fooMatch','fooOrderingMatch'. fooValidate() checks the syntax of an attribute value. fooMatch() implements the EQUALITY and ORDERING rules. fooIndexer(), fooFilter() make the syntax indexable. fooNormalize() normalizes values, so they can be compared just with memcmp(). fooPretty() translates to a nicer to deal with but not normalized value, e.g. with DN syntax CN=foo,bar ==> cn=foo\2Cbar.
For a non-normalized syntax, look at 'foo'='integer'. For a normalized index, look at 'foo'='generalizedTime'. Note that having normalized the value, the 'generalizedTimeMatch' rule in mrule_defs[] simply uses octetStringMatch to compare instead of a more complex rule. 'generalizedTimeOrderingMatch' cannot do that though.
If you go for a binary attribute syntax like some IEEE format, ask someone else for advise:-) There is also a distinction between whether such a syntax is transferred with the ";binary" option. (I think certificates must and are transferred as ASN.1 BER values. OTOH Octet String can contain any set of bytes, but is not ;binary.)
Regarding a syntax for 'Real':
Look at ITS#745 http://www.openldap.org/its/?findid=745. Nothing came of it, but maybe the author is around.
NaN is a problem. NaN in math does not compare equal to itself, but such behavior would be messy for LDAP values. I haven't tested, but I think you could not have a multi-valued real attribute which contains NaN and some other number: When storing a value, LDAP uses the equality matching rule to check if you are trying to store a duplicate of an existing value.
OTOH I expect realOrderingMatch(NaN, anything)=Undefined is OK, but I haven't tested that either. (Remember that LDAP filters have three-valued logic: A compare can return True, False, Undefined. Or an error, which isn't quite the same.)
If you make it defined, and if current math practice doesn't say anything, I guess NaN should be larger than other values (except Inf?) to match Server Side Sorting (RFC 2891)'s treatment of absent values. OpenLDAP doesn't implement server side sorting, but people keep asking so maybe it will someday.
On Tue, 2008-12-16 at 22:16 +0100, Hallvard B Furuseth wrote:
Here are some notes about how slapd implements a syntax/matching rule. I'm moving this to openldap-devel, it seems to belong there by now.
I'll be monitoring this thread with a lot of interest ... Don't expect too much from me until I get a little more experimented with OpenLDAP and its implementation.
Thanks for the infos ..
I'll be looking for some way of memcmp-arable real scalar encoding to see what's being done in networked games for example ...
LP.
Lorenzo Pastrana - Happy End Vision -------------------------- Design web Conception multimédia Communication visuelle et édition -------------------------- Tél. : 01 42 47 83 09 Fax : 01 47 70 70 19 E-mail : lorenzo.pastrana@happyend.fr
Lorenzo Pastrana writes:
Don't expect too much from me until I get a little more experimented with OpenLDAP and its implementation.
Good luck.
I'll be looking for some way of memcmp-arable real scalar encoding to see what's being done in networked games for example ...
Depending on your needs you could implement it without indexing support, or at least with only EQUALITY indexing support. No need for memcmpable ordering. Then add that later if you find you need to optimize. An index is just an optimization, plus it adds one bit of functionality that I can think of: "sizelimit size.unchecked=<integer>" in slapd.conf.
A bit more reply-to-self to my previous message:
You may want to normalize or prettify values received over the protocol before storing them, e.g. store incoming "nan" as "NaN".
Just to clarify, here I'm talking about the LDAP attribute syntax, not the index format.
For a non-normalized syntax, look at 'foo'='integer'. For a normalized index, look at 'foo'='generalizedTime'.
Sorry, "index" should be "syntax".
Regarding a syntax for 'Real': (...) NaN is a problem. (...)
Duh, obviously the preferred thing as far as LDAP is concerned is to is to copy whatever semantics REAL has in X.500. Or some other ASN.1 floating point type, if there are several to choose from. I don't know if that fits your purpose, of course. How to compare NaN, whether to normalize "1.200" => "1.2", whether to store fixed or arbitrary precision, etc. Likely that says little about the textual syntax though, since X.500 uses and transfers binary types. The X.500 standard is online at http://www.x500standard.com/, but it's pretty huge and depends on ASN.1, which is to be found I-don't-remember where. x500standard@freelists.org may be be best place to ask.
On Wed, 2008-12-17 at 16:08 +0100, Hallvard B Furuseth wrote:
Lorenzo Pastrana writes:
Don't expect too much from me until I get a little more experimented with OpenLDAP and its implementation.
Good luck.
I guess I'll need some, my intuitions are wonderful, reality ...
I'll be looking for some way of memcmp-arable real scalar encoding to see what's being done in networked games for example ...
Depending on your needs you could implement it without indexing support, or at least with only EQUALITY indexing support.
Actually this is where floats are inherently bat at ...
No need for memcmpable ordering.
This is where fixed point reals can come handy ... I don't know what the applicable standard says about real numbers in LDAP but from what you say my bet is fixed-point reals could fit well ...
Lorenzo Pastrana - Happy End Vision -------------------------- Design web Conception multimédia Communication visuelle et édition -------------------------- Tél. : 01 42 47 83 09 Fax : 01 47 70 70 19 E-mail : lorenzo.pastrana@happyend.fr
Lorenzo Pastrana writes:
I'll be looking for some way of memcmp-arable real scalar encoding to see what's being done in networked games for example ...
Depending on your needs you could implement it without indexing support, or at least with only EQUALITY indexing support.
Actually this is where floats are inherently bat at ...
Hmm, true. Yet if you are going to have matching rules at all, you need an EQUALITY rule. (You can have attributes without matching rules, then you can't use them in filters other than presence filters but you can still read them.)
No need for memcmpable ordering.
This is where fixed point reals can come handy ... I don't know what the applicable standard says about real numbers in LDAP but from what you say my bet is fixed-point reals could fit well ...
Well, the LDAP standard doesn't say anything. However LDAP is based on X.500, which has ASN.1 types plus some others.
But there's nothing to stop you from implementing a type with your preferred semantics instead of whatever X.500 says. Except perhaps "whatever X.500 says" is already well thought through, so it might be a good idea for that reason.
Lorenzo Pastrana writes:
Depending on your needs you could implement it without indexing support, or at least with only EQUALITY indexing support.
Actually this is where floats are inherently bat at ...
If you go for floating point instead of fixed point anyway, you could implement an approx matching rule for this. The filter (attr~=val) would search for val*[configurable range, e.g. 0.999999 - 1.000001].
Note that there is no LDAP syntax to assign an attribute an approx matching rule. Instead OpenLDAP ties the approx rules to the EQUALITY matching rules.
If you also want to optimize approx search by supporting indexing, maybe realApproxFilter() could generate 3 index values to check for: (val, val + epsilon, val - epsilon) where epsilon is the smallest significant bit for <val> in the index. It might be a bit hairy to come up with an index format where it's easy to ensure that edge cases work right.
For every possible value, such an index must not have greater precision than the range which the approx filter checks for (i.e. the range 1 +/- epsilon must include the range which the approx filter checks for). That way the indexing will include all candidate entries - possibly plus some false matches which the matching rule will eliminate later during the search.
I haven't tested if slapd allows the fooApproxFilter() functions to generate more than one index key to look up also for approx rules. Substring filters, at least, do generate several index keys. The relevant code seems to be just back-bdb/filterindex.c's use of smr_filter. slapd collects the index keys generated for the search filter, then generates the intersection of the sets of entry IDs for those index keys, and then uses the matching rule to check those entries.
All this is would be unnecessary for single-valued attributes. A client could instead of (attr=4) search for (&(attr>=3.999996)(attr<=4.000004)). However that doesn't work for multi-valued attributes: The second filter would return an entry containing attr: <1.0 and 9.0>.
The same goes in part for ORDERING indexing: An approx rule could generate (attr>=3.999996) and (attr<=4.000004) index keys. (I think you'd need to duplicate the index size to contain both increasing and decreasing keys). That would eliminate some candidate entries, but how many would depend on what kind of entries you have. Given many multi-valued entries with values both below and above a the value you search for, the indexing would return many false matches which slapd would need to check against the matching rule.