Syntax/matching rule implementation - floating point numbers - openldap-devel

16 Dec 2008


Here are some notes about how slapd implements a syntax/matching rule.
I'm moving this to openldap-devel, since it seems to belong there by now.

A slapd search first uses any indexes to reduce the number of entries
to examine, and then it checks these entries against the filter.
An index entry contains the indexed value translated to a string which
can be compared with simple memcmp(), and a list of entry IDs for the
entries that contain that value.  So e.g. caseIgnoreMatch translates
upper- and lowercase characters to the same character in the index.
If (x < y) implies (memcmp(indexed(x), indexed(y)) < 0), then an ORDERING
match can use indexing, namely the 'eq' index.  Examples include the
'integer' indexing format since OpenLDAP 2.4.7, and 'generalizedTime'.
Index keys should not be large, so string indexing stores a hash of the
indexed value while integer and generalizedTime use binary formats.
The integer index format is quite different from normal binary integer
formats, to make ORDERING (i.e. memcmp()) work.
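
To illustrate the idea of a memcmp()-ordered binary key, here is a
minimal sketch.  It is NOT slapd's actual integer index format; the
fixed 4-byte layout and the names int32_to_key() and int32_key_cmp()
are made up for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not OpenLDAP's integer index format:
 * encode a signed 32-bit integer as 4 big-endian bytes with the
 * sign bit inverted, so that memcmp() on the keys agrees with '<'
 * on the integers. */
static void int32_to_key(int32_t v, unsigned char key[4])
{
    uint32_t u;

    assert(key != NULL);
    u = (uint32_t)v ^ 0x80000000u;  /* negative values now sort first */
    key[0] = (unsigned char)(u >> 24);
    key[1] = (unsigned char)(u >> 16);
    key[2] = (unsigned char)(u >> 8);
    key[3] = (unsigned char)u;
}

/* Compare two integers through their keys only, as an index would. */
static int int32_key_cmp(int32_t a, int32_t b)
{
    unsigned char ka[4], kb[4];

    int32_to_key(a, ka);
    int32_to_key(b, kb);
    return memcmp(ka, kb, sizeof ka);
}
```

A plain two's-complement big-endian dump would sort negative numbers
after positive ones under memcmp(); inverting the sign bit is what
repairs that in this sketch.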
You may want to normalize or prettify values received over the protocol
before storing them, e.g. store incoming "nan" as "NaN".

For syntax/matching rule examples see schema_init.c - for syntax 'foo':
syntax_defs[] DESC 'Foo', mrule_defs[] NAME 'fooMatch','fooOrderingMatch'.
fooValidate() checks the syntax of an attribute value.
fooMatch()    implements the EQUALITY and ORDERING rules.
fooIndexer(), fooFilter() make the syntax indexable.
fooNormalize() normalizes values, so they can be compared just with
              memcmp().
fooPretty()   translates to a value that is nicer to deal with but not
              normalized, e.g. with DN syntax CN=foo,bar ==> cn=foo\2Cbar.
For a non-normalized syntax, look at 'foo'='integer'.
For a normalized syntax, look at 'foo'='generalizedTime'.
Note that having normalized the value, the 'generalizedTimeMatch' rule
in mrule_defs[] simply uses octetStringMatch to compare instead of a
more complex rule. 'generalizedTimeOrderingMatch' cannot do that though.

If you go for a binary attribute syntax like some IEEE format, ask
someone else for advice:-)  There is also the question of whether
such a syntax is transferred with the ";binary" option.  (I think
certificates must be, and are, transferred as ASN.1 BER values.  OTOH
Octet String can contain any set of bytes, but is not ;binary.)

Regarding a syntax for 'Real':
Look at ITS#745 http://www.openldap.org/its/?findid=745.  Nothing
came of it, but maybe the author is around.
NaN is a problem.  NaN in math does not compare equal to itself, but
such behavior would be messy for LDAP values.  I haven't tested, but I
think you could not have a multi-valued real attribute which contains
NaN and some other number: When storing a value, LDAP uses the equality
matching rule to check if you are trying to store a duplicate of an
existing value.
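
A small sketch of how a duplicate check interacts with NaN.  This is
not slapd code; ieee_equal(), real_equal() and has_duplicate() are
hypothetical names, and the check is only a stand-in for whatever the
server really does when it compares a new value with stored ones.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Equality as IEEE 754 defines it: NaN is not equal to anything,
 * not even to itself. */
static int ieee_equal(double a, double b)
{
    return a == b;
}

/* Hypothetical LDAP-style equality: all NaNs count as one value. */
static int real_equal(double a, double b)
{
    if (isnan(a) || isnan(b))
        return isnan(a) && isnan(b);
    return a == b;
}

/* Return 1 if 'v' already occurs in vals[0..n-1] according to 'eq'. */
static int has_duplicate(const double *vals, size_t n, double v,
                         int (*eq)(double, double))
{
    size_t i;

    assert(vals != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (eq(vals[i], v))
            return 1;
    return 0;
}
```

With ieee_equal() a second NaN is never recognized as a duplicate of
a stored NaN, while real_equal() catches it.  Which of the two a real
'Real' equality rule should follow is exactly the open question.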
OTOH I expect realOrderingMatch(NaN, anything)=Undefined is OK, but I
haven't tested that either.  (Remember that LDAP filters have
three-valued logic: A compare can return True, False, Undefined.  Or
an error, which isn't quite the same.)
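
The three-valued behaviour can be sketched like this.  The enum and
the function names are invented for the example and do not correspond
to slapd's matching rule interface; I also assume "<=" semantics for
the ordering match.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical three-valued result, mirroring LDAP filter logic. */
typedef enum {
    MATCH_FALSE = 0,
    MATCH_TRUE = 1,
    MATCH_UNDEFINED = 2
} match_result;

/* Sketch of a "less than or equal" ordering match on reals:
 * any comparison involving NaN is Undefined. */
static match_result real_ordering_match(double value, double asserted)
{
    if (isnan(value) || isnan(asserted))
        return MATCH_UNDEFINED;
    return value <= asserted ? MATCH_TRUE : MATCH_FALSE;
}

/* Three-valued NOT: Undefined stays Undefined. */
static match_result match_not(match_result r)
{
    if (r == MATCH_UNDEFINED)
        return MATCH_UNDEFINED;
    return r == MATCH_TRUE ? MATCH_FALSE : MATCH_TRUE;
}
```

So with this sketch neither (real<=2.0) nor (!(real<=2.0)) would
select an entry whose only value is NaN.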
If you make it defined, and if current math practice doesn't say
anything, I guess NaN should be larger than other values (except Inf?)
to match Server Side Sorting (RFC 2891)'s treatment of absent values.
OpenLDAP doesn't implement server side sorting, but people keep asking
so maybe it will someday.
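
If one did define the ordering, a memcmp()-comparable key for reals
could look roughly like the sketch below.  It assumes 'double' is a
64-bit IEEE 754 value, and it makes the arbitrary choices of sorting
NaN above +Inf and of folding -0 into +0.  double_to_key() and
double_key_cmp() are hypothetical names, not existing slapd functions.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build an 8-byte key for a double such that
 * memcmp() orders the keys as
 *   -Inf < negative < -0 == +0 < positive < +Inf < NaN.
 * Assumes 'double' is IEEE 754 binary64. */
static void double_to_key(double d, unsigned char key[8])
{
    uint64_t bits;
    int i;

    assert(key != NULL);
    if (isnan(d)) {
        bits = UINT64_C(0x7FF8000000000000);  /* one canonical NaN */
    } else {
        memcpy(&bits, &d, sizeof bits);
        if (bits == UINT64_C(0x8000000000000000))
            bits = 0;                         /* fold -0.0 into +0.0 */
    }
    if (bits & UINT64_C(0x8000000000000000))
        bits = ~bits;                         /* negative: reverse order */
    else
        bits |= UINT64_C(0x8000000000000000); /* others: above negatives */
    for (i = 0; i < 8; i++)                   /* big-endian byte order */
        key[i] = (unsigned char)(bits >> (56 - 8 * i));
}

/* Compare two doubles through their keys only, as an index would. */
static int double_key_cmp(double a, double b)
{
    unsigned char ka[8], kb[8];

    double_to_key(a, ka);
    double_to_key(b, kb);
    return memcmp(ka, kb, sizeof ka);
}
```

Mapping every NaN to one bit pattern also gives equal keys for all
NaNs, which sidesteps the "NaN != NaN" problem at the index level.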
-- 
Hallvard