I'm adding counters to the Attribute and Modification structures, a_numvals and sm_numvals to track the number of values in each. I've added an attr_valadd() routine that will add arrays of values to an attribute and update the counter. This can eliminate most of the empty for loops we're using to count these values at various points. But it means other code that assembles Attributes manually needs to be updated. Fortunately most of these are easy to spot now since they generally have to use attr_alloc(). I believe I have them all.
This is the first stage in a larger transition, to using sorted arrays for multivalued attributes. I haven't quite decided how the next steps will play out just yet.
Note that we already do a sort on incoming attributes, but that code currently has ORDER_PRESERVING defined so after duplicate checks are done we discard the sort info. That preserves the existing/historical slapd behavior.
However, I'm pretty sure this is going to have to change. Using sorted arrays will allow us to use binary search to find values, which will be a big win for attributes with very many values. This will speed up test_filter, Compare, ACL processing, etc. etc...
We can also implement this without any user-visible change, if we keep an auxiliary ordering array, the way the current sorter does. But it seems to me that that will be too much memory overhead. It also makes add/remove that much more tedious...
Howard Chu writes:
This is the first stage in a larger transition, to using sorted arrays for multivalued attributes. I haven't quite decided how the next steps will play out just yet. (...) We can also implement this without any user-visible change, if we keep an auxiliary ordering array, the way the current sorter does. But it seems to me that that will be too much memory overhead. It also makes add/remove that much more tedious...
Add a slapd.conf directive which inserts an implicit X-ORDERED 'VALUES' in an already existing attribute definition. Then admins can configure the old behavior for the attributes where he needs it, including in preexisting schema. (Could support @objectclass and !objectclass to list multiple attributes, or a regex matching the attribute name.)
Hallvard B Furuseth wrote:
Howard Chu writes:
This is the first stage in a larger transition, to using sorted arrays for multivalued attributes. I haven't quite decided how the next steps will play out just yet. (...) We can also implement this without any user-visible change, if we keep an auxiliary ordering array, the way the current sorter does. But it seems to me that that will be too much memory overhead. It also makes add/remove that much more tedious...
Add a slapd.conf directive which inserts an implicit X-ORDERED 'VALUES' in an already existing attribute definition. Then admins can configure the old behavior for the attributes where he needs it, including in preexisting schema. (Could support @objectclass and !objectclass to list multiple attributes, or a regex matching the attribute name.)
OK, that makes sense. Except it's not the same as X-ORDERED 'VALUES' behavior. The X-ORDERED properties allow you to define an arbitrary static ordering. This is just a straight memcmp-based ordering (or similar, most likely length-first).
I don't think objectclass shorthand makes sense here. It would be pretty unusual to need this feature on e.g. all the attributes of inetOrgPerson.
This would be somewhat like integrating the valsort overlay features into the mainline (but without the weighted option).
Howard Chu writes:
Add a slapd.conf directive which inserts an implicit X-ORDERED 'VALUES' in an already existing attribute definition. (...)
OK, that makes sense. Except it's not the same as X-ORDERED 'VALUES' behavior. (...)
Yes, I guess that should have been the opposite, an 'X-PRESERVE-ORDER' thing, to tell slapd to not reorder values. (I have a distinct feeling I've made that exact mistake before...)
I don't think objectclass shorthand makes sense here. It would be pretty unusual to need this feature on e.g. all the attributes of inetOrgPerson.
True, but it can be useful to define object classes that are only to be used in config directives, not in entries.
Hallvard B Furuseth wrote:
Howard Chu writes:
Add a slapd.conf directive which inserts an implicit X-ORDERED 'VALUES' in an already existing attribute definition. (...)
OK, that makes sense. Except it's not the same as X-ORDERED 'VALUES' behavior. (...)
Yes, I guess that should have been the opposite, an 'X-PRESERVE-ORDER' thing, to tell slapd to not reorder values. (I have a distinct feeling I've made that exact mistake before...)
I don't think objectclass shorthand makes sense here. It would be pretty unusual to need this feature on e.g. all the attributes of inetOrgPerson.
True, but it can be useful to define object classes that are only to be used in config directives, not in entries.
Sounds like you want the X.500 attribute-sets feature. Though I suppose there's no harm in subverting abstract objectclasses for this purpose.
Hallvard B Furuseth wrote:
Howard Chu writes:
I don't think objectclass shorthand makes sense here. It would be pretty unusual to need this feature on e.g. all the attributes of inetOrgPerson.
True, but it can be useful to define object classes that are only to be used in config directives, not in entries.
I've added a sortvals config directive; for now it just takes a list of attributes. All of the current tests pass, but I haven't added any tests for this directive yet.
The only interesting test (for me) is operating on a groupOfNames with a couple hundred thousand members. Other folks may have other scenarios to try.
Howard Chu wrote:
The only interesting test (for me) is operating on a groupOfNames with a couple hundred thousand members. Other folks may have other scenarios to try.
On my machine with the CPU speed set to 800 MHz a search of the form ldapsearch ... (&(objectclass=groupofnames)(member=uid=xxxx...)) takes about 0.36 seconds without sortvals defined, referencing a group with 113,000 members. The same search with "sortvals member" defined takes 0.15 seconds. This is with an eq index on objectclass alone.
Running the CPU at full speed the before/after is 0.11 seconds vs 0.05.
These figures don't tell the whole story though...
Using a filter of the form (&(objectclass=groupofnames)(!(member=foo)(member=bar))) the results are 0.55 seconds before, and 0.16 seconds after. Likewise at full CPU speed, the result is 0.17 seconds before and 0.05 seconds after. So while the absolute difference is pretty small, you can see that the overhead of processing the (member=) filter clause has gone pretty much to zero.