Jon Roberts wrote:
I was just today thinking about something along the lines of filter preprocessing (at the client level actually) that prevented say a contains search like (telephonenumber=*67530*) on an attribute that the directory has not indexed for substring searches (case of telephonenumber). Something at the server level would be better of course.
Something like that was discussed long time ago when I proposed the "limits" feature (which eventually got into slapd in its current form). It's hard to tell what such constraint would mean. However, if one only looks at the presence of a substrings filter in a search, unexpected results may occur; for example:
(telephonenumber=*67530*) => reject
but what about
(!(telephonenumber=*67530*)) => ?
or
(&(uid=foo)(telephonenumber=*67530*)) => ?
A better approach, which we recently developed for a customer, would be to define what filter is to be considered acceptable and what is not, and then analyze the logic of the filter to see if it matches that of the requirement. For example, logic analysis could allow to determine if a filter is surely acceptable, surely unacceptable, or "grey"; then, decision making could determine what to do in the "grey" cases.
If what you want to control is searches resulting in large candidate sets, you need to define what may potentially lead to large candidate sets. So you need to define what's "large", and what simple filters could lead to large candidates sets.
For example: (uid=foo) is likely to lead to a unique candidate (or at most to a few, unless the "unique" overlay is in use), but (objectClass=person) is likely to lead to more than one candidate, and in some case to nearly all the entries in the database. The administrator should define what's the expected behavior of exact matching on a given attribute.
Then, filter analysis would lead to an answer in terms of likelihood of resulting in large candidate sets.
The filter analyzer would need to use rules like:
- presence => large set - equality => small or large set, depending on attribute - appprox => small or large set, depending on attribute - substrings => large - ordering => large
- not => small to large, large to large - and => small if at least one small, large otherwise - or => large if at least one is large
If the rule is: "reject large candidate lists", filter analysis should walk the filter to terminals, stopping as soon as the rule says that the candidate set could be large. If a small set is expected after the filter tree has been entirely traversed, the operation can continue.
It is not simple, and it needs some configuration from the administrator, to guess what equality rules can be assumed to return "small" candidate sets, but it should work.
To make it more general, attributes could be given a weight, or a probability: some equalities may be believed to return small sets, others to return larger sets, so that
(sn=Smith)
may return a considerable set (=> weight 0.1) but (!(cn=Smith)) could return an even larger set (=> weight 1-0.1 = 0.9).
Then, using P() as a "probability operator" [*]:
P(x=v) => admin defined P(x~=v) => admin defined P(x=*v*) => admin defined (based on subsrt length?) but in any case > P(x=v); 0.5 should be reasonable, so that P(!(x=*v*)) is 0.5) P(x>v), P(x<v) => admin defined (based on v?) but in any case > P(x=v) P(not(f)) => 1-P(f) P(and(f1,f2)) => min(P(f1),P(f2)) P(or(f1,f2)) => max(P(f1),P(f2))
and the requirement could be to have P(filter) < threshold.
If anyone wants to implement anything like that, I suggest to either write an overlay that intercepts search operations and analyzes the filter (that's the approach we followed for very specific filter analysis rules), or to extend limits so that they accept run-time loadable rules.
p.
[*] my statistics are quite naive; one could object that the probability of P(and(x,y)) should rather be P(x)*P(y) and so, but I'm just trying to define a set of rules that make sense. Think of them as "fuzzy" rules rather than consistent maths.
Ing. Pierangelo Masarati OpenLDAP Core Team
SysNet s.n.c. Via Dossi, 8 - 27100 Pavia - ITALIA http://www.sys-net.it ------------------------------------------ Office: +39.02.23998309 Mobile: +39.333.4963172 Email: pierangelo.masarati@sys-net.it ------------------------------------------