Pierangelo Masarati wrote:
Jon Roberts wrote:
Just today I was thinking about something along the lines of filter preprocessing (at the client level, actually) that would prevent, say, a contains search like (telephonenumber=*67530*) on an attribute that the directory has not indexed for substring searches (as is the case for telephonenumber). Something at the server level would be better, of course.
Something like that was discussed a long time ago when I proposed the "limits" feature (which eventually got into slapd in its current form). It's hard to tell what such a constraint would mean. If one only looks at the presence of a substrings filter in a search, unexpected results may occur; for example:
(telephonenumber=*67530*) => reject
but what about
(!(telephonenumber=*67530*)) => ?
or
(&(uid=foo)(telephonenumber=*67530*)) => ?
A better approach, which we recently developed for a customer, would be to define which filters are to be considered acceptable and which are not, and then analyze the logic of the filter to see if it matches that requirement. For example, logic analysis could determine whether a filter is surely acceptable, surely unacceptable, or "grey"; then, decision making could determine what to do in the "grey" cases.
If what you want to control is searches resulting in large candidate sets, you need to define what may potentially lead to large candidate sets. So you need to define what's "large", and which simple filters could lead to large candidate sets.
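To make the three-way classification concrete, here is a minimal sketch in Python. It is not the code developed for the customer; the tuple representation of filters, the SUBSTR_INDEXED set, and the rules for AND, OR, and NOT are all assumptions chosen for illustration. Under this assumed policy, a substrings term on an attribute without a substring index is unacceptable on its own, an AND is acceptable as soon as one term is acceptable (that term bounds the candidates), and a NOT is always left "grey".

```python
from enum import Enum


class Verdict(Enum):
    ACCEPT = "surely acceptable"
    REJECT = "surely unacceptable"
    GREY = "grey"


# Assumption for illustration: attributes that have a substring index.
SUBSTR_INDEXED = {"cn", "mail"}


def classify(f):
    """Classify a filter given as a nested tuple, e.g. ("and", ("eq", "uid"), ("sub", "cn")).

    The policy encoded here is hypothetical, not slapd behaviour.
    """
    op = f[0]
    if op == "eq":
        # Equality terms are assumed to be cheap.
        return Verdict.ACCEPT
    if op == "sub":
        # Substrings terms are only acceptable with a substring index.
        return Verdict.ACCEPT if f[1] in SUBSTR_INDEXED else Verdict.REJECT
    if op == "not":
        # Negation cannot be judged from its operand alone.
        return Verdict.GREY

    kids = [classify(child) for child in f[1:]]
    if op == "and":
        # One acceptable term is enough to bound the candidate set.
        if Verdict.ACCEPT in kids:
            return Verdict.ACCEPT
        if all(k is Verdict.REJECT for k in kids):
            return Verdict.REJECT
        return Verdict.GREY
    if op == "or":
        # One unacceptable term is enough to make the whole OR unacceptable.
        if Verdict.REJECT in kids:
            return Verdict.REJECT
        if all(k is Verdict.ACCEPT for k in kids):
            return Verdict.ACCEPT
        return Verdict.GREY
    raise ValueError(f"unknown filter operator: {op!r}")


tel = ("sub", "telephonenumber")
print(classify(tel))                          # (telephonenumber=*67530*)
print(classify(("not", tel)))                 # (!(telephonenumber=*67530*))
print(classify(("and", ("eq", "uid"), tel)))  # (&(uid=foo)(telephonenumber=*67530*))
```

With these assumed rules the three example filters come out as unacceptable, grey, and acceptable respectively; a different site policy would of course give different answers, which is exactly why the "grey" cases need a separate decision step.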
OK, so you want to prevent candidate generation from occurring for filter terms which might result in large candidate sets. First of all, assuming that that's even a valid thing to do (noting your issues listed above), I would just define a new limit analogous to sizelimit.unchecked, and skip the probability guessing games. E.g. sizelimit.intermediate, which would be checked at intermediate stages of filter evaluation. That would render sizelimit.unchecked moot.
The implementation would apply this limit to each individual filter term lookup, and fail with ADMINLIMIT_EXCEEDED when any term exceeds the limit.
In practice I think this will cause a lot of harm though; it will cause ANDed filters to fail that would otherwise come in under the unchecked limit.
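A toy model of the difference, in Python. This is not slapd code: candidate sets are plain Python sets of entry IDs, and the function name, parameters, and exception class are made up for the example. The only thing it is meant to show is that a per-term limit rejects an ANDed filter whose final candidate set would have passed the unchecked limit.

```python
class AdminLimitExceeded(Exception):
    """Stand-in for the adminLimitExceeded result code (hypothetical class)."""


def and_candidates(term_sets, unchecked=None, intermediate=None):
    """Intersect the per-term candidate sets of an ANDed filter.

    intermediate: proposed limit, applied to each individual term lookup.
    unchecked:    limit applied to the final candidate set only.
    """
    result = None
    for ids in term_sets:
        if intermediate is not None and len(ids) > intermediate:
            raise AdminLimitExceeded("a single term exceeded the intermediate limit")
        result = set(ids) if result is None else result & ids
    if result is None:
        result = set()
    if unchecked is not None and len(result) > unchecked:
        raise AdminLimitExceeded("final candidate set exceeded the unchecked limit")
    return result


broad = set(range(5000))   # a term matching many entries
narrow = {10, 20, 30}      # a selective term

# Only the final set is checked: the AND passes with 3 candidates.
print(len(and_candidates([broad, narrow], unchecked=1000)))

# With a per-term limit the same filter fails on the broad term.
try:
    and_candidates([broad, narrow], unchecked=1000, intermediate=1000)
except AdminLimitExceeded as exc:
    print("rejected:", exc)
```

The first call returns three candidates; the second is rejected even though the selective term would have cut the result down to the same three entries.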