Michael Ströder writes:
We all know the following messages in syslog (loglevel stats):
mdb_equality_candidates: (foo) not indexed
At first glance this seems helpful to find indexing issues.
But IMO
- this is somewhat misleading regarding performance tuning and
True. But it's also very valuable for identifying what needs to be indexed, and it does not look easy to delay the message until we know if it might be useful.
We could clarify the doc, which some people even read, and maybe reword the message.
- if internal searches are conducted (e.g. by set-based ACLs) the
amount of the very same indexing warnings is really annoying and costs performance due to excessive logging.
We can add a "none" indexing level with no effect other than to shut up that warning. I've been thinking of it before, but never got around to coding it.
AFAIK a set of search candidates is derived from filter assertions by first searching the indexed attributes.
When that looks useful, yes - like an AND filter. The index narrows down the possible candidate entries: the server takes the intersection of the candidate sets returned by baseDN/scope and by the indexed attrs.
There are some implicit "filters" too: I mentioned baseDN/scope, which works a bit differently. Also filtering for objectClass, see below.
Then the non-indexed assertions are tested but only on the search candidate set. Is this correct?
The full filter is tested, since the indexes are inaccurate. Typically it is a hash of the attribute values which is indexed. (Which doesn't imply the index is implemented as a hash table, BTW.)
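To illustrate why the full filter still has to be tested, here is a toy sketch in Python. It is not slapd's actual index code: the `bucket` hash, the `entries` data and the `search_uid` helper are all made up for the example. The point is only that two different values can land in the same index slot, so the index yields candidates, not results.

```python
from collections import defaultdict

def bucket(value, buckets=4):
    # Hypothetical toy hash (sum of bytes modulo a tiny bucket count),
    # chosen so that collisions are easy to see.
    return sum(value.encode("utf-8")) % buckets

# Made-up entries: entry ID -> attributes.
entries = {
    1: {"uid": "ab"},
    2: {"uid": "ba"},  # same byte sum as "ab", so it collides in the index
    3: {"uid": "zz"},
}

# Build the index: hash of the value -> set of entry IDs.
index = defaultdict(set)
for entry_id, attrs in entries.items():
    index[bucket(attrs["uid"])].add(entry_id)

def search_uid(value):
    # Step 1: the index returns every entry whose value hashes the same way.
    candidates = set(index.get(bucket(value), set()))
    # Step 2: the real assertion is tested against each candidate.
    matches = {i for i in candidates if entries[i]["uid"] == value}
    return candidates, matches

candidates, matches = search_uid("ab")
print(sorted(candidates), sorted(matches))
```

Here the index lookup for "ab" also returns the entry with uid "ba", and only the second step removes it.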
Yes, for each indexed attribute, an AND filter looks up the set of entry IDs matching the assertion value. Then the server can take the intersection of these.
If yes, then indexing an attribute which is present in many entries can lead to a large search candidate set even though the number of final search results is small.
Yes.
Consider the following simple example:
(&(objectClass=posixAccount)(uid=foo))
Bad example. The manual says objectClass should be indexed for performance. This is because the server may turn your (FILTER) into
(|(FILTER)(objectClass=alias)(objectClass=referral))
...depending on your search parameters. If there are aliases or referrals in the search scope, the server doesn't know if the entries they refer to match (FILTER). So it has to find and follow every alias in the scope to check, and return all referrals in scope. That's also why it's a bad idea to have lots of aliases.
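As a rough sketch of that rewrite (string manipulation only, not how slapd does it internally), the helper below wraps a filter the same way. The function name and the two boolean flags are hypothetical stand-ins for "your search parameters"; the real conditions are not spelled out here.

```python
def widen_filter(user_filter, include_aliases=True, include_referrals=True):
    # Hypothetical flags: they stand in for whatever search parameters
    # make the server add the extra objectClass terms.
    extra = []
    if include_aliases:
        extra.append("(objectClass=alias)")
    if include_referrals:
        extra.append("(objectClass=referral)")
    if not extra:
        return user_filter
    # OR the original filter with the extra objectClass assertions.
    return "(|" + user_filter + "".join(extra) + ")"

widened = widen_filter("(&(objectClass=posixAccount)(uid=foo))")
print(widened)
```

With objectClass unindexed, the two added terms cannot be answered from an index.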
You are roughly right for attributes other than objectClass, though.
With lots of user accounts this would lead to two search candidate sets, one very large and one with one entry (assuming uid values are unique). Not indexing objectClass would result in only *one* search candidate. So indexing objectClass might not be very wise.
More accurately, it leads to just one candidate *set*, plus the implicit DN/scope candidate set. So there will be fewer candidate sets for the server to intersect.
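A small Python sketch of the set arithmetic, with made-up entry IDs and sizes (1000 entries in scope, 900 posixAccount entries, one entry with uid=foo). The `and_candidates` helper is hypothetical and only models the intersection step, not the cost of the index lookups themselves.

```python
def and_candidates(scope_ids, indexed_sets):
    # Intersect the implicit baseDN/scope set with one candidate set
    # per indexed assertion of the AND filter.
    result = set(scope_ids)
    for ids in indexed_sets:
        result &= ids
    return result

scope_ids = set(range(1, 1001))  # assumed: entries under baseDN/scope
posix_ids = set(range(1, 901))   # assumed: (objectClass=posixAccount)
uid_foo_ids = {42}               # assumed: (uid=foo), unique uid

# objectClass and uid both indexed: two sets to intersect with the scope.
with_objectclass = and_candidates(scope_ids, [posix_ids, uid_foo_ids])

# Only uid indexed: one set to intersect with the scope.
without_objectclass = and_candidates(scope_ids, [uid_foo_ids])

print(with_objectclass, without_objectclass)
```

Both variants end up with the same single candidate; the difference is how many sets, and how large, have to be fetched and intersected to get there.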