indexing warning considered harmful


Michael Ströder

27 Jan 2014

4:47 a.m.

HI!

We all know the following messages in syslog (loglevel stats):

mdb_equality_candidates: (foo) not indexed

At first glance this seems helpful to find indexing issues.

But IMO 1. this is somewhat misleading regarding performance tuning, and 2. if internal searches are conducted (e.g. by set-based ACLs), the number of identical indexing warnings is really annoying and costs performance due to excessive logging.
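To get a feel for how often the same warning repeats, the messages can be tallied per attribute. The following is only a minimal sketch: the sample lines (timestamps, host name, PID) are made up for illustration, and the pattern assumes the message always has the shape shown above.

```python
import re
from collections import Counter

# Matches messages such as "mdb_equality_candidates: (foo) not indexed".
# The prefix before "_candidates" is left generic on purpose.
NOT_INDEXED = re.compile(r"\b\w+_candidates: \(([^)]+)\) not indexed")

def count_unindexed(lines):
    """Return a Counter mapping attribute name -> number of warnings seen."""
    counts = Counter()
    for line in lines:
        match = NOT_INDEXED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Hypothetical syslog excerpt, not taken from a real server.
sample = [
    "Jan 27 13:47:01 ldap1 slapd[1234]: mdb_equality_candidates: (foo) not indexed",
    "Jan 27 13:47:01 ldap1 slapd[1234]: mdb_equality_candidates: (foo) not indexed",
    "Jan 27 13:47:02 ldap1 slapd[1234]: mdb_equality_candidates: (bar) not indexed",
    "Jan 27 13:47:02 ldap1 slapd[1234]: conn=1 op=2 SEARCH RESULT tag=101 err=0",
]

for attr, n in count_unindexed(sample).most_common():
    print(f"{attr}: {n}")
```

In practice the lines would come from the syslog file rather than a list; the per-attribute totals make it easier to see which warnings stem from repeated internal searches.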

AFAIK a set of search candidates is derived from filter assertions by first searching the indexed attributes. Then the non-indexed assertions are tested but only on the search candidate set. Is this correct? If yes, then indexing an attribute which is present in many entries can lead to a large search candidate set even though the number of final search results is small.

Consider the following simple example:

(&(objectClass=posixAccount)(uid=foo))

With lots of user accounts this would lead to two search candidate sets, one very large and one with one entry (assuming uid values are unique). Not indexing objectClass would result in only *one* search candidate. So indexing objectClass might not be very wise.

What do you think about this?

Ciao, Michael.


Hallvard Breien Furuseth

27 Jan

6:46 a.m.

Michael Ströder writes:

...

We all know the following messages in syslog (loglevel stats):

mdb_equality_candidates: (foo) not indexed

At first glance this seems helpful to find indexing issues.

But IMO 1. this is somewhat misleading regarding performance tuning and

True. But it's also very valuable for identifying what needs to be indexed, and it does not look easy to delay the message until we know if it might be useful.

We could clarify the doc, which some people even read, and maybe reword the message.

...

2. if internal searches are conducted (e.g. by set-based ACLs), the number of identical indexing warnings is really annoying and costs performance due to excessive logging.

We can add a "none" indexing level with no effect other than to shut up that warning. I've been thinking of it before, but never got around to coding it.

...

AFAIK a set of search candidates is derived from filter assertions by first searching the indexed attributes.

When that looks useful, yes - like an AND filter. The index narrows down the possible candidate entries: The server takes the intersection of the candidate set returned by baseDN/scope and indexed attrs.

There are some implicit "filters" too: I mentioned baseDN/scope, which works a bit differently. Also filtering for objectClass, see below.

...

Then the non-indexed assertions are tested but only on the search candidate set. Is this correct?

The full filter is tested, since the indexes are inaccurate. Typically it is a hash of the attribute values which is indexed. (Which doesn't imply the index is implemented as a hash table, BTW.)

Yes, for each indexed attribute, an AND filter looks up the set of entry IDs matching the assertion value. Then the server can take the intersection of these.
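The candidate selection described above can be sketched roughly as follows. This is a toy model, not slapd's actual code: the entry store, the `bucket` hash, and all function names are assumptions made for illustration. It shows the two points made here: indexed assertions yield sets of entry IDs that are intersected, and the full filter is then re-tested on each candidate because a hashed index can return false positives.

```python
import zlib

def bucket(value, buckets=8):
    """Deliberately coarse hash, so different values may share a bucket."""
    return zlib.crc32(value.encode()) % buckets

def build_index(entries, attr):
    """Map hash bucket -> set of entry IDs having a value in that bucket."""
    index = {}
    for entry_id, entry in entries.items():
        for value in entry.get(attr, []):
            index.setdefault(bucket(value), set()).add(entry_id)
    return index

def search_and(entries, indexes, assertions):
    """Evaluate an AND filter given as {attribute: asserted value}."""
    # Stand-in for the candidate set derived from baseDN/scope.
    candidates = set(entries)
    for attr, value in assertions.items():
        if attr in indexes:  # only indexed attributes narrow the candidates
            candidates &= indexes[attr].get(bucket(value), set())
    # The index is inexact, so the full filter is tested on every candidate.
    return sorted(
        entry_id
        for entry_id in candidates
        if all(value in entries[entry_id].get(attr, [])
               for attr, value in assertions.items())
    )

entries = {
    1: {"objectClass": ["posixAccount"], "uid": ["foo"]},
    2: {"objectClass": ["posixAccount"], "uid": ["bar"]},
    3: {"objectClass": ["organizationalUnit"], "ou": ["people"]},
}
query = {"objectClass": "posixAccount", "uid": "foo"}

only_uid = {"uid": build_index(entries, "uid")}
both = {"uid": build_index(entries, "uid"),
        "objectClass": build_index(entries, "objectClass")}

print(search_and(entries, only_uid, query))
print(search_and(entries, both, query))
```

Both calls return the same entries; the choice of indexes only changes how many candidate IDs are read and intersected before the final filter test.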

...

If yes, then indexing an attribute which is present in many entries can lead to a large search candidate set even though the number of final search results is small.

Yes.

...

Consider the following simple example:

(&(objectClass=posixAccount)(uid=foo))

Bad example. The manual says objectClass should be indexed for performance. This is because the server may turn your (FILTER) into

(|(FILTER)(objectClass=alias)(objectClass=referral))

...depending on your search parameters. If there are aliases or referrals in the search scope, the server doesn't know if the entries they refer to match (FILTER). So it has to find and follow every alias in the scope to check, and return all referrals in scope. That's also why it's a bad idea to have lots of aliases.
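As a rough illustration of the rewrite shown above, the expansion can be expressed as plain string building. The function and its two flags are hypothetical; exactly which search parameters trigger which extra clause is an assumption here, not something taken from the server source.

```python
def expand_filter(user_filter, follow_aliases=True, return_referrals=True):
    """Wrap a filter so alias and referral entries also become candidates.

    follow_aliases and return_referrals are made-up flags standing in for
    whatever search parameters the server actually consults.
    """
    extra = []
    if follow_aliases:
        extra.append("(objectClass=alias)")
    if return_referrals:
        extra.append("(objectClass=referral)")
    if not extra:
        return user_filter
    return "(|" + user_filter + "".join(extra) + ")"

print(expand_filter("(&(objectClass=posixAccount)(uid=foo))"))
```

Either extra clause is an equality test on objectClass, which is why an objectClass index matters even when the filter as written never mentions it.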

You are roughly right for other attributes than objectClass, though.

...

With lots of user accounts this would lead to two search candidate sets, one very large and one with one entry (assuming uid values are unique). Not indexing objectClass would result in only *one* search candidate. So indexing objectClass might not be very wise.

More accurately, it leads to just one candidate *set*. Plus the implicit DN/scope candidate set. So there will be fewer candidate sets which the server needs to take an intersection of.

-- Hallvard

Quanah Gibson-Mount

9:37 a.m.

--On Monday, January 27, 2014 1:47 PM +0100 Michael Ströder michael@stroeder.com wrote:

...

HI!

We all know the following messages in syslog (loglevel stats):

mdb_equality_candidates: (foo) not indexed

At first glance this seems helpful to find indexing issues.

Correct, it may or may not be useful. For an admin who knows their LDAP server inside and out, and how to properly tune the system with indices, it is not useful at all. For beginners, it can be extremely helpful. For Zimbra builds, I change the logging level for this message from LDAP_DEBUG_ANY to LDAP_DEBUG_TRACE, as we take advantage of index short circuiting (meaning some of the attrs aren't indexed deliberately to increase performance by decreasing the number of evaluated result sets).

--Quanah

Quanah Gibson-Mount
Architect - Server
Zimbra, Inc.
--------------------
Zimbra :: the leader in open source messaging and collaboration

Michael Ströder

2:54 p.m.

Quanah Gibson-Mount wrote:

...

--On Monday, January 27, 2014 1:47 PM +0100 Michael Ströder michael@stroeder.com wrote:

...
HI!

We all know the following messages in syslog (loglevel stats):

mdb_equality_candidates: (foo) not indexed

At first glance this seems helpful to find indexing issues.

Correct, it may or may not be useful.

Then it's also completely meaningless for beginners. And postings on the mailing list already showed that.

...

For Zimbra builds, I change the logging level for this message from LDAP_DEBUG_ANY to LDAP_DEBUG_TRACE,

Do I have to patch the C source or can I use a -D compiler flag?

...

as we take advantage of index short circuiting (meaning some of the attrs aren't indexed deliberately to increase performance by decreasing the number of evaluated result sets).

That's exactly why I'm not indexing a status attribute (let's call it 'organizationalStatus') because it's present in every entry with only very few possible values.
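A toy calculation shows why an index on such an attribute can cost more than it saves. Everything here is assumed for illustration: the entry count, the three status values, the exact dictionary "indexes", and the idea of counting entry IDs read as a proxy for work. It does not model how slapd actually schedules index lookups.

```python
N = 100_000
STATUSES = ["active", "inactive", "locked"]  # hypothetical status values

# Every entry has a unique uid and one of very few status values.
entries = {
    i: {"uid": f"user{i}", "organizationalStatus": STATUSES[i % 3]}
    for i in range(N)
}

# Exact "indexes": attribute value -> set of entry IDs.
uid_index = {}
status_index = {}
for entry_id, entry in entries.items():
    uid_index.setdefault(entry["uid"], set()).add(entry_id)
    status_index.setdefault(entry["organizationalStatus"], set()).add(entry_id)

def ids_touched(use_status_index):
    """Return (number of entry IDs read from indexes, matching entry IDs)
    for the filter (&(uid=user42)(organizationalStatus=active))."""
    uid_ids = uid_index.get("user42", set())
    touched = len(uid_ids)
    candidates = set(uid_ids)
    if use_status_index:
        status_ids = status_index.get("active", set())
        touched += len(status_ids)
        candidates &= status_ids
    # Final test of the full filter on the remaining candidates.
    result = [
        entry_id for entry_id in candidates
        if entries[entry_id]["uid"] == "user42"
        and entries[entry_id]["organizationalStatus"] == "active"
    ]
    return touched, result

print(ids_touched(use_status_index=False))
print(ids_touched(use_status_index=True))
```

Both variants find the same single entry, but the indexed variant first reads roughly a third of all entry IDs just to intersect them with a one-element set.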

=> the logging level for this message should be LDAP_DEBUG_TRACE in the default source, especially since the message is written to the log dozens of times while evaluating set-based ACLs.

Ciao, Michael.