HI!
I'm trying to retrieve change events from accesslog DB (all with today's RE24). I tried searching with this filter:
(&(reqDN=cn=Test-Mail-Gruppe 1,dc=example,dc=com)(reqStart<=20120413180000Z))
This turned out to be quite slow though. reqDN is indexed and there are only two possible entries. Using a filter reqStart>= even when negated with (!()) is pretty fast.
I really wonder why that is.
Ciao, Michael.
Michael Ströder wrote:
HI!
I'm trying to retrieve change events from accesslog DB (all with today's RE24). I tried searching with this filter:
(&(reqDN=cn=Test-Mail-Gruppe 1,dc=example,dc=com)(reqStart<=20120413180000Z))
This turned out to be quite slow though. reqDN is indexed and there are only two possible entries. Using a filter reqStart>= even when negated with (!()) is pretty fast.
Is reqStart indexed?
I really wonder why that is.
Obviously, for any database that has been around for even a short while, there will be far fewer records with a date newer than [today's date] than records older than that.
Howard Chu wrote:
Michael Ströder wrote:
HI!
I'm trying to retrieve change events from accesslog DB (all with today's RE24). I tried searching with this filter:
(&(reqDN=cn=Test-Mail-Gruppe 1,dc=example,dc=com)(reqStart<=20120413180000Z))
This turned out to be quite slow though. reqDN is indexed and there are only two possible entries. Using a filter reqStart>= even when negated with (!()) is pretty fast.
Is reqStart indexed?
Yes, eq-indexed of course.
I really wonder why that is.
Obviously, for any database that has been around for even a short while, there will be far fewer records with a date newer than [today's date] than records older than that.
But there were only three entries matching (reqDN=cn=Test-Mail-Gruppe 1,dc=example,dc=com) anyway and reqDN is also eq-indexed and finding them with this filter is pretty fast.
Some more examples where reqDN-index is obviously not used but reqStart-index should be used in both cases:
Quite fast, although I would have expected a significant slowdown because of the negation filter: (&(reqDN:dnSubtreeMatch:=ou=Groups,dc=example,dc=com)(reqStart>=20120313072338Z)(!(reqStart>=20120413075657Z)))
Almost identical but very slow compared to the example above: (&(reqDN:dnSubtreeMatch:=ou=Groups,dc=example,dc=com)(reqStart>=20120313072338Z)(reqStart<=20120413075657Z))
I can't explain this based on index configuration. Maybe there's something handled differently with <= compared to >=?
Ciao, Michael.
Michael Ströder wrote:
Howard Chu wrote:
Michael Ströder wrote:
HI!
I'm trying to retrieve change events from accesslog DB (all with today's RE24). I tried searching with this filter:
(&(reqDN=cn=Test-Mail-Gruppe 1,dc=example,dc=com)(reqStart<=20120413180000Z))
This turned out to be quite slow though. reqDN is indexed and there are only two possible entries. Using a filter reqStart>= even when negated with (!()) is pretty fast.
Is reqStart indexed?
Yes, eq-indexed of course.
I really wonder why that is.
Obviously, for any database that has been around for even a short while, there will be far fewer records with a date newer than [today's date] than records older than that.
But there were only three entries matching (reqDN=cn=Test-Mail-Gruppe 1,dc=example,dc=com) anyway and reqDN is also eq-indexed and finding them with this filter is pretty fast.
Some more examples where reqDN-index is obviously not used but reqStart-index should be used in both cases:
Quite fast, although I would have expected a significant slowdown because of the negation filter: (&(reqDN:dnSubtreeMatch:=ou=Groups,dc=example,dc=com)(reqStart>=20120313072338Z)(!(reqStart>=20120413075657Z)))
Range lookups are expensive, even when fully indexed, but negations bypass all index lookups, and are simply replaced with (All IDs). Since this is an AND filter, that result is essentially a no-op and costs nothing.
Almost identical but very slow compared to the example above: (&(reqDN:dnSubtreeMatch:=ou=Groups,dc=example,dc=com)(reqStart>=20120313072338Z)(reqStart<=20120413075657Z))
I can't explain this based on index configuration. Maybe there's something handled differently with <= compared to >=?
They are handled identically; it is simply the difference in the number of records that need to be read.
On an indexed <= lookup, the backend reads the Equality index of the given attribute, starting at the beginning, and adding every entryID to the candidate list, until it reaches the end time. On an indexed >= lookup, it reads from the specified timestamp to the end of the index.
Again, obviously when you use [today's date] the >= lookup will be much faster than the <= lookup.
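The asymmetry described above can be illustrated with a toy model. This is only a sketch, not slapd's actual code: the sorted Python list `keys` stands in for the equality index on reqStart, and the helper names `scan_le` and `scan_ge` are hypothetical. It counts how many index keys each kind of range lookup has to walk when the bound is close to "now".

```python
from bisect import bisect_left, bisect_right


def scan_le(keys, bound):
    """Keys walked for (attr<=bound): from the start of the index up to bound."""
    return keys[:bisect_right(keys, bound)]


def scan_ge(keys, bound):
    """Keys walked for (attr>=bound): from bound to the end of the index."""
    return keys[bisect_left(keys, bound):]


# Hypothetical accesslog: one record per day for 1000 days, keys sorted.
keys = list(range(1000))
today = 999

le_cost = len(scan_le(keys, today))  # reads almost the whole index
ge_cost = len(scan_ge(keys, today))  # reads only the newest key
print(le_cost, ge_cost)  # 1000 1
```

In this model a negated term such as (!(reqStart>=X)) would cost nothing at the index stage, since, as stated above, it is replaced with (All IDs) and is only checked when the remaining candidates are tested.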
Howard,
Howard Chu wrote:
Michael Ströder wrote:
Howard Chu wrote:
Obviously, for any database that has been around for even a short while, there will be far fewer records with a date newer than [today's date] than records older than that.
But there were only three entries matching (reqDN=cn=Test-Mail-Gruppe 1,dc=example,dc=com) anyway and reqDN is also eq-indexed and finding them with this filter is pretty fast.
thanks for your explanations, but I still do not fully understand it. Maybe I'm overlooking something obvious.
Some more examples where reqDN-index is obviously not used but reqStart-index should be used in both cases:
Quite fast, although I would have expected a significant slowdown because of the negation filter: (&(reqDN:dnSubtreeMatch:=ou=Groups,dc=example,dc=com)(reqStart>=20120313072338Z)(!(reqStart>=20120413075657Z)))
Range lookups are expensive, even when fully indexed, but negations bypass all index lookups, and are simply replaced with (All IDs). Since this is an AND filter, that result is essentially a no-op and costs nothing.
Do I understand you right that in this case the reqStart-index is not used at all for processing the part (!(reqStart>=20120413075657Z))? That matches what I expected from negations.
But reqStart-index is used for filtering based on (reqStart>=20120313072338Z)?
Almost identical but very slow compared to the example above: (&(reqDN:dnSubtreeMatch:=ou=Groups,dc=example,dc=com)(reqStart>=20120313072338Z)(reqStart<=20120413075657Z))
I can't explain this based on index configuration. Maybe there's something handled differently with <= compared to >=?
They are handled identically; it is simply the difference in the number of records that need to be read.
On an indexed <= lookup, the backend reads the Equality index of the given attribute, starting at the beginning, and adding every entryID to the candidate list, until it reaches the end time. On an indexed >= lookup, it reads from the specified timestamp to the end of the index.
But why does (reqStart>=20120313072338Z) not already limit the number of search candidates to be filtered with (reqStart<=20120413075657Z)? Is filter order significant?
But I really wonder why an exact reqDN search, with reqDN being eq-indexed, is slow once a <= filter part is added.
Fast since reqDN eq-indexed and only one(!) entry returned:
(&(reqDN=cn=Info-Mail-Testgruppe1,ou=Groups,dc=example))
Fast since reqDN and reqStart eq-indexed: (&(reqDN=cn=Info-Mail-Testgruppe1,ou=Groups,dc=example)(reqStart>=20120313072338Z))
Slow (30 sec) although reqDN eq-indexed which should already limit the number of search candidates to one(!):
(&(reqDN=cn=Info-Mail-Testgruppe1,ou=Groups,dc=example)(reqStart>=20120313072338Z)(reqStart<=20120415075657Z))
Aaah, now I see. If I turn off eq-index for reqStart this case is also fast because first the reqDN-eq-index is used and after that unindexed filtering is done on reqStart range.
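This observation can be reproduced with a small toy model. It is a sketch under stated assumptions, not a description of slapd internals: the function `search`, its return tuple, and the data layout are all hypothetical. It assumes every indexed term of the AND filter is looked up before the candidate lists are intersected, and it counts index keys read versus entries tested afterwards.

```python
from bisect import bisect_left, bisect_right


def search(entries, dn, lo, hi, reqstart_indexed):
    """Evaluate (&(reqDN=dn)(reqStart>=lo)(reqStart<=hi)) on a toy log.

    entries maps entry ID -> (reqDN, reqStart).
    Returns (matching IDs, index keys read, entries tested).
    """
    # Equality index on reqDN: a single key lookup.
    dn_index = {}
    for eid, (edn, _) in entries.items():
        dn_index.setdefault(edn, set()).add(eid)
    candidates = set(dn_index.get(dn, ()))
    keys_read = 1

    if reqstart_indexed:
        # Both range terms walk the sorted reqStart index in full.
        start_index = sorted((ts, eid) for eid, (_, ts) in entries.items())
        keys = [ts for ts, _ in start_index]
        ge = start_index[bisect_left(keys, lo):]
        le = start_index[:bisect_right(keys, hi)]
        keys_read += len(ge) + len(le)
        candidates &= {eid for _, eid in ge}
        candidates &= {eid for _, eid in le}

    # Every remaining candidate is tested against the full filter.
    matches = sorted(
        eid for eid in candidates
        if entries[eid][0] == dn and lo <= entries[eid][1] <= hi
    )
    return matches, keys_read, len(candidates)


# Hypothetical data: 10000 log records, exactly one for the DN of interest.
target = "cn=Info-Mail-Testgruppe1,ou=Groups,dc=example"
entries = {i: ("cn=other,dc=example", i) for i in range(10000)}
entries[9000] = (target, 9000)

with_index = search(entries, target, 8000, 9999, reqstart_indexed=True)
without_index = search(entries, target, 8000, 9999, reqstart_indexed=False)
print(with_index)     # ([9000], 12001, 1)
print(without_index)  # ([9000], 1, 1)
```

Both runs return the same single entry, but with the reqStart index enabled the model reads 12001 index keys to get there, against one key without it.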
Hmm, can I influence the order of index usage by order in the filter or slapd index configuration?
Wouldn't it make sense to treat <= filter parts as unindexed, even if there's an eq-index defined for the attribute type, so that <= filtering is postponed until the other indexes have narrowed down the set of search candidates?
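On the client side, the upper bound can be written as a negated >=, which is the form found to be fast earlier in this thread. A minimal sketch of such a filter builder follows; the helper names `escape_filter_value`, `generalized_time` and `accesslog_range_filter` are made up for illustration, and the timestamp format without fractional seconds is assumed from the filters quoted above. Note that (!(reqStart>=X)) means "strictly before X", so an entry stamped exactly X is excluded, unlike with (reqStart<=X).

```python
from datetime import datetime, timezone


def escape_filter_value(value):
    """Escape the characters that are special in an LDAP filter value (RFC 4515)."""
    out = []
    for ch in value:
        if ch in '\\*()\x00':
            out.append('\\%02x' % ord(ch))
        else:
            out.append(ch)
    return ''.join(out)


def generalized_time(dt):
    """Format a timezone-aware datetime as YYYYmmddHHMMSSZ in UTC."""
    return dt.astimezone(timezone.utc).strftime('%Y%m%d%H%M%SZ')


def accesslog_range_filter(req_dn, start, end):
    """Build a filter for start <= reqStart < end without using a <= term."""
    return '(&(reqDN=%s)(reqStart>=%s)(!(reqStart>=%s)))' % (
        escape_filter_value(req_dn),
        generalized_time(start),
        generalized_time(end),
    )


flt = accesslog_range_filter(
    'cn=Test-Mail-Gruppe 1,dc=example,dc=com',
    datetime(2012, 3, 13, 7, 23, 38, tzinfo=timezone.utc),
    datetime(2012, 4, 13, 7, 56, 57, tzinfo=timezone.utc),
)
print(flt)
```

The resulting string can be passed as the search filter to whatever LDAP client is in use.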
Ciao, Michael.
openldap-technical@openldap.org