Hi,
I'm trying to use >=/<= filters on an eq-indexed generalizedTime attribute. It seems the eq index does not work for a <= filter.

olcAttributeTypes: ( 2.999.777.1.1.1.5 NAME 'logTime' DESC 'Time'
  EQUALITY generalizedTimeMatch ORDERING generalizedTimeOrderingMatch
  SYNTAX 1.3.6.1.4.1.1466.115.121.1.24 )
olcDbIndex: logTime eq

(&(logTime>=201209201440+0400)(logTime<=201209201450+0400)) - 57sec
(|(logTime=20120920144001+0400)(logTime=20120920144008+0400)) - 0.03sec

~1,000,000 records in the db, target = 100,000,000 records.

What should I do to make >= filters work efficiently? I would consider using another data type that works reliably with <=/>= filters: strings? integers? Anything that can mimic TIME semantics. Any advice?
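For illustration only (a sketch; the host and base DN below are placeholders, not from the thread), such a range search could be issued as:

  ldapsearch -x -H ldap://localhost -b "ou=logs,dc=example,dc=com" \
      '(&(logTime>=201209201440+0400)(logTime<=201209201450+0400))' dn logTime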
14.04.2012 19:44, Michael Ströder wrote:
can I influence the order of index usage by order in the filter or slapd index configuration?
The same question from me. Is it possible to select indexes using some sort of filter syntax?
Roman Rybalko wrote:
Hi,
I'm trying to use >=/<= filters on an eq-indexed generalizedTime attribute. It seems the eq index does not work for a <= filter.
Actually the eq index works fine.
olcAttributeTypes: ( 2.999.777.1.1.1.5 NAME 'logTime' DESC 'Time'
  EQUALITY generalizedTimeMatch ORDERING generalizedTimeOrderingMatch
  SYNTAX 1.3.6.1.4.1.1466.115.121.1.24 )
olcDbIndex: logTime eq

(&(logTime>=201209201440+0400)(logTime<=201209201450+0400)) - 57sec
(|(logTime=20120920144001+0400)(logTime=20120920144008+0400)) - 0.03sec

~1,000,000 records in the db, target = 100,000,000 records. What should I do to make >= filters work efficiently?
Use a correct filter. Your clauses above use invalid syntax.
24.09.2012 22:23, Howard Chu wrote:
(&(logTime>=201209201440+0400)(logTime<=201209201450+0400)) - 57sec
(|(logTime=20120920144001+0400)(logTime=20120920144008+0400)) - 0.03sec
Use a correct filter. Your clauses above use invalid syntax.
Please point out exactly where my syntax is invalid.
Roman Rybalko wrote:
24.09.2012 22:23, Howard Chu wrote:
(&(logTime>=201209201440+0400)(logTime<=201209201450+0400)) - 57sec
(|(logTime=20120920144001+0400)(logTime=20120920144008+0400)) - 0.03sec
Use a correct filter. Your clauses above use invalid syntax.
Please point out exactly where my syntax is invalid.
You must be blind. You wrote:
(&(logTime>=201209201440+0400)(logTime<=201209201450+0400)) - 57sec
Check the proper format of a GeneralizedTime value. You have omitted the seconds field, so you're effectively looking for every entry greater than the year 20, December 09. The index lookup for this will most likely hit every entry in your DB, which is why it takes 57 seconds.
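For illustration only (a sketch, not part of the original message), the same range restated with an explicit seconds field would read:

  (&(logTime>=20120920144000+0400)(logTime<=20120920145000+0400))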
24.09.2012 23:04, Howard Chu wrote:
Roman Rybalko wrote:
24.09.2012 22:23, Howard Chu wrote:
(&(logTime>=201209201440+0400)(logTime<=201209201450+0400)) - 57sec
(|(logTime=20120920144001+0400)(logTime=20120920144008+0400)) - 0.03sec
Use a correct filter. Your clauses above use invalid syntax.
Please point out exactly where my syntax is invalid.
You must be blind. You wrote:
(&(logTime>=201209201440+0400)(logTime<=201209201450+0400)) - 57sec
Check the proper format of a GeneralizedTime value. You have omitted the seconds field, so you're effectively looking for every entry greater than the year 20, December 09. The index lookup for this will most likely hit every entry in your DB, which is why it takes 57 seconds.
Many thanks for the suggestion. Apologies for my persistence.
I tried the full GeneralizedTime format (with seconds, with fractions), but the search is still slow. Even the search (|(logTime=20120920144001+0400)(logTime=20120920144008+0400)), which takes less than a second and returns 2 entries, takes more than 50 seconds when reformulated as (&(logTime>=20120920144001+0400)(logTime<=20120920144008+0400)).
According to RFC 4517 (http://tools.ietf.org/html/rfc4517#section-3.3.13), GeneralizedTime has the syntax:
GeneralizedTime = century year month day hour [ minute [ second / leap-second ] ] [ fraction ] g-time-zone
which means that minutes and seconds may be omitted. Probably that's not implemented... no problem.
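For illustration, a few values that are all syntactically valid under this ABNF (the timestamps are invented for the example):

  2012092014Z              hour only
  201209201440+0400        hour and minute
  20120920144001+0400      hour, minute and second
  20120920144001.5Z        with a fraction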
OpenLDAP version 2.4.23
How may I optimize (&(>=)(<=)) searches?
Roman Rybalko wrote:
I tried the full GeneralizedTime format (with seconds, with fractions), but the search is still slow. Even the search (|(logTime=20120920144001+0400)(logTime=20120920144008+0400)), which takes less than a second and returns 2 entries, takes more than 50 seconds when reformulated as (&(logTime>=20120920144001+0400)(logTime<=20120920144008+0400)).
According to RFC 4517 (http://tools.ietf.org/html/rfc4517#section-3.3.13), GeneralizedTime has the syntax:
GeneralizedTime = century year month day hour [ minute [ second / leap-second ] ] [ fraction ] g-time-zone
which means that minutes and seconds may be omitted. Probably that's not implemented... no problem.
My mistake. Our generalizedTime validator handles those optional components.
OpenLDAP version 2.4.23
How may I optimize (&(>=)(<=)) searches?
Aside from the equality index, there's not much else to be done. I'd suggest you pull the code in RE24 and see how back-mdb performs with your data.
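For anyone following along, a rough sketch of one way to try back-mdb with existing data, using slapd.conf-style configuration (the suffix, paths, and maxsize value are placeholders, not taken from the thread):

  # dump the existing database to LDIF
  slapcat -f slapd.conf -b "dc=example,dc=com" -l data.ldif

  # slapd.conf fragment for a back-mdb database built from RE24
  database  mdb
  suffix    "dc=example,dc=com"
  directory /var/lib/ldap-mdb
  maxsize   10737418240
  index     logTime eq

  # reload the data into the new backend
  slapadd -f slapd.conf -b "dc=example,dc=com" -l data.ldif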
25.09.2012 01:31, Howard Chu wrote:
Aside from the equality index, there's not much else to be done. I'd suggest you pull the code in RE24 and see how back-mdb performs with your data.
Pulled 2a68553ec103eec28237c2608b3fea149a492b76 and tried mdb - it works in exactly the same way.

(&(logTime>=20120816200001+0400)(logTime<=20120816200501+0400)) - 12sec
(|(logTime=20120816200001+0400)(logTime=20120816200501+0400)) - 0.01sec

~100,000 objects in the db.
Should I rely only on (|(=)(=)(=)(=)...) filters?
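For illustration only (a sketch, not from the original message), rewriting a time range as an OR of equality clauses means enumerating every possible value in the range, e.g. one clause per second:

  (|(logTime=20120816200001+0400)
    (logTime=20120816200002+0400)
    ...
    (logTime=20120816200501+0400))

A five-minute window at one-second granularity expands to roughly 300 clauses, so this only stays practical for short ranges or coarse timestamps.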
Roman Rybalko wrote:
25.09.2012 01:31, Howard Chu wrote:
Aside from the equality index, there's not much else to be done. I'd suggest you pull the code in RE24 and see how back-mdb performs with your data.
Pulled 2a68553ec103eec28237c2608b3fea149a492b76 and tried mdb - it works in exactly the same way.
Looks to me like it's 4-5x faster.....
(&(logTime>=20120816200001+0400)(logTime<=20120816200501+0400)) - 12sec
(|(logTime=20120816200001+0400)(logTime=20120816200501+0400)) - 0.01sec
~100,000 objects in the db
Should I rely only on (|(=)(=)(=)(=)...) filters?
If that suits you, sure. Personally, I would submit an enhancement request to the ITS about getting this indexing code restructured, ideally with a patch attached that fixes it as desired.
27.09.2012 14:21, Howard Chu wrote:
Looks to me like it's 4-5x faster.....
(&(logTime>=20120816200001+0400)(logTime<=20120816200501+0400)) - 12sec
(|(logTime=20120816200001+0400)(logTime=20120816200501+0400)) - 0.01sec
~100,000 objects in the db
Actually, yes: I tried the same search on a 1,000,000-object db and MDB really is 5x faster, taking 12-13sec. It's also very convenient to get a rough idea of the DB size from the shared memory usage of the process.
Except it's still buggy. I hit an "mdb_add: txn_commit failed : MDB_PAGE_FULL: Internal error - page has no more space (-30786)" error. How should I report it - via the ITS (URL please) or the openldap-devel list? I also have a full debug build, so I can look into the stack trace.
Roman Rybalko wrote:
27.09.2012 14:21, Howard Chu wrote:
Looks to me like it's 4-5x faster.....
(&(logTime>=20120816200001+0400)(logTime<=20120816200501+0400)) - 12sec
(|(logTime=20120816200001+0400)(logTime=20120816200501+0400)) - 0.01sec
~100,000 objects in the db
Actually, yes: I tried the same search on a 1,000,000-object db and MDB really is 5x faster, taking 12-13sec. It's also very convenient to get a rough idea of the DB size from the shared memory usage of the process.
Except it's still buggy. I hit an "mdb_add: txn_commit failed : MDB_PAGE_FULL: Internal error - page has no more space (-30786)" error.
Did you already have commit 0c4c6fe72a57f812e4486cd017298f730df19c23 ?
How should I report it - via the ITS (URL please)
www.openldap.org
or the openldap-devel list? I also have a full debug build, so I can look into the stack trace.
27.09.2012 17:26, Howard Chu wrote:
Roman Rybalko wrote:
I hit an "mdb_add: txn_commit failed : MDB_PAGE_FULL: Internal error - page has no more space (-30786)" error.
Did you already have commit 0c4c6fe72a57f812e4486cd017298f730df19c23 ?
No, I didn't. But now I have commit a1c2dc6 (next after 0c4c6fe) and it works fine. Thank you!
25.09.2012 01:31, Howard Chu wrote:
Roman Rybalko wrote:
How may I optimize (&(>=)(<=)) searches?
Aside from the equality index, there's not much else to be done. I'd suggest you pull the code in RE24 and see how back-mdb performs with your data.
Actually, MDB works no faster than HDB with an indexed generalizedTime attribute; the timings are non-constant (mdb: 22sec-78sec vs hdb: 52sec), probably due to intensive mmap I/O (dataset size: 1M objects).
But an eq index on Integers:

olcAttributeTypes: ( 2.999.777.1.1.1.3 NAME 'logOrderingInteger'
  DESC 'Integer with ordering' EQUALITY integerMatch
  ORDERING integerOrderingMatch SYNTAX 1.3.6.1.4.1.1466.115.121.1.27 )

with (&(<=)(>=)) search filters works really fast on MDB, no slower than any other eq/sub index on strings (0.02sec-0.6sec on a 1,000,000-object dataset). I didn't try it on HDB, so it may or may not behave the same, but MDB seems to have a big advantage in memory consumption (MDB uses less memory than HDB).
So I decided to encode my time values as eq-indexed integers and search them with the MDB backend. That fits my requirements best.
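For illustration, a sketch of the integer-based approach (the index line, the Unix-epoch encoding, and the example values are assumptions, not from the original messages):

  olcDbIndex: logOrderingInteger eq

  # store each timestamp as Unix epoch seconds,
  # e.g. 2012-08-16 20:00:01 +0400 -> 1345132801
  (&(logOrderingInteger>=1345132801)(logOrderingInteger<=1345133101))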