How do indexes work?


Steeg Carson

16 Dec 2010, 4:04 p.m.

Hello,

I am trying to understand how LDAP indexes work.

If I configure an index for an attribute like this:

index myAttribute eq

the index file myAttribute.bdb is built in the data directory.

When I then search with

ldapsearch -x -h localhost -D".." -b"<baseDN>" "(myAttribute=<searched key>)"

how will the LDAP server process this request?

Is there good documentation on this anywhere?

My assumption is:
* First, the index is looked up. The result is only the IDs of the matching entries.
* The LDAP server can then quickly return all those entries from id2entry.bdb.

If I use indexes, are all the other entries examined as well after the results from the indexes have been returned?
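The two-step lookup assumed above can be modelled in a few lines of Python. This is a simplified sketch, not slapd's code: plain dicts are hypothetical stand-ins for the equality index file myAttribute.bdb and for id2entry.bdb, values are compared exactly (no matching-rule normalization), and each candidate is still checked against the filter, because an index only narrows down the set of entries that have to be tested.

```python
from typing import Dict, List, Set

# An entry: attribute name -> list of values ("dn" holds the entry's DN).
Entry = Dict[str, List[str]]


def eq_search(eq_index: Dict[str, Set[int]],
              id2entry: Dict[int, Entry],
              attr: str, value: str) -> List[str]:
    """DNs of entries with attr == value, found via an equality index."""
    # Step 1: the index lookup returns candidate entry IDs only.
    candidate_ids = eq_index.get(value, set())
    results = []
    # Step 2: load each candidate from id2entry and re-check the filter
    # before returning it; IDs not in the index slot are never touched.
    for entry_id in sorted(candidate_ids):
        entry = id2entry[entry_id]
        if value in entry.get(attr, []):
            results.append(entry["dn"][0])
    return results


if __name__ == "__main__":
    id2entry = {
        1: {"dn": ["cn=a,ou=root"], "myAttribute": ["x"]},
        2: {"dn": ["cn=b,ou=root"], "myAttribute": ["y"]},
        3: {"dn": ["cn=c,ou=root"], "myAttribute": ["x"]},
    }
    eq_index = {"x": {1, 3}, "y": {2}}
    print(eq_search(eq_index, id2entry, "myAttribute", "x"))
```

In this model, the entry with ID 2 is never read for the search on "x", which is the whole point of having an index.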

I have a database, and my search is like the one shown above. The search takes a long time. The cache is configured, and its size is sufficient (approx. dn2id.bdb + id2entry.bdb). But what I see is that the write I/O from the LDAP server is enormous (seen with iotop). During the whole search, the write I/O is higher than the read I/O. Why?

Thanks for your help.

Steeg


Bjørn Ruberg

17 Dec 2010, 3:19 a.m.

Steeg Carson: [...]

> I have a database, and my search is like the one shown above. The search takes a long time.

Did you run slapindex after adding the index? Is the index file owned by the proper user account?
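The ownership question can be checked with a short Python sketch. The data directory /var/lib/ldap and the account name openldap are assumptions (the directory is whatever the `directory` setting in slapd.conf points to, and the user is whatever account slapd runs as); index files are assumed to follow the `<attribute>.bdb` naming from the original question. A file owned by a different account, for example root after running slapindex as root, would show up in the result.

```python
from pathlib import Path
from typing import List


def wrongly_owned(data_dir: str, expected_user: str) -> List[str]:
    """Names of *.bdb files in data_dir not owned by expected_user (Unix only)."""
    return sorted(
        p.name
        for p in Path(data_dir).glob("*.bdb")
        if p.owner() != expected_user
    )


if __name__ == "__main__":
    # Hypothetical values: use the "directory" from slapd.conf and the
    # account slapd actually runs as on your system.
    print(wrongly_owned("/var/lib/ldap", "openldap"))
```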

> The cache is configured, and its size is sufficient (approx. dn2id.bdb + id2entry.bdb).

You should configure the cache to be large enough to hold all indexes, not only dn2id and id2entry. See e.g.

http://www.linuxtopia.org/online_books//network_administration_guides/ldap_a...
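This advice amounts to summing the sizes of all database files, index files included, instead of only dn2id.bdb and id2entry.bdb. A minimal Python sketch that computes both numbers for comparison; the data directory path is a placeholder, and the plain sum ignores Berkeley DB's own overhead, so it is a rough starting point rather than an exact cache size.

```python
from pathlib import Path
from typing import Dict


def bdb_sizes(data_dir: str) -> Dict[str, int]:
    """Size in bytes of every *.bdb file (id2entry, dn2id and all indexes)."""
    return {p.name: p.stat().st_size for p in Path(data_dir).glob("*.bdb")}


def cache_estimates(sizes: Dict[str, int]) -> Dict[str, int]:
    """Rule-of-thumb size versus the size of all databases together."""
    return {
        "dn2id_plus_id2entry": sizes.get("dn2id.bdb", 0)
        + sizes.get("id2entry.bdb", 0),
        "all_bdb_files": sum(sizes.values()),
    }


if __name__ == "__main__":
    # Hypothetical path; use the "directory" setting from slapd.conf.
    print(cache_estimates(bdb_sizes("/var/lib/ldap")))
```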

> But what I see is that the write I/O from the LDAP server is enormous (seen with iotop). During the whole search, the write I/O is higher than the read I/O. Why?

What is slapd's current loglevel?

Also please note that it's not necessary to post the same message to the mailing list several times.

-- Bjørn

Steeg Carson

2 p.m.

2010/12/17 Bjørn Ruberg bjorn@ruberg.no:

> Steeg Carson: [...]
>> I have a database, and my search is like the one shown above. The search takes a long time.
>
> Did you run slapindex after adding the index? Is the index file owned by the proper user account?
>
>> The cache is configured, and its size is sufficient (approx. dn2id.bdb + id2entry.bdb).
>
> You should configure the cache to be large enough to hold all indexes, not only dn2id and id2entry. See e.g.
>
> http://www.linuxtopia.org/online_books//network_administration_guides/ldap_a...

I know these guides, and I have also read the FAQs and the Admin Guide. dn2id plus id2entry is a rule of thumb. If I calculate it the way you describe, it comes to approximately the same size ...

>> But what I see is that the write I/O from the LDAP server is enormous (seen with iotop). During the whole search, the write I/O is higher than the read I/O. Why?
>
> What is slapd's current loglevel?

loglevel is 0. I know I should rather use 256, but for this reason I switched off logging for testing :-(

> Also please note that it's not necessary to post the same message to the mailing list several times.

I sent my first posting 2 days ago, but it did not reach the list.... My second question still has not appeared on the list. So I sent them a second time, because I thought they had been lost.

But what about the first part of my question in this posting? How is an ldapsearch processed? Does slapd search the whole database despite the indexes?

Thank you very much

Steeg

Bjørn Ruberg

2:23 p.m.

Steeg Carson:
> 2010/12/17 Bjørn Ruberg bjorn@ruberg.no:
>> Steeg Carson: [...]
>>> I have a database, and my search is like the one shown above. The search takes a long time.
>>
>> Did you run slapindex after adding the index? Is the index file owned by the proper user account?

You didn't answer the above question...

[...]

>>> But what I see is that the write I/O from the LDAP server is enormous (seen with iotop). During the whole search, the write I/O is higher than the read I/O. Why?
>>
>> What is slapd's current loglevel?
>
> loglevel is 0. I know I should rather use 256, but for this reason I switched off logging for testing :-(

I asked because you said there's much *write* activity. If there's no logging, something else must be writing and you should find out what it is. This is probably the reason why the search is slow.
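On Linux, one way to follow this up is to sample slapd's per-process I/O counters in /proc/<pid>/io while the search runs; the `write_bytes` field counts bytes the process caused to be written to storage. A minimal Python sketch, assuming a Linux host and enough privileges (reading another user's /proc/<pid>/io usually needs root). The PID file path is the `pidfile` from the slapd.conf posted later in this thread and is an assumption for other setups. This only confirms whether slapd itself is the writer and how fast it writes; it does not say which file is being written.

```python
import time
from typing import Dict


def parse_proc_io(text: str) -> Dict[str, int]:
    """Parse the "name: value" lines of Linux /proc/<pid>/io."""
    counters = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:
            counters[name.strip()] = int(value)
    return counters


def read_proc_io(pid: int) -> Dict[str, int]:
    with open(f"/proc/{pid}/io") as fh:
        return parse_proc_io(fh.read())


def write_rate(pid: int, interval: float = 5.0) -> float:
    """Bytes/second that pid caused to be written to storage (write_bytes)."""
    before = read_proc_io(pid)["write_bytes"]
    time.sleep(interval)
    after = read_proc_io(pid)["write_bytes"]
    return (after - before) / interval


if __name__ == "__main__":
    # Hypothetical PID file location; slapd's "pidfile" setting names it.
    with open("/var/run/slapd/slapd.pid") as fh:
        pid = int(fh.read().split()[0])
    print(f"{write_rate(pid) / 1e6:.1f} MB/s written")
```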

[...]

> But what about the first part of my question in this posting? How is an ldapsearch processed? Does slapd search the whole database despite the indexes?

I'm no authority on this, but generally the main purpose of using indexes is -not- having to do a full scan. This of course requires that the index has been properly built (see above).

If your original statement is still correct - that is, you've built an "eq" index (equality, exact match) and you search for the exact value - the index should have made a difference.

However, if you've built an equality index and then search for a substring, the index will not speed it up.

As a side note, you should be aware that while most attributes can be indexed with "eq", some attributes won't allow substring indexing.
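A toy Python model of why an equality index does not help a substring search (function and parameter names are hypothetical). The equality index is keyed by whole attribute values, so a fragment such as "mit" cannot be looked up in it, and every entry remains a candidate that must be loaded and tested. Real slapd index keys are normalized values (and, as far as I know, hashed ones), which rules out scanning them for substrings as well; the `sub` index type exists for this case, as in the `index cn eq,sub` line in the configuration posted later in the thread.

```python
from typing import Dict, Set


def eq_index_candidates(eq_index: Dict[str, Set[int]], all_ids: Set[int],
                        filter_type: str, assertion: str) -> Set[int]:
    """Candidate IDs when the attribute only has an equality ("eq") index."""
    if filter_type == "eq":
        # (cn=Smith): a single lookup of the whole value in the index.
        return set(eq_index.get(assertion, set()))
    if filter_type == "sub":
        # (cn=*mit*): the index keys are whole values, so the substring
        # cannot be looked up; every entry stays a candidate and must be
        # loaded and tested against the filter.
        return set(all_ids)
    raise ValueError(f"unsupported filter type: {filter_type}")
```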

Hope this helps.

-- Bjørn

Steeg Carson

30 Dec 2010, 10:53 a.m.

Hello,

I spent some more time investigating the problem.

First, I set up a 64-bit test machine with 16 GB of RAM and 2 CPUs under VMware ESX, with its own SAS storage (RAID10) dedicated to this machine. I configured slapd.conf as follows:

#######################################################################
include /etc/openldap/schema/core.schema
include /etc/openldap/schema/cosine.schema
include /etc/openldap/schema/inetorgperson.schema
include /etc/openldap/schema/rfc2307bis.schema
include /etc/openldap/schema/own.schema

pidfile /var/run/slapd/slapd.pid
argsfile /var/run/slapd/slapd.args

modulepath /usr/lib/ldap
moduleload back_hdb

sizelimit -1
timelimit 300
disallow bind_anon

gentlehup on
tool-threads 2

# hdb database definitions

database hdb
suffix "ou=root"
rootdn "uid=admin,ou=root"
checkpoint 4096 15
# loglevel only for test, not during time measuring
loglevel 33
rootpw password
directory /var/lib/ldap_hdb
logfile /var/log/openldap.log
cachesize 1000000
dncachesize 1000000
idlcachesize 3000000
dbnosync

index objectClass,entryUUID,entryCSN eq
index subEngine eq
index cn eq,sub
#######################################################################
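slapd's `loglevel` is a bit mask, so the `loglevel 33` above is 1 + 32, i.e. function tracing plus search filter processing, which is where the `bdb_*_candidates` lines come from; the 256 mentioned earlier in the thread is `stats` (connections, operations and results). A small Python decoder, with the bit values taken from the slapd.conf(5) loglevel table for OpenLDAP 2.4 as I recall it; check the local man page for the installed version.

```python
from typing import List

# Bit values and names from the slapd.conf(5) "loglevel" table (OpenLDAP 2.4).
LOGLEVELS = [
    (1, "trace"), (2, "packets"), (4, "args"), (8, "conns"),
    (16, "BER"), (32, "filter"), (64, "config"), (128, "ACL"),
    (256, "stats"), (512, "stats2"), (1024, "shell"), (2048, "parse"),
    (16384, "sync"),
]


def decode_loglevel(level: int) -> List[str]:
    """Names of the debug subsystems enabled by a numeric loglevel."""
    return [name for bit, name in LOGLEVELS if level & bit]


if __name__ == "__main__":
    for level in (0, 33, 256):
        print(level, decode_loglevel(level))
```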

The database uses the hdb backend.

In DB_CONFIG I set a 2 GB BDB page cache (set_cachesize 2 0 1).
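The DB_CONFIG line `set_cachesize <gbytes> <bytes> <ncache>` asks Berkeley DB for gbytes * 2^30 + bytes of cache in total, split into ncache regions, so `set_cachesize 2 0 1` is 2 GiB in a single region. A small Python helper (the function name is hypothetical) turns such a line into a byte count that can be compared with the total size of all *.bdb files in the database directory.

```python
def set_cachesize_bytes(line: str) -> int:
    """Total cache bytes requested by a DB_CONFIG "set_cachesize" line.

    Syntax: set_cachesize <gbytes> <bytes> <ncache>. The total is
    gbytes * 2**30 + bytes; ncache only says into how many regions
    that total is split, so it does not change the sum.
    """
    keyword, gbytes, nbytes, _ncache = line.split()
    if keyword != "set_cachesize":
        raise ValueError(f"not a set_cachesize line: {line!r}")
    return int(gbytes) * 2**30 + int(nbytes)


if __name__ == "__main__":
    print(set_cachesize_bytes("set_cachesize 2 0 1"))
```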

The entire directory holds 470812 entries.

(=> ldapsearch -x -h localhost -wpassword -D"uid=admin,ou=root" -b"ou=root" "(objectClass=*)" dn | grep "^dn:" | wc -l)

Task:

I search for objects with a special objectClass (subEngine), but only within a dedicated container (set via the base DN).

The objectClass "subEngine" occurs 104384 times in the entire directory:

(=> ldapsearch -x -h localhost -D"uid=admin,ou=root" -b"ou=root" "(ObjectClass=subEngine)" dn | grep "^dn:" | wc -l)

But the objectClass "subEngine" occurs only once in the dedicated container.

When I run the search

ldapsearch -x -h localhost -wpassword -D"uid=admin,ou=root" -b"cn=ownPath,ou=root" "(ObjectClass=subEngine)"

I see the following in the logfile:

=> bdb_equality_candidates (objectClass)
=> key_read
<= bdb_index_read 470601 candidates
<= bdb_equality_candidates: id=-1, first=228, last=470828
<= bdb_filter_candidates: id=-1 first=228 last=470828
<= bdb_list_candidates: id=-1 first=228 last=470828
<= bdb_filter_candidates: id=-1 first=228 last=470828
<= bdb_list_candidates: id=-1 first=40595 last=470828
<= bdb_filter_candidates: id=-1 first=40595 last=470828
bdb_search_candidates: id=-1 first=40595 last=470828

What do these messages mean?

I can't see how they relate to the directory.

When I then search the logfile, I see 430233 messages like:

hdb_search: <candidate> <message: does not match filter | scope not okay>

So the 430233 comes from 470828 - 40595 = 430233. Why so many searches?

Shouldn't the index for objectClass=subEngine hold only 104384 entries?

What are these values, and how is the search done?

I guess the first step is the index lookup. But the index holds only the IDs and knows nothing about the entries' DNs. So in the next step, are all the IDs from the index used to query id2entry.bdb and check the DN?
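One reading of these log lines, assuming back-hdb's ID list (IDL) handling works as I understand it from back-bdb's idl.h: an index slot is kept as an explicit list of IDs only up to a limit (2^16 = 65536 IDs by default); beyond that it is stored as a range from its lowest to its highest ID, marked by a first element of NOID, which the debug output prints as id=-1. The objectClass=subEngine slot holds 104384 IDs, so it would be such a range, 228..470828, and 470828 - 228 + 1 = 470601 is exactly the candidate count in the log. Intersecting two ranges only gives another range, so every ID in the final range 40595..470828 still has to be loaded and tested. A simplified Python model of that intersection (hypothetical names, not slapd's C code; the scope range in the example is hypothetical, since the log only shows the result):

```python
from typing import List, Tuple, Union

Range = Tuple[int, int]          # (first, last), both inclusive
IDL = Union[List[int], Range]    # explicit sorted IDs, or a range


def idl_size(idl: IDL) -> int:
    """Number of IDs an IDL stands for."""
    if isinstance(idl, tuple):
        first, last = idl
        return max(0, last - first + 1)
    return len(idl)


def idl_intersect(a: IDL, b: IDL) -> IDL:
    """AND of two IDLs; if both are ranges the result is again a range."""
    if isinstance(a, tuple) and isinstance(b, tuple):
        return (max(a[0], b[0]), min(a[1], b[1]))
    if isinstance(a, tuple):
        a, b = b, a                  # make `a` the explicit list
    if isinstance(b, tuple):
        return [i for i in a if b[0] <= i <= b[1]]
    return sorted(set(a) & set(b))


if __name__ == "__main__":
    index_slot = (228, 470828)       # objectClass=subEngine range from the log
    scope = (40595, 470828)          # hypothetical scope IDL
    result = idl_intersect(index_slot, scope)
    print(result, idl_size(index_slot), idl_size(result))
```

This prints `(40595, 470828) 470601 430234`; the last figure counts both ends of the range, which is why it is one more than 470828 - 40595.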

The first time, this search takes about 40 seconds (with logging turned off!). During this time I can see a heavy write load (about 25 M/s) from slapd (seen with iotop).

After the cache is filled, the lookup takes about 2 seconds (with logging turned off!)...

The only difference between the logs of the first and the second search is that in the log of the first search, each hdb_search line is accompanied by

entry_decode: ""
<= entry_decode()

But in the log of the second search, hdb_search is also done 430233 times!

Is this correct?
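The pattern described here (hdb_search runs once per candidate in both runs, but entry_decode appears only in the first) fits a loop that tests every candidate ID and fetches each entry through an entry cache. A simplified Python model under stated assumptions: the class and function names are hypothetical, the "encoded" id2entry record is an invented `dn|objectClass` string, DNs are compared without normalization, and the cache is unbounded (in the configuration above, `cachesize 1000000` exceeds the 470812 entries, so all of them can stay cached). The write load of the first run is not modelled.

```python
from typing import Dict, Iterable, List

Entry = Dict[str, str]


class EntryCache:
    """Toy entry cache in front of a hypothetical encoded id2entry store."""

    def __init__(self, id2entry_raw: Dict[int, str]) -> None:
        self.raw = id2entry_raw            # ID -> encoded "dn|objectClass"
        self.cache: Dict[int, Entry] = {}
        self.decodes = 0                   # counts cold reads (entry_decode)

    def get(self, entry_id: int) -> Entry:
        if entry_id not in self.cache:
            self.decodes += 1
            dn, object_class = self.raw[entry_id].split("|")
            self.cache[entry_id] = {"dn": dn, "objectClass": object_class}
        return self.cache[entry_id]


def subtree_search(cache: EntryCache, candidates: Iterable[int],
                   base: str, object_class: str) -> List[str]:
    """Test every candidate ID; only scope and filter checks remove it."""
    matches = []
    for entry_id in candidates:            # one "hdb_search: <candidate>" each
        entry = cache.get(entry_id)
        dn = entry["dn"]
        if dn != base and not dn.endswith("," + base):
            continue                       # "scope not okay"
        if entry["objectClass"] != object_class:
            continue                       # "does not match filter"
        matches.append(dn)
    return matches
```

Running the same search twice visits every candidate both times, but only the first run increments `decodes`.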

Thanks in advance

Steeg Carson


Dieter Klünter

11:15 a.m.

On Thu, Dec 30, 2010 at 07:53:52PM +0100, Steeg Carson wrote:


> The entire directory holds 470812 entries.
> ldapsearch -x -h localhost -wpassword -D"uid=admin,ou=root" -b"cn=ownPath,ou=root" "(ObjectClass=subEngine)"

The default scope is subtree; reduce the scope to onelevel.
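ldapsearch's `-s` option selects the scope (base, one, sub or children; sub is the default). A simplified Python sketch of which entries the three classic scopes admit relative to a base DN such as cn=ownPath,ou=root; the function name is hypothetical and the DN handling is deliberately naive (lower-casing and splitting on commas, without escaping rules or full normalization).

```python
def in_scope(dn: str, base: str, scope: str) -> bool:
    """Whether dn lies within base for LDAP scope "base", "one" or "sub".

    Simplified: DNs are lower-cased and split on commas, without
    RFC 4514 escape handling or full normalization.
    """
    rdns = [r.strip().lower() for r in dn.split(",")]
    base_rdns = [r.strip().lower() for r in base.split(",")]
    depth = len(rdns) - len(base_rdns)   # levels below the base entry
    if depth < 0 or rdns[depth:] != base_rdns:
        return False
    if scope == "base":
        return depth == 0
    if scope == "one":
        return depth == 1
    if scope == "sub":
        return True                      # base entry and everything below
    raise ValueError(f"unknown scope: {scope}")
```

With scope "one", only the direct children of the base qualify; deeper entries are excluded, which is why it only helps when the wanted entries sit exactly one level below the base.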

[...]

-Dieter

-- Dieter Klünter | Systemberatung http://dkluenter.de GPG Key ID:DA147B05 53°37'09,95"N 10°08'02,42"E

Steeg Carson

2:54 p.m.

2010/12/30 Dieter Klünter dieter@dkluenter.de:

> On Thu, Dec 30, 2010 at 07:53:52PM +0100, Steeg Carson wrote:

>> The entire directory holds 470812 entries.
>> ldapsearch -x -h localhost -wpassword -D"uid=admin,ou=root" -b"cn=ownPath,ou=root" "(ObjectClass=subEngine)"
>
> The default scope is subtree; reduce the scope to onelevel.

No, it is necessary to search the whole subtree, not only one level... That was not the question.

Thanks,
Steeg


openldap-technical@openldap.org

6 comments

3 participants


Bjørn Ruberg
Dieter Klünter
Steeg Carson