Hello,
I'm sorry, but I'd like to ask again for clarification.
First question:
- An index slot loses precision if the search result for an (indexed) attribute is larger than 2^16. Then the search time increases a lot.
- I can change this via BDB_IDL_LOGN.
- But if I have a directory that holds 200,000 employees with '(ObjectClass=Employees)', the result is larger than 2^16 and the search is slow.
- Let's say the employees are distributed over four countries and the DIT is structured geographically, e.g.:

o=myOrg, c=us (100,000 employees)
o=myOrg, c=gb (30,000 employees)
o=myOrg, c=de (25,000 employees)
o=myOrg, c=br (45,000 employees)
Can I avoid this index slot size problem by changing the search base to "o=myOrg, c=gb", since there are only 30,000 employees there? This brings me to the second question:
"How is a search filter evaluated?"
Let's say I combine three filters with "and", like '(&(objectClass=Employees)(age=30)(sex=m))', and all the attributes are indexed. The individual filters match:
(objectClass=Employees) => 200,000 entries
(age=30) => 10,000 entries
(sex=m) => 3,000 entries
Does the order matter for speed? Is it better to write the filter like this? '(&(sex=m)(age=30)(objectClass=Employees))'
Thanks,
Meike
Meike Stone wrote:
Hello,
I'm sorry, but I'd like to ask again for clarification.
First question:
- An index slot loses precision if the search result for an (indexed) attribute is larger than 2^16. Then the search time increases a lot.
- I can change this via BDB_IDL_LOGN.
- But if I have a directory that holds 200,000 employees with '(ObjectClass=Employees)', the result is larger than 2^16 and the search is slow.
- Let's say the employees are distributed over four countries and the DIT is structured geographically, e.g.:

o=myOrg, c=us (100,000 employees)
o=myOrg, c=gb (30,000 employees)
o=myOrg, c=de (25,000 employees)
o=myOrg, c=br (45,000 employees)

Can I avoid this index slot size problem by changing the search base to "o=myOrg, c=gb", since there are only 30,000 employees there?
Try it and see.
If you're already playing with low-level definitions in the source code, you have no need for us to answer these questions. Or, if you need us to answer these questions, you have no business playing with low-level definitions in the source code.
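One way to picture why narrowing the search base might help is a simplified Python model of an index slot (an ID list). It assumes, as described in the question, that a slot holding more than 2^BDB_IDL_LOGN - 1 entry IDs is collapsed into a (first, last) range, so every ID inside that range becomes a candidate that has to be fetched and tested. The class IDL, its methods, and the ID layout are hypothetical; this is a sketch of the idea, not the back-bdb source.

```python
# Simplified, hypothetical model of an index slot (ID list) that collapses to a
# range once it exceeds its size limit. Not the back-bdb source code.
import sys
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Tuple

BDB_IDL_LOGN = 16                     # value discussed in the thread
IDL_DB_MAX = (1 << BDB_IDL_LOGN) - 1  # assumed per-slot limit: 65,535 IDs


@dataclass(frozen=True)
class IDL:
    ids: Optional[FrozenSet[int]] = None   # exact entry IDs, if still precise
    rng: Optional[Tuple[int, int]] = None  # (first, last) once collapsed

    @staticmethod
    def from_ids(ids: Iterable[int]) -> "IDL":
        exact = frozenset(ids)
        if len(exact) > IDL_DB_MAX:
            # Too many IDs: keep only the bounds, precision is lost.
            return IDL(rng=(min(exact), max(exact)))
        return IDL(ids=exact)

    def size(self) -> int:
        """Number of candidate entries a search would have to examine."""
        if self.rng is not None:
            return self.rng[1] - self.rng[0] + 1
        return len(self.ids)

    def intersect(self, other: "IDL") -> "IDL":
        if self.rng is not None and other.rng is not None:
            # Two ranges: the result is just the overlapping range.
            lo = max(self.rng[0], other.rng[0])
            hi = min(self.rng[1], other.rng[1])
            return IDL(rng=(lo, hi)) if lo <= hi else IDL(ids=frozenset())
        if self.rng is None and other.rng is None:
            return IDL(ids=self.ids & other.ids)
        # One range, one exact list: keep the listed IDs that fall in the range.
        exact, ranged = (self, other) if self.rng is None else (other, self)
        lo, hi = ranged.rng
        return IDL(ids=frozenset(i for i in exact.ids if lo <= i <= hi))


if __name__ == "__main__":
    # Hypothetical layout: 200,000 employees with even IDs, interleaved with
    # 200,000 other entries; the c=gb subtree holds IDs 100,001..130,000.
    employees = IDL.from_ids(range(2, 400_001, 2))  # collapses to a range
    whole_tree = IDL.from_ids(range(1, 400_001))
    gb_subtree = IDL.from_ids(range(100_001, 130_001))
    print("whole tree:  ", employees.intersect(whole_tree).size(), file=sys.stdout)
    print("c=gb subtree:", employees.intersect(gb_subtree).size(), file=sys.stdout)
```

In this model the whole-tree search yields 399,999 candidates, while the c=gb search yields 30,000 (of which only the 15,000 even IDs are really employees; the rest are rejected when each entry is tested). Whether a real directory behaves like this depends on how entry IDs were assigned and on the backend version, so measuring remains the reliable answer.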
This brings me to the second question:
"How is a search filter evaluated?"
Let's say I combine three filters with "and", like '(&(objectClass=Employees)(age=30)(sex=m))', and all the attributes are indexed. The individual filters match:
(objectClass=Employees) => 200,000 entries
(age=30) => 10,000 entries
(sex=m) => 3,000 entries
Does the order matter for speed? Is it better to write the filter like this? '(&(sex=m)(age=30)(objectClass=Employees))'
Try it and see.
If you have advance knowledge of the characteristics of your data, perhaps you can optimize the filter order. In most cases, your applications will not have such knowledge, or it will be irrelevant. For example, if you have a filter term that matches zero entries, it is beneficial to evaluate that first in an AND clause. But it would make no difference in an OR clause.
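The AND/OR difference can be illustrated with a small Python model. It assumes each filter term costs one index lookup and that an AND intersects the term results left to right, stopping as soon as the intermediate result is empty. The functions eval_and/eval_or, the synthetic index contents, and the term "(c=xx)" that matches nothing are hypothetical; the lookup counts are the point, not slapd's actual internals.

```python
# Hypothetical model of AND/OR evaluation over index lookups (not slapd code).
from typing import Callable, List, Set, Tuple

Lookup = Callable[[str], Set[int]]


def eval_and(terms: List[str], lookup: Lookup) -> Tuple[Set[int], int]:
    """Intersect term results left to right; stop once nothing can match."""
    result = None
    lookups = 0
    for term in terms:
        ids = lookup(term)
        lookups += 1
        result = set(ids) if result is None else result & ids
        if not result:
            break  # empty intersection: the remaining terms cannot change it
    return (result if result is not None else set()), lookups


def eval_or(terms: List[str], lookup: Lookup) -> Tuple[Set[int], int]:
    """Union of all term results; every term must be looked up."""
    result: Set[int] = set()
    lookups = 0
    for term in terms:
        result |= lookup(term)
        lookups += 1
    return result, lookups


if __name__ == "__main__":
    # Synthetic index sized like the example in the question.
    index = {
        "(objectClass=Employees)": set(range(1, 200_001)),  # 200,000 IDs
        "(age=30)": set(range(1, 200_001, 20)),             # 10,000 IDs
        "(sex=m)": set(range(1, 60_001, 20)),               # 3,000 IDs
        "(c=xx)": set(),                                     # matches nothing
    }
    terms = ["(objectClass=Employees)", "(age=30)", "(sex=m)", "(c=xx)"]
    print(eval_and(terms, index.__getitem__)[1])        # 4 lookups
    print(eval_and(terms[::-1], index.__getitem__)[1])  # 1 lookup
    print(eval_or(terms, index.__getitem__)[1])         # 4 lookups
```

Reordering changes only the work done, never the result: the AND returns the same (empty) set in either order, and the OR has to consult every term regardless of order.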
Hello Howard,
thanks for the fast answer!
I'm the Linux admin of the systems and I'm responsible for all the services like LDAP, MySQL, ... In our company we have a few programmers who take care of the data on our LDAP server. Suddenly slapd became slow, and I found the solution here: http://www.openldap.org/lists/openldap-technical/201101/msg00102.html and increased BDB_IDL_LOGN.
But now, months later, answers are slow again. So I wrote a small Perl script that searches the log file (loglevel 256) for the "long-runners", and I see a lot of searches that take a long time (up to 500 s). I want to understand why.
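A log scan like the one described can be sketched in Python (the original script was Perl). It pairs each search request line with its result line and reports searches slower than a threshold. It assumes syslog lines with a "Mon DD HH:MM:SS host slapd[pid]:" prefix and stats-level entries of the form "conn=N op=M SRCH base=..." and "conn=N op=M SEARCH RESULT ..."; the exact format can differ between OpenLDAP versions and syslog setups, and the function name slow_searches and the 60 s threshold are hypothetical. Syslog timestamps have one-second resolution and no year, so durations are approximate.

```python
# Hypothetical helper: find slow searches in a slapd stats (loglevel 256) log
# by pairing "SRCH base=" lines with their "SEARCH RESULT" lines.
import re
import sys
from datetime import datetime
from typing import Iterable, List, Tuple

# Assumed line shape: "Jan 10 12:00:01 host slapd[123]: conn=1000 op=2 ..."
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ slapd\[\d+\]: "
    r"conn=(?P<conn>\d+) op=(?P<op>\d+) (?P<rest>.*)$"
)
FILTER_RE = re.compile(r'filter="([^"]*)"')


def slow_searches(lines: Iterable[str], threshold_s: float) -> List[Tuple[int, int, float, str]]:
    """Return (conn, op, seconds, filter) for searches taking >= threshold_s."""
    started = {}  # (conn, op) -> (start time, filter text)
    slow = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue
        # Syslog has no year; a fixed leap year keeps "Feb 29" parseable.
        ts = datetime.strptime("2000 " + m["ts"], "%Y %b %d %H:%M:%S")
        key = (int(m["conn"]), int(m["op"]))
        rest = m["rest"]
        if rest.startswith("SRCH base="):
            f = FILTER_RE.search(rest)
            started[key] = (ts, f.group(1) if f else "")
        elif rest.startswith("SEARCH RESULT") and key in started:
            t0, filt = started.pop(key)
            secs = (ts - t0).total_seconds()
            if secs >= threshold_s:
                slow.append((key[0], key[1], secs, filt))
    return slow


if __name__ == "__main__":
    # Usage: python slow_searches.py /path/to/slapd.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for conn, op, secs, filt in slow_searches(log, threshold_s=60):
            print(f"conn={conn} op={op} {secs:.0f}s filter={filt}")
```

Grouping the reported filters by shape (for example, all searches on '(objectClass=Employees)' from a wide base) is usually more informative than looking at single slow operations.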
I'm trying to understand how all the internals work, but it is hard for a non-C programmer like me. A book about all of this would be very much appreciated. I understand how the LDAP operations work, but not how they are implemented and what I have to take care of so that they run fast. Our programmer only says: "That's my query (LDAP search), and it is well formed. Why do I have to wait so long for an answer? Didn't we build an index for all the needed attributes?"
OK, thanks a lot.
Kind regards,
Meike
openldap-technical@openldap.org