Hello,
Because of this, does it make sense to index the sex attribute in a directory with more than 1,000,000 people?
thanks Meike
2013/5/23 Quanah Gibson-Mount quanah@zimbra.com:
--On Thursday, May 23, 2013 4:40 PM +0000 Chris Card ctcard@hotmail.com wrote:
Hi all,
I have an OpenLDAP directory with about 7 million DNs, running OpenLDAP 2.4.31 with a BDB backend (4.6.21) on CentOS 6.3.
The structure of the directory is like this, with suffix dc=x,dc=y
dc=x,dc=y
    account=a,dc=x,dc=y
        mail=m,account=a,dc=x,dc=y         // Users
        ....
        licenceId=l,account=a,dc=x,dc=y    // Licences, objectclass=licence
        ....
        group=g,account=a,dc=x,dc=y        // Groups
        ....
        // etc.
    account=b,dc=x,dc=y
        ....
Most of the DNs in the directory are users or groups, and the number of licences is small (<10) for each account.
If I do a query with basedn account=a,dc=x,dc=y and filter (objectclass=licence) I see wildly different performance, depending on how many users are under account a. For an account with ~30000 users the query takes 2 seconds at most, but for an account with ~60000 users the query takes 1 minute.
It only appears to be when I filter on objectclass=licence that I see that behaviour. If I filter on a different objectclass which matches a similar number of objects to the objectclass=licence filter, the performance doesn't seem to depend on the number of users.
There is an index on objectclass (of course), but the behaviour I'm seeing seems to indicate that for this query, at some point slapd stops using the index and just scans all the objects under the account.
Any ideas?
Increase the IDL range. This is how I do it:
--- openldap-2.4.35/servers/slapd/back-bdb/idl.h.orig 2011-02-17 16:32:02.598593211 -0800
+++ openldap-2.4.35/servers/slapd/back-bdb/idl.h 2011-02-17 16:32:08.937757993 -0800
@@ -20,7 +20,7 @@
 /* IDL sizes - likely should be even bigger
  *   limiting factors: sizeof(ID), thread stack size
  */
-#define BDB_IDL_LOGN 16 /* DB_SIZE is 2^16, UM_SIZE is 2^17 */
+#define BDB_IDL_LOGN 17 /* DB_SIZE is 2^16, UM_SIZE is 2^17 */
 #define BDB_IDL_DB_SIZE (1<<BDB_IDL_LOGN)
 #define BDB_IDL_UM_SIZE (1<<(BDB_IDL_LOGN+1))
 #define BDB_IDL_UM_SIZEOF (BDB_IDL_UM_SIZE * sizeof(ID))
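To make the effect of the patch concrete, here is a rough Python sketch of what the three macros evaluate to for BDB_IDL_LOGN 16 and 17. It also includes a toy model of an index slot that falls back to a (lo, hi) range once it holds too many IDs. The names idl_sizes, store_slot and ID_BYTES are hypothetical, sizeof(ID) is assumed to be 8 bytes, and the exact cutoff slapd uses may differ by one; this is not the back-bdb code itself.

```python
ID_BYTES = 8  # assumption: sizeof(ID) on a 64-bit build


def idl_sizes(logn):
    """Mirror the three macros from idl.h for a given BDB_IDL_LOGN."""
    db_size = 1 << logn              # BDB_IDL_DB_SIZE
    um_size = 1 << (logn + 1)        # BDB_IDL_UM_SIZE
    um_sizeof = um_size * ID_BYTES   # BDB_IDL_UM_SIZEOF
    return db_size, um_size, um_sizeof


def store_slot(ids, logn):
    """Toy model: keep an exact ID list, or collapse to a (lo, hi) range."""
    db_size, _, _ = idl_sizes(logn)
    ids = sorted(ids)
    if len(ids) >= db_size:  # assumed cutoff; slapd's may differ by one
        return ("range", ids[0], ids[-1])
    return ("list", ids)


for logn in (16, 17):
    db_size, um_size, um_sizeof = idl_sizes(logn)
    print(f"LOGN={logn}: DB_SIZE={db_size}, UM_SIZE={um_size}, "
          f"UM_SIZEOF={um_sizeof // 1024} KiB")
```

In this model, a slot with 70,000 IDs collapses to a range at LOGN 16 but stays an exact list at LOGN 17. The price is a doubling of the per-list memory, which is presumably what the "thread stack size" remark in the header comment is about.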
--Quanah
--
Quanah Gibson-Mount Sr. Member of Technical Staff Zimbra, Inc A Division of VMware, Inc.
Zimbra :: the leader in open source messaging and collaboration
Meike Stone wrote:
Hello,
Because of this, does it make sense to index the sex attribute in a directory with more than 1,000,000 people?
Indexing is all about making rare data easy to find. If you have an attribute that occurs on 99% of your entries, indexing it won't save any search time, and it will needlessly slow down modify time.
Asking about "1,000,000" entries is meaningless on its own. It's not the raw number of entries that matters, it's the percentage of the total directory. If you have 1,000,000,000 entries in your directory, then 1,000,000 is actually quite a small percentage of the data and it might be smart to index it. If you have only 2,000,000 entries total, it may not make enough difference to be worthwhile.
It's not the raw numbers that matter, it's the frequency of occurrences.
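The point about percentages can be checked with a couple of lines of arithmetic. This small Python sketch uses the two directory sizes from the reply above; the helper name match_fraction is made up for illustration.

```python
def match_fraction(matching, total):
    """Fraction of the directory that a filter value matches."""
    return matching / total


matching = 1_000_000
for total in (1_000_000_000, 2_000_000):
    frac = match_fraction(matching, total)
    print(f"{total:>13,} entries total: {frac:.1%} match")
# prints 0.1% for the billion-entry directory and 50.0% for the two-million one
```

The same 1,000,000 matches are a rare value in the first directory and half of all entries in the second, which is why the raw count alone says little about whether an index helps.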
How does the search work in conjunction with the base DN?
Does the search first go to the index and look up the attribute/value from the search filter, get all related IDs, then use those IDs to fetch the entries' DNs via id2entry.bdb, and only then compare them with the base DN?
For example: I have an index on objectClass, and all the people in my directory have objectClass=inetOrgPerson.
About 140,000 people work in my company, divided into seven departments (about 20,000 people under each department). So my index for objectClass=inetOrgPerson has about 140,000 entries, which is over the limit even with BDB_IDL_LOGN 17. If I search over the whole DIT (from the root), I hit the problem because 140,000 > 2^17.
But if I limit the search to a more specific search base covering only one department, does that affect the speed in this case, or even get around the problem with BDB_IDL_LOGN 17? Can a more specific base DN speed up a search whose filter has a large result set?
I ask this because it seems to me that the base DN does not matter in the search ...
Thanks and kind regards, Meike
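The numbers in this question can be checked directly. The sketch below is only arithmetic, not slapd code. It assumes the index slot for objectClass=inetOrgPerson is shared by the whole database rather than kept per subtree, so the 140,000 figure applies whatever the search base is. The helper name fits and the two constants are hypothetical.

```python
LOGN_STOCK = 16    # value before the patch posted earlier in this thread
LOGN_PATCHED = 17  # value after the patch


def fits(count, logn):
    """True if `count` IDs stay below 2^logn (assumed IDL list limit)."""
    return count < (1 << logn)


whole_company = 7 * 20_000  # seven departments of about 20,000 people
one_department = 20_000

for logn in (LOGN_STOCK, LOGN_PATCHED):
    print(f"2^{logn} = {1 << logn}: "
          f"company fits={fits(whole_company, logn)}, "
          f"department fits={fits(one_department, logn)}")
```

Under that assumption, 140,000 exceeds both 2^16 = 65,536 and 2^17 = 131,072, while a single department's 20,000 would fit either way. That only helps if the lookup is actually narrowed by the base DN.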
2013/5/28 Meike Stone meike.stone@googlemail.com:
I ask this because it seems to me that the base DN does not matter in the search ...
In my specific (real-world) case, the base DN contains 84,000 objects, but only one of them is a person with objectClass=inetOrgPerson. There are about 420,000 objectClass=inetOrgPerson entries, and the directory holds 2,000,000 objects in total.
The search with that base DN, where only the one inetOrgPerson is located, takes about 5 minutes ...
Thanks Meike
Meike Stone wrote:
2013/5/28 Meike Stone meike.stone@googlemail.com:
I ask this because it seems to me that the base DN does not matter in the search ...
In my specific (real-world) case, the base DN contains 84,000 objects, but only one of them is a person with objectClass=inetOrgPerson. There are about 420,000 objectClass=inetOrgPerson entries, and the directory holds 2,000,000 objects in total.
The search with that base DN, where only the one inetOrgPerson is located, takes about 5 minutes ...
Looks like a bug in back-bdb: it retrieves the scope index but isn't using it correctly with the filter index. Please submit an ITS for this.
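For readers following along, here is a toy Python model of what combining a scope candidate list with a filter candidate list could look like when it works as intended. It is a conceptual sketch with scaled-down, made-up ID values standing in for the 84,000 and 420,000 figures above. It is not the back-bdb implementation, and the function name candidates is hypothetical.

```python
def candidates(scope_ids, filter_ids):
    """IDs worth fetching: inside the search base AND listed by the filter index."""
    return sorted(set(scope_ids) & set(filter_ids))


# Toy stand-ins: 84 entries under the base DN, 421 inetOrgPerson entries
# overall, exactly one of which (ID 1042) lives under the base DN.
scope_ids = range(1000, 1084)
person_ids = set(range(0, 420)) | {1042}

combined = candidates(scope_ids, person_ids)
print(f"scope only: {len(scope_ids)} candidates to fetch and test")
print(f"scope and filter combined: {len(combined)} candidate(s): {combined}")
```

If only the scope list were used, every entry under the base DN would have to be fetched and tested against the filter. Intersecting the two lists first leaves the single matching entry.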
openldap-technical@openldap.org