I continue to have trouble with getting a freshly started server to be responsive. One problem in particular is one that I thought had been resolved some time ago but is apparently biting me right now...
With the hdb backend (at least in OL 2.3.34 and OL 2.3.35) if you perform a search with a search base deeper than the root suffix, the search takes a very long time to complete if the cache hasn't been established. In my case the difference is less than a second versus several hours. I'm not sure yet which bit of cache needs to be primed. I can switch back and forth searching with the same filter in the root and then a child search base with the same results.
Is this a bug regression or something that I just hadn't been noticing?
What would be the best search to perform to prepare whatever cache is getting hit to make searches outside of the root DN faster?
> I continue to have trouble with getting a freshly started server to be responsive. One problem in particular is one that I thought had been resolved some time ago but is apparently biting me right now...
> With the hdb backend (at least in OL 2.3.34 and OL 2.3.35) if you perform a search with a search base deeper than the root suffix, the search takes a very long time to complete if the cache hasn't been established. In my case the difference is less than a second versus several hours. I'm not sure yet which bit of cache needs to be primed. I can switch back and forth searching with the same filter in the root and then a child search base with the same results.
Have you set DB_CONFIG to reflect reasonable settings given the size of your database?
> Is this a bug regression or something that I just hadn't been noticing?
> What would be the best search to perform to prepare whatever cache is getting hit to make searches outside of the root DN faster?
Eric Irrgang wrote:
> I continue to have trouble with getting a freshly started server to be responsive. One problem in particular is one that I thought had been resolved some time ago but is apparently biting me right now...
> With the hdb backend (at least in OL 2.3.34 and OL 2.3.35) if you perform a search with a search base deeper than the root suffix, the search takes a very long time to complete if the cache hasn't been established. In my case the difference is less than a second versus several hours. I'm not sure yet which bit of cache needs to be primed. I can switch back and forth searching with the same filter in the root and then a child search base with the same results.
If it takes several hours, then most likely your BDB cache is too small.
As for which cache, it's either the DN cache (aka EntryInfo in the code) or the IDL cache. (Currently the DN cache size is not configurable, will probably add a keyword for that in 2.4.)
> Is this a bug regression or something that I just hadn't been noticing?
> What would be the best search to perform to prepare whatever cache is getting hit to make searches outside of the root DN faster?
Priming the caches will only help if you actually have sufficient RAM available. If the DB is too large, then there's not much you can do about it. If you have sufficient RAM, then doing a subtree search from the root, on an unindexed attribute, looking for a value that doesn't exist, will hit every entry in the DB and fully prime the DN cache (and the DN-related info in the IDL cache). It will cycle the full contents of the dn2id and id2entry DBs through the BDB cache as well.
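The priming search described above could be scripted roughly as follows. This is a minimal sketch, not a tested recipe: the server URI, the suffix dc=basedn, the choice of description as an attribute that happens to be unindexed, and the helper name build_priming_search are all assumptions made for illustration. The code only builds an ldapsearch command line; running it is left to the caller.

```python
import shlex


def build_priming_search(base_dn, unindexed_attr="description",
                         uri="ldap://localhost"):
    """Return an ldapsearch argv for a subtree search that should match nothing.

    Assumptions (not from the thread): `unindexed_attr` has no index
    configured, and no entry holds the sentinel value used below.
    """
    sentinel = "no-such-value-cache-priming"
    return [
        "ldapsearch",
        "-x",            # simple (here anonymous) bind
        "-LLL",          # terse LDIF output
        "-H", uri,
        "-b", base_dn,   # start at the root suffix
        "-s", "sub",     # walk the whole subtree
        f"({unindexed_attr}={sentinel})",
        "1.1",           # request no attributes
    ]


cmd = build_priming_search("dc=basedn")
print(shlex.join(cmd))
```

Because the filter cannot be answered from an index, the server has to examine every entry under the base, which is the point of the exercise. The bind identity would need to be exempt from any size or time limits for the search to run to completion.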
Well, once an (objectclass=*) search finishes the ou=people,dc=basedn searches run fast again. Unfortunately it takes over half an hour to run the first time and I have to make sure that during that time no one has access to cause extra threads to start searching.
On Fri, 25 May 2007, Eric Irrgang wrote:
> I continue to have trouble with getting a freshly started server to be responsive. One problem in particular is one that I thought had been resolved some time ago but is apparently biting me right now...
> With the hdb backend (at least in OL 2.3.34 and OL 2.3.35) if you perform a search with a search base deeper than the root suffix, the search takes a very long time to complete if the cache hasn't been established. In my case the difference is less than a second versus several hours. I'm not sure yet which bit of cache needs to be primed. I can switch back and forth searching with the same filter in the root and then a child search base with the same results.
> Is this a bug regression or something that I just hadn't been noticing?
> What would be the best search to perform to prepare whatever cache is getting hit to make searches outside of the root DN faster?
Searching for objectclass=* and only asking for the entryDN attribute is almost an order of magnitude faster than searching for everything. I like the idea of searching for a non-existent value of an unindexed attribute. It is 40% faster than the objectclass=* search (down to 16 minutes) but does not fix the searches made outside of the directory's root search base. At least not quite. The test search I tried took four minutes (about an order of magnitude better than no priming search) and the following test searches were under a second. Any thoughts on what extra magic might be coming out of the objectclass=* into the caches?
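A comparison like the one above could be repeated with a small timing harness. The sketch below is only an illustration: the suffix dc=basedn, the description attribute, and the names time_command and CANDIDATES are assumptions, and each candidate would need a freshly restarted server for the timings to be comparable. The stand-in command at the end exists only so the sketch runs without a directory server.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, stdout=subprocess.DEVNULL, check=False)
    return time.perf_counter() - start


BASE = "dc=basedn"  # placeholder suffix, adjust to the real one
COMMON = ["ldapsearch", "-x", "-LLL", "-b", BASE, "-s", "sub"]

# The three priming searches discussed in the thread.
CANDIDATES = {
    "objectclass=*, all attributes": COMMON + ["(objectclass=*)"],
    "objectclass=*, entryDN only": COMMON + ["(objectclass=*)", "entryDN"],
    "unindexed attribute, no match": COMMON + ["(description=no-such-value)", "1.1"],
}

if __name__ == "__main__":
    # Stand-in command so the sketch can be exercised anywhere.
    elapsed = time_command([sys.executable, "-c", "pass"])
    print(f"stand-in command took {elapsed:.3f}s")
```

To time a real candidate, pass one of the CANDIDATES values to time_command right after a restart, then time the ou=people test search immediately afterwards.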
My DB cache is just barely sufficient but it seems to be large enough. Examining the db_stat output, iostat, and vmstat, all indications are that right about the time my DB cache approaches full, my DB cache hit rate rapidly approaches 100% and I'm neither paging in nor out. I can't really go any bigger.
cachesize 100000
idlcachesize 300000
dbconfig set_cachesize 10 0 1
shm_key 7
dbconfig set_shm_key 7
A freshly restarted server quickly ends up with 11 or 12 GB resident with a VM size of 13+ GB, and within a week or so is up to a 16 GB VM size and 13 or 14 GB resident. On a 16 GB machine, an inch beyond that and the OS starts to run out of room and I risk swap madness.
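For a rough sense of the memory budget, the BDB set_cachesize directive takes a gigabyte count, an additional byte count, and a number of cache segments. The sketch below converts the setting quoted above into bytes and compares it with the machine's RAM. The helper name bdb_cache_bytes is hypothetical, and treating the machine's 16 gigs as 16 GiB is an assumption.

```python
GIB = 1024 ** 3


def bdb_cache_bytes(directive):
    """Convert '[dbconfig] set_cachesize <gbytes> <bytes> <ncache>' to bytes."""
    fields = directive.split()
    if fields and fields[0] == "dbconfig":
        fields = fields[1:]  # drop the slapd.conf prefix
    if len(fields) != 4 or fields[0] != "set_cachesize":
        raise ValueError(f"not a set_cachesize directive: {directive!r}")
    gbytes, extra_bytes, _ncache = (int(f) for f in fields[1:])
    return gbytes * GIB + extra_bytes


cache = bdb_cache_bytes("dbconfig set_cachesize 10 0 1")
ram = 16 * GIB  # assumption: the 16 gigs of RAM are 16 GiB
print(f"BDB cache: {cache / GIB:.1f} GiB, {cache / ram:.1%} of RAM")
```

Whatever is left over has to hold the slapd entry cache, the IDL cache, and the operating system, which is consistent with the report that the configuration cannot grow any further.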
I suppose I ought to track down where the CPU cycles are going, whether it is ACL processing or just the overhead of getting all of the attributes from DB and building LDIF.
On Fri, 25 May 2007, Howard Chu wrote:
> Eric Irrgang wrote:
> > I continue to have trouble with getting a freshly started server to be responsive. One problem in particular is one that I thought had been resolved some time ago but is apparently biting me right now...
> > With the hdb backend (at least in OL 2.3.34 and OL 2.3.35) if you perform a search with a search base deeper than the root suffix, the search takes a very long time to complete if the cache hasn't been established. In my case the difference is less than a second versus several hours. I'm not sure yet which bit of cache needs to be primed. I can switch back and forth searching with the same filter in the root and then a child search base with the same results.
> If it takes several hours, then most likely your BDB cache is too small.
> As for which cache, it's either the DN cache (aka EntryInfo in the code) or the IDL cache. (Currently the DN cache size is not configurable, will probably add a keyword for that in 2.4.)
> > Is this a bug regression or something that I just hadn't been noticing?
> > What would be the best search to perform to prepare whatever cache is getting hit to make searches outside of the root DN faster?
> Priming the caches will only help if you actually have sufficient RAM available. If the DB is too large, then there's not much you can do about it. If you have sufficient RAM, then doing a subtree search from the root, on an unindexed attribute, looking for a value that doesn't exist, will hit every entry in the DB and fully prime the DN cache (and the DN-related info in the IDL cache). It will cycle the full contents of the dn2id and id2entry DBs through the BDB cache as well.
Is there a way (with or without attaching a debugger) to find out what my IDL cache and DN cache are doing?
Eric Irrgang wrote:
> Is there a way (with or without attaching a debugger) to find out what my IDL cache and DN cache are doing?
Using a debugger, set a breakpoint inside any of the backend functions. When the bdb pointer is set up, print out its data. The bdb->bi_cache structure records all the info about the entry cache. The bdb->bi_idl_* fields record the info about the IDL cache. In 2.4 some of these counters are exposed via back-monitor. We can add more to the monitor entry as needed.
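For the back-monitor route on 2.4, a query along the following lines might be a starting point. It is a sketch under assumptions: the monitor database must be enabled and readable by the bind identity, the URI is a placeholder, and the helper name build_monitor_query is hypothetical. Rather than guessing at counter names, it asks for every user and operational attribute under cn=Databases,cn=Monitor and leaves it to the reader to see which cache counters that particular build exposes.

```python
import shlex


def build_monitor_query(uri="ldap://localhost",
                        base="cn=Databases,cn=Monitor"):
    """Return an ldapsearch argv that dumps the monitor entries for all databases."""
    return [
        "ldapsearch",
        "-x",        # simple bind; a real query likely needs -D/-W credentials
        "-LLL",      # terse LDIF output
        "-H", uri,
        "-b", base,
        "-s", "sub",
        "(objectClass=*)",
        "*", "+",    # all user attributes plus all operational attributes
    ]


print(shlex.join(build_monitor_query()))
```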
Hei Howard!
A question: How do you set up the "lastmod" directive in slapd.conf? ON or OFF?
cheers.
On 5/25/07, Howard Chu hyc@symas.com wrote:
> Eric Irrgang wrote:
> > Is there a way (with or without attaching a debugger) to find out what my IDL cache and DN cache are doing?
> Using a debugger, set a breakpoint inside any of the backend functions. When the bdb pointer is set up, print out its data. The bdb->bi_cache structure records all the info about the entry cache. The bdb->bi_idl_* fields record the info about the IDL cache. In 2.4 some of these counters are exposed via back-monitor. We can add more to the monitor entry as needed.
> --
>   -- Howard Chu
>   Chief Architect, Symas Corp. http://www.symas.com
>   Director, Highland Sun http://highlandsun.com/hyc/
>   Chief Architect, OpenLDAP http://www.openldap.org/project/
openldap-software@openldap.org