New subject: slapd syncrepl consumer having permanent high CPU load

24 Mar 2009


Howard and all,
I ran more tests and it looks like the problem persists. I saw some changes, but only in the memory consumption of the syncrepl consumer (slave).
Let me try to explain better. I have a pair of provider/consumer machines: one machine always receives all reads and writes, and the other exists only for High Availability (HA) purposes, so the two DBs should be kept as close to identical as possible.
I start the provider (master) and then, right after, the consumer (slave). The configuration does not appear to be the problem: I have 2 DBs configured, CONTENT and INDEX, and I see the consumer doing 2 searches against these DBs when it starts (this is OK).
After this, CPU usage increases on both the consumer and the provider, and so does the memory allocated by the slapd process. After the HEAD changes, memory consumption on the consumer increases at a much faster rate, something like 10:1. Because of this, to reproduce the issue I had to reduce the dncachesize directive on the consumer to 1/10 of the provider value, from 4,000,000 to 400,000. This keeps the process from consuming all memory before the issue arises.
Let me try to summarize:
1) Start the provider (master) slapd process;
2) Start the consumer (slave) slapd process;
3) Monitor memory and CPU usage on both provider and consumer;
4) Run a monitor check from time to time to see the cache information;
5) Before the cache is full on the provider (master), collect gdb debug output for the consumer (slave) process threads;
6) Wait until the consumer (slave) process starts to use around 200% CPU and then collect gdb debug output again;
7) Wait a little longer until the provider (master) CPU usage drops to 0% and observe that the consumer (slave) CPU stays stable at 200%. Collect gdb debug output again;
8) Wait some more time and collect further gdb debug output to see whether anything has changed.
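For step 3, a minimal Python sketch of how CPU and memory of a slapd process could be sampled over time is given here. It is not the tooling used for the attached results; it assumes a Linux host with /proc, and the function names (parse_stat, cpu_percent, monitor) are hypothetical helpers introduced only for illustration. A reading of about 200% corresponds to two cores kept fully busy.

```python
import os
import time


def parse_stat(stat_text, page_kib):
    """Return (cpu_ticks, rss_kib) from the text of /proc/<pid>/stat."""
    # The command name sits in parentheses and may contain spaces, so only
    # the part after the last ")" is split into fields.
    fields = stat_text.rsplit(")", 1)[1].split()
    utime, stime = int(fields[11]), int(fields[12])  # stat fields 14 and 15
    rss_pages = int(fields[21])                      # stat field 24
    return utime + stime, rss_pages * page_kib


def cpu_percent(ticks_before, ticks_after, seconds, ticks_per_second):
    """CPU usage over an interval; 200.0 means two cores fully busy."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_second * seconds)


def monitor(pid, interval=10.0, samples=6):
    """Print one CPU/RSS line per interval for the given slapd PID."""
    hz = os.sysconf("SC_CLK_TCK")
    page_kib = os.sysconf("SC_PAGE_SIZE") // 1024

    def sample():
        with open(f"/proc/{pid}/stat") as f:
            return parse_stat(f.read(), page_kib)

    before, _ = sample()
    for _ in range(samples):
        time.sleep(interval)
        after, rss_kib = sample()
        pct = cpu_percent(before, after, interval, hz)
        print(f"{time.strftime('%H:%M:%S')} cpu={pct:.0f}% rss={rss_kib} KiB")
        before = after
```

Calling monitor(pid) with the PID of the consumer slapd, and again on the provider host with the provider's PID, gives two logs that can be compared side by side.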
I recompiled HEAD with debugging symbols for GDB. The attached file contains the debug information I collected, more than once, from the consumer slapd (it includes the syncrepl thread). Please see the attached file for details.
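One way such thread dumps could be collected repeatedly is sketched below in Python. This is an illustration only, not necessarily how the attached file was produced: it assumes gdb is installed and allowed to attach to the running slapd, and the helper names and the output file name are hypothetical.

```python
import subprocess
import time


def gdb_backtrace_command(pid):
    """Build a non-interactive gdb call that dumps every thread's stack."""
    # -p attaches to the running process, -batch exits once the commands
    # are done, -ex runs the given gdb command.
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def collect_backtraces(pid, out_path="slapd-consumer-gdb.txt"):
    """Append one timestamped set of thread backtraces to out_path."""
    result = subprocess.run(
        gdb_backtrace_command(pid),
        capture_output=True,
        text=True,
        check=False,  # keep whatever gdb printed even if it exits non-zero
    )
    with open(out_path, "a") as f:
        f.write(f"==== {time.strftime('%Y-%m-%d %H:%M:%S')} pid={pid} ====\n")
        f.write(result.stdout)
        f.write(result.stderr)
    return result.returncode
```

Running collect_backtraces(pid) at each of the points in steps 5) to 8) keeps all dumps in one file, separated by timestamps. Note that slapd is paused while gdb is attached.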
Item 7) is where I think the issue is happening. The synchronization never ends, the consumer (slave) responds very slowly to queries, CPU usage stays fixed at 200%, and the logic never appears to work as expected; in the end it never finishes synchronizing.
In short, it appears that syncrepl still has an issue synchronizing the DBs.
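A common way to check whether a consumer has caught up is to compare the contextCSN values stored in the suffix entry on both servers. The Python sketch below illustrates that comparison; it was not part of the tests described above. It assumes the suffix entry is readable with a simple anonymous bind, and the URIs, suffix, and function names are placeholders chosen for illustration.

```python
def ldapsearch_command(uri, suffix):
    """Build an ldapsearch call that reads contextCSN from the suffix entry."""
    # contextCSN is an operational attribute, so it is requested by name.
    return ["ldapsearch", "-x", "-LLL", "-H", uri,
            "-s", "base", "-b", suffix, "contextCSN"]


def parse_context_csns(ldif_text):
    """Map each server ID (SID) to its newest contextCSN found in LDIF text."""
    csns = {}
    for line in ldif_text.splitlines():
        if not line.lower().startswith("contextcsn:"):
            continue
        value = line.split(":", 1)[1].strip()
        parts = value.split("#")
        if len(parts) != 4:  # expected form: timestamp#count#sid#mod
            continue
        sid = parts[2]
        # CSNs are fixed-width, so plain string comparison orders them.
        if value > csns.get(sid, ""):
            csns[sid] = value
    return csns


def consumer_is_behind(provider_csns, consumer_csns):
    """True if the provider holds a newer CSN than the consumer for any SID."""
    return any(csn > consumer_csns.get(sid, "")
               for sid, csn in provider_csns.items())
```

The output of ldapsearch_command("ldap://provider.example.com", "dc=example,dc=com") and of the same command against the consumer (both hypothetical hosts) can be fed to parse_context_csns. If consumer_is_behind stays true long after the provider has gone idle, the consumer has not finished synchronizing.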
Regards,
Rodrigo.
--- On Thu, 3/19/09, Howard Chu hyc@symas.com wrote:
From: Howard Chu hyc@symas.com
Subject: Re: slapd syncrepl consumer having permanent high CPU load
To: rlvcosta@yahoo.com
Cc: openldap-software@openldap.org, "John Morrissey" jwm@horde.net
Date: Thursday, March 19, 2009, 2:04 PM
Rodrigo Costa wrote:
Folks,
I was preparing openLDAP with GDB symbols but looks like the issue was identified and solved in HEAD. Just to identify this issue; was created any sort of ITS for verification in a new load?
No, the further work was just associated with ITS#5860.
Sorry my late response but my baby daughter just born last week and I was having some work at home.
Congratulations!
I will give a try in the HEAD load.
Try RE24 now, that's the current release candidate.
Best Regards,
Rodrigo.
PS->  Just some link from my daughter
http://sites.google.com/site/lauramenina/laura_english
--- On Wed, 3/18/09, Howard Chu hyc@symas.com wrote:
From: Howard Chu hyc@symas.com
Subject: Re: slapd syncrepl consumer having permanent high CPU load
To: "John Morrissey" jwm@horde.net
Cc: openldap-software@openldap.org
Date: Wednesday, March 18, 2009, 5:21 AM
John Morrissey wrote:
After ~16h uptime, slapd with this BDB had increased its DN cache to ~250k entries after it previously appeared stable at the configured 20k entries, and its entry cache had ballooned to ~480k entries. Its RSS was about 3.6GB at this point, with a BDB cache size of 2GB.
I was finally able to reproduce this (took several hours of searches. Fortunately I was at a St. Pat's party so I didn't have to wait around, just got home in time to see it start going bad...). A fix is now in HEAD.
(And now we'll see if Guinness is Good For Your Code... ;)
-- 
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/