Howard and all,
I ran more tests and it looks like the problem persists. I saw some changes, but only in
the memory consumption of the syncrepl consumer (slave).
Let me try to explain better. I have a provider/consumer pair of machines where one
machine always receives all reads and writes and the other exists only for High
Availability (HA) purposes, so the two DBs should be kept as close to each other as
possible.
I start the provider (master) and then, right after, the consumer (slave). The
configuration does not appear to be the problem: I have 2 DBs configured, CONTENT and
INDEX, and I see the consumer doing 2 searches against these DBs when it starts (this is
ok).
After this, CPU usage on both consumer and provider increases, as does the memory
allocated by the slapd process. With the HEAD changes, memory consumption in the consumer
increases at a much faster rate, something like 10:1. So to reproduce the issue I needed
to reduce the dncachesize directive on the consumer to 1/10 of the provider value, i.e.
from 4,000,000 to 400,000. This keeps the process from consuming all memory before the
issue arises.
Let me try to summarize:
1) Start the provider (master) slapd process;
2) Start the consumer (slave) slapd process;
3) Monitor memory and CPU usage on both provider and consumer;
4) From time to time, run a monitor check to see the cache information;
5) Before the cache is full on the provider (master), collect gdb debug output for the
consumer (slave) process threads;
6) Wait until the consumer (slave) process starts to use around 200% CPU, then collect
gdb debug output again;
7) Wait a little longer until the provider (master) CPU usage drops to 0% and observe that
the consumer (slave) CPU stays stable at 200%. Collect gdb debug output again;
8) Wait some more time and collect gdb debug output once more to see if anything changed.
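For steps 3), 6) and 7), the CPU figure can be sampled from /proc rather than read off
top by hand. The following is only a minimal sketch, assuming a Linux host where
/proc/<pid>/stat is readable. The function names (cpu_ticks, cpu_percent, stays_pinned,
watch) and the 190% threshold are hypothetical choices for illustration; they are not
part of slapd and were not used to produce the attached output.

```python
import os
import time


def cpu_ticks(stat_text):
    """Return utime + stime (in clock ticks) from the text of /proc/<pid>/stat."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so split on the last ')' before splitting on whitespace.
    rest = stat_text.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def cpu_percent(ticks_before, ticks_after, seconds, ticks_per_second):
    """CPU usage over the interval; 200.0 means two cores fully busy."""
    return 100.0 * (ticks_after - ticks_before) / (ticks_per_second * seconds)


def stays_pinned(samples, threshold=190.0, count=3):
    """True if the last `count` samples are all at or above `threshold`."""
    return len(samples) >= count and all(s >= threshold for s in samples[-count:])


def watch(pid, interval=5.0, rounds=12):
    """Sample one process every `interval` seconds and return the readings."""
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second on this host
    path = "/proc/%d/stat" % pid
    readings = []
    with open(path) as f:
        before = cpu_ticks(f.read())
    for _ in range(rounds):
        time.sleep(interval)
        with open(path) as f:
            after = cpu_ticks(f.read())
        readings.append(cpu_percent(before, after, interval, hz))
        before = after
    return readings
```

Calling watch() on the consumer slapd PID and passing the result to stays_pinned() would
flag the situation in step 7), where the consumer stays near 200% while the provider is
idle.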
I recompiled HEAD with GDB symbols for debugging. The attached file contains the debug
information I collected, more than once, from the consumer slapd (including the syncrepl
thread). Please see the attached file for details.
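Thread snapshots of this kind can be taken repeatedly without an interactive gdb
session. This is a sketch under assumptions: gdb is installed and allowed to attach to
the running slapd, and the helper names (gdb_backtrace_command, collect_backtraces) are
hypothetical. The attached file was not necessarily produced this way.

```python
import subprocess


def gdb_backtrace_command(pid):
    """Build a gdb command line that dumps a backtrace of every thread."""
    return [
        "gdb",
        "-batch",                      # run the commands, then exit
        "-ex", "thread apply all bt",  # backtrace for each thread
        "-p", str(pid),                # attach to the running process
    ]


def collect_backtraces(pid, out_path):
    """Append one snapshot of all thread backtraces to out_path."""
    result = subprocess.run(
        gdb_backtrace_command(pid), capture_output=True, text=True
    )
    with open(out_path, "a") as out:
        out.write(result.stdout)
    return result.returncode
```

Running collect_backtraces() once per step (before the cache fills, at 200% CPU, and
after the provider goes idle) keeps all snapshots in one file for comparison.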
Item 7) is where I think the issue is happening. The synchronization never ends, the
consumer (slave) responds to queries very slowly, CPU usage stays fixed at 200%, and the
logic never seems to work as expected; in the end it never finishes synchronizing.
So it appears that syncrepl still has some issue synchronizing the DBs.
Regards,
Rodrigo.
--- On Thu, 3/19/09, Howard Chu <hyc(a)symas.com> wrote:
From: Howard Chu <hyc(a)symas.com>
Subject: Re: slapd syncrepl consumer having permanent high CPU load
To: rlvcosta(a)yahoo.com
Cc: openldap-software(a)openldap.org, "John Morrissey" <jwm(a)horde.net>
Date: Thursday, March 19, 2009, 2:04 PM
Rodrigo Costa wrote:
>
> Folks,
>
> I was preparing openLDAP with GDB symbols but looks like the issue was
> identified and solved in HEAD. Just to identify this issue; was created any
> sort of ITS for verification in a new load?
No, the further work was just associated with ITS#5860.
> Sorry my late response but my baby daughter just born last week and I was
> having some work at home.
Congratulations!
> I will give a try in the HEAD load.
Try RE24 now, that's the current release candidate.
>
> Best Regards,
>
> Rodrigo.
>
> PS-> Just some link from my daughter
>
> http://sites.google.com/site/lauramenina/laura_english
>
> --- On Wed, 3/18/09, Howard Chu<hyc(a)symas.com> wrote:
>
>> From: Howard Chu<hyc(a)symas.com>
>> Subject: Re: slapd syncrepl consumer having permanent high CPU load
>> To: "John Morrissey"<jwm(a)horde.net>
>> Cc: openldap-software(a)openldap.org
>> Date: Wednesday, March 18, 2009, 5:21 AM
>> John Morrissey wrote:
>>> After ~16h uptime, slapd with this BDB had increased its DN cache to ~250k
>>> entries after it previously appeared stable at the configured 20k entries,
>>> and its entry cache had ballooned to ~480k entries. Its RSS was about 3.6GB
>>> at this point, with a BDB cache size of 2GB.
>>
>> I was finally able to reproduce this (took several hours of
>> searches. Fortunately I was at a St. Pat's party so I didn't
>> have to wait around, just got home in time to see it start
>> going bad...). A fix is now in HEAD.
>>
>> (And now we'll see if Guinness is Good For Your Code... ;)
>> --
>>    -- Howard Chu
>>    CTO, Symas Corp.           http://www.symas.com
>>    Director, Highland Sun     http://highlandsun.com/hyc/
>>    Chief Architect, OpenLDAP  http://www.openldap.org/project/
>>
>
>
>
>
>
--
   -- Howard Chu
   CTO, Symas Corp.           http://www.symas.com
   Director, Highland Sun     http://highlandsun.com/hyc/
   Chief Architect, OpenLDAP  http://www.openldap.org/project/