On Tuesday, 18 August 2009 21:30:31 Rodrigo Costa wrote:
OpenLDAP software community,
I'm having difficulty keeping databases synchronized with
syncrepl. I'm running the latest OpenLDAP version, 2.4.17, which I
rebuilt with debugging symbols for gdb after running into these issues.
I have a DB (actually split into 2 DBs), each with around 4 million
entries. Because of memory limitations I have dncachesize configured
at around 3000000, which is smaller than the number of entries in the
DBs.
I loaded both servers with all indexes and the same data. When both are
started, syncrepl (a slapd thread) does not need to run any search,
so both mirrors are in sync and consuming from each other. If a new
entry is created on one, the other consumes it right away, since both
are listening when it happens.
If I stop one mirror and create even a small number of entries on the
other, say 10, then when I start the stopped provider again its syncrepl
falls back to conventional syncrepl replication, which searches the DB
to synchronize.
This never finishes, so the mirrors stay out of sync. What I can see is:
1) Stop the Second mirror, e.g. for slapcat (calling the two mirrors
First and Second for reference);
2) Add a few entries to the First mirror (kept online);
3) Start the Second mirror again after the First mirror has had some
new entries added by normal operation;
4) Syncrepl on the Second mirror enters conventional syncrepl
replication, since it detects that the mirrors differ;
5) Until the dncache is filled, slapd CPU consumption on the First
mirror stays below 100% (around 50%) and the search progresses well, as
the monitor shows;
6) After the dncache is filled (it oscillates above 3 million), CPU
consumption on the First mirror goes to 100%, oscillating between 98%
and 102%;
7) The search never ends, so the systems are never in sync. CPU
consumption stays permanently high, almost always at 100%.
I let this process run for days and saw only one or two entries get
synced. Judging by the CPU usage, it looks like something is hanging the
search, with some loop keeping the thread consuming one full CPU.
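A more direct check than watching CPU load is to compare the contextCSN
values stored on the suffix entry of each mirror. The Python sketch below
assumes those values have already been read from both servers (for
example with ldapsearch) and that they use the usual
timestamp#count#sid#mod layout; the function names and sample values are
hypothetical, for illustration only.

```python
from typing import Dict, Iterable, List


def latest_csn_per_sid(csns: Iterable[str]) -> Dict[str, str]:
    """Map each server ID (SID) to the newest contextCSN seen for it."""
    latest: Dict[str, str] = {}
    for csn in csns:
        # Assumed CSN layout: timestamp#count#sid#mod
        parts = csn.split("#")
        if len(parts) != 4:
            raise ValueError(f"unexpected CSN format: {csn!r}")
        sid = parts[2]
        # Fields are fixed-width, so for a given SID plain string
        # comparison orders CSNs chronologically.
        if sid not in latest or csn > latest[sid]:
            latest[sid] = csn
    return latest


def lagging_sids(local: Iterable[str], remote: Iterable[str]) -> List[str]:
    """Return the SIDs for which `local` is missing or behind `remote`."""
    mine = latest_csn_per_sid(local)
    theirs = latest_csn_per_sid(remote)
    return sorted(
        sid for sid, csn in theirs.items()
        if sid not in mine or mine[sid] < csn
    )


if __name__ == "__main__":
    # Hypothetical values: the Second mirror missed changes made under SID 002.
    second = [
        "20090818213031.000000Z#000000#001#000000",
        "20090818200000.000000Z#000000#002#000000",
    ]
    first = [
        "20090818213031.000000Z#000000#001#000000",
        "20090818210000.000000Z#000000#002#000000",
    ]
    print(lagging_sids(second, first))
```

An empty list in both directions would mean the mirrors hold the same
contextCSN per SID; a SID that stays in the list across repeated checks
suggests the refresh is not making progress.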
I collected some GDB information, which I'm attaching. I'm not
sure how to interpret this overlay_walk.
The idea is to stop one mirror for backups, relieving the primary
server of this task. For this to work, replication has to catch up
once the mirror is restarted.
Your comments are very welcome.
You have provided absolutely no configuration information. There may well be
other explanations for this behaviour than the dncachesize. I can think of at
least two.
You also haven't provided information on the systems you are using. E.g., you
may be trying this on systems with too little memory (e.g., <1GB), which might
be totally inadequate for the amount of data you have.
Regards,
Buchan