On Tuesday, 18 August 2009 21:30:31 Rodrigo Costa wrote:
openldap software community,
I'm having difficulty keeping my databases synchronized with syncrepl. I'm running the latest OpenLDAP version, 2.4.17, which I recompiled for use with gdb after running into these issues.
I have a DB (actually split into 2 DBs), each with around 4 million entries. Because of memory limitations I have dncachesize configured at around 3000000, which is smaller than the maximum number of entries in the DBs.
I loaded both servers with all indexes and the same data. When both are started, syncrepl (a thread within slapd) does not need to perform any search, so both mirrors are in sync and consuming from each other. If a new entry is created on one, the other consumes it as soon as it happens, since both are listening.
If I stop one mirror and create even a small number of entries on the other, say 10, then when I start the stopped provider again, syncrepl enters conventional syncrepl replication, which searches the DB for synchronization.
This never finishes, leaving the mirrors out of sync. What I can see is:
1) Stop the second mirror, e.g. for slapcat (calling the mirrors "second" and "first" for reference);
2) Add a few entries on the first mirror (kept online);
3) Start the second mirror again after the first mirror has had some new entries added by normal operation;
4) Syncrepl on the second mirror enters conventional syncrepl replication, since it detects that something differs between the mirrors;
5) Until the dncache is filled, slapd CPU consumption on the first mirror stays below 100% (around 50%) and the search proceeds well, as the monitor shows;
6) Once the dncache is filled (it oscillates above 3 million), CPU consumption on the first mirror goes to 100%, oscillating between 98% and 102%;
7) The search never ends, so the systems are never in sync. CPU consumption stays permanently high, almost always at 100%.
I let this process run for days and saw only one or two entries get synchronized. Judging by the CPU usage, something seems to be hanging the search: some loop keeps the thread consuming one full CPU.
I collected some gdb information, which I'm sending attached. I'm not sure how to interpret this overlay_walk.
The idea is to stop one mirror for backups, relieving the primary server of this task. For this to work, replication needs to catch up once the mirror is started again.
Your comments are very welcome.
You have provided absolutely no configuration information. There may well be other explanations for this behaviour than the dncachesize. I can think of at least two.
You also haven't provided information on the systems you are using. E.g., you may be trying on systems with too little memory (e.g., <1GB), which might be totally inadequate for the amount of data you have.
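To make clear what kind of configuration information would help, here is a rough sketch of the relevant part of a slapd.conf for one mirror. It is only an illustration: the backend type, suffix, paths, host names, bind DN, credentials and all numeric values other than your stated dncachesize are placeholders I have assumed, not details taken from your setup, and only one of your two databases is shown.

```
# Hypothetical slapd.conf fragment for one mirror; every value is a placeholder
serverID    1

database    bdb                         # assumed backend (hdb is configured the same way)
suffix      "dc=example,dc=com"         # placeholder suffix
directory   /var/lib/ldap               # placeholder path
cachesize   100000                      # entry cache size, placeholder value
dncachesize 3000000                     # the value mentioned in the report

# Consumer side: pull changes from the other mirror
syncrepl    rid=001
            provider=ldap://mirror2.example.com
            bindmethod=simple
            binddn="cn=replicator,dc=example,dc=com"
            credentials=secret
            searchbase="dc=example,dc=com"
            type=refreshAndPersist
            retry="5 5 300 +"
mirrormode  on

# Provider side: serve changes to the other mirror
overlay     syncprov
syncprov-checkpoint 100 10
syncprov-sessionlog 100
```

Posting the real equivalents of these directives for both mirrors, with the credentials removed, together with the amount of RAM on each system, would make it much easier to tell whether dncachesize is really the cause.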
Regards, Buchan