--On Tuesday, August 25, 2009 1:38 PM -0400 Francis Swasey <Frank.Swasey@uvm.edu> wrote:
On 8/25/09 12:36 PM, Quanah Gibson-Mount wrote:
--On Tuesday, August 25, 2009 12:11 PM -0400 Francis Swasey <Frank.Swasey@uvm.edu> wrote:
On 8/25/09 11:45 AM, Aaron Richton wrote:
On Tue, 25 Aug 2009, Lepoutre Lionel wrote:
My problem is that some data are not synchronised on one of my servers, and I have some "log.xxxx" files in my var/openldap-data/ directory.
When I had an issue with my replicas getting out of sync, I developed a process to slapcat each of the replicas, generate what was different from the master, and cause the master to make the changes again (i.e., reverse the change on the master and then revert to what the master knew was correct), which caused the information to get pushed to the replicas again. In my case, the problem turned out to be that one of my replicas had too little memory and was triggering a bug in v2.3 that caused the changes for delta-syncrepl not to get logged in the accesslog database on the provider.
Was this ever fixed in 2.3? Do you have an ITS#? It's interesting that a replica running out of memory would cause the provider not to log data into the accesslog. I'm curious because I'm seeing an issue right now where a ton of deletes are executed, and all the replicas of the master periodically go into refresh mode on the same entry during the deletes, which makes me think the accesslog may be failing to write out some of the changes.
I never filed an ITS for it. I discussed it on this list, and you and Howard gave me pointers. My theory is that the root cause was the design problem that allowed a consumer to make the provider hold a thread on the accesslog (ITS#5985: replication lockout with syncrepl). Its interaction with the replica that needed more memory caused changes (during high-volume change periods) to back up so far on the provider that they fell off the end of the queue and were never written to the accesslog.
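The theorized failure mode (changes backing up until they fall off the end of a queue) can be illustrated with a toy model. This is not how slapd actually buffers writes to the accesslog; the class names and the fixed capacity are hypothetical. It contrasts a bounded buffer that silently drops its oldest pending change when the log writer stalls with one that pushes back on the writer instead, so no change disappears unnoticed.

```python
from collections import deque


class ChangeBuffer:
    """Hypothetical base class: pending changes waiting to be logged."""

    def drain(self):
        # The (previously stalled) log writer catches up, oldest first.
        while self.pending:
            yield self.pending.popleft()


class DroppingBuffer(ChangeBuffer):
    """Bounded buffer that discards the oldest pending change when full."""

    def __init__(self, capacity):
        self.pending = deque(maxlen=capacity)

    def submit(self, change):
        # Appending to a full deque(maxlen=...) silently drops the leftmost item.
        self.pending.append(change)
        return True


class BackpressureBuffer(ChangeBuffer):
    """Bounded buffer that refuses new changes instead of losing queued ones."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pending = deque()

    def submit(self, change):
        if len(self.pending) >= self.capacity:
            return False  # the writer must wait and retry; nothing is lost silently
        self.pending.append(change)
        return True


def simulate(buffer, n_changes):
    """Submit n_changes while the log writer is stalled, then let it catch up."""
    accepted = [c for c in range(n_changes) if buffer.submit(c)]
    logged = list(buffer.drain())
    return accepted, logged
```

With a capacity of 3 and 5 changes written while the writer is stalled, `simulate(DroppingBuffer(3), 5)` reports all five changes as accepted but logs only changes 2, 3 and 4; `simulate(BackpressureBuffer(3), 5)` accepts and logs changes 0, 1 and 2 and visibly rejects 3 and 4.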
To find the thread about my issue, search the openldap-software list for the subject "delta-syncrepl missing changes", starting on January 30 and ending around March 20 of this year.
Since upgrading the memory on that replica (from 1GB to 5GB), I have not had the problem again.
Yeah, I remembered bits and pieces of the thread, but it's been a while. I'm not sure this is the same issue, because the replicas all have a ton of memory, but it could be similar, since there are 6 replicas causing lockout (the same people who got me to file ITS#5985 in the first place). Still, I think there should never be a case where the provider fails to write updates to the accesslog db, regardless of the load the replicas are putting on it. Hopefully the ITS#5985 fix takes care of that.
Thanks!
--Quanah
--
Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration