[Openldap 2.4.16] Is it possible to force synchronization: files log.xxxx not treated after a crash - openldap-software - openldap.org


Lepoutre Lionel

25 Aug 2009

2:30 a.m.

Hello,

I am using OpenLDAP 2.4.16 in a multi-master configuration (2 servers on a Linux system - LFS). My problem is that some data are not synchronised on one of my servers, and I have some "log.xxxx" files in my var/openldap-data/ directory. I think this is normal, as these files are the synchronisation files and are not removed until the synchronisation is validated (in my DB_CONFIG I have set the parameter: set_flags DB_LOG_AUTOREMOVE). Replication is currently working (I have run some tests), but I know that the ldap process has crashed, as I found this in the logs:

Aug 20 10:08:40 bpldap02s kernel: slapd[19783]: segfault at 0 ip 080926f3 sp ad5a03a0 error 4 in slapd[8048000+1b9000]

But I was on holiday when it happened, so I don't have more information on how it happened.

Is there something I can do to force these files to be processed? I have read some articles about "slurpd", but I am not sure it can be used with my version.

For the moment I only see the solution: "slapcat -> ldapadd" to synchronize both instances but if you have any other solution...

Thank you for your help.

Lionel


Aaron Richton

25 Aug 2009

8:45 a.m.

On Tue, 25 Aug 2009, Lepoutre Lionel wrote:

My problem is that some data are not synchronised on one of my servers and I have some "log.xxxx" files in my var/openldap-data/ directory.

The "log.xxxx" are BerkeleyDB transaction log files. They should be automagically replayed as needed in the 2.4 series.

set_flags DB_LOG_AUTOREMOVE).

With this set, there really shouldn't be a need for periodic treatment of the log.* files.

Aug 20 10:08:40 bpldap02s kernel: slapd[19783]: segfault at 0 ip 080926f3 sp ad5a03a0 error 4 in slapd[8048000+1b9000]

If slapd crashed and did not resync appropriately on startup, that's unfortunately probably a bug in slapd. There's been a lot of work on syncprov/syncrepl that will be present in 2.4.18 (see the ITS) so hopefully an upgrade will address your issue. If you care to testbed with the RE24 CVS and share your feedback, that would help greatly to ensure this.

For the moment I only see the solution: "slapcat -> ldapadd" to synchronize both instances but if you have any other solution...

I think that's where you're going to end up. "slapcat" (on server with proper data) -> "slapadd -q" (on server with missing entries) would be a faster option.
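As a rough illustration of the reload suggested above, here is a minimal Python sketch that only builds the two command lines. The suffix `dc=example,dc=com`, the dump file name, and the config path are placeholders and do not come from this thread. It also assumes that slapd is stopped on the target server and that its database directory has been emptied before `slapadd` runs.

```python
import subprocess
from typing import List, Optional


def slapcat_cmd(suffix: str, ldif_path: str, conf: Optional[str] = None) -> List[str]:
    """Build the command that dumps the database serving `suffix` to LDIF."""
    cmd = ["slapcat", "-b", suffix, "-l", ldif_path]
    if conf:
        cmd += ["-f", conf]  # optional path to slapd.conf
    return cmd


def slapadd_cmd(suffix: str, ldif_path: str, conf: Optional[str] = None) -> List[str]:
    """Build the command that loads an LDIF dump in quick (-q) mode."""
    cmd = ["slapadd", "-q", "-b", suffix, "-l", ldif_path]
    if conf:
        cmd += ["-f", conf]
    return cmd


def run(cmd: List[str], dry_run: bool = True) -> None:
    """Print the command, and execute it only when dry_run is False."""
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder values: adjust to the real suffix and paths.
    run(slapcat_cmd("dc=example,dc=com", "dump.ldif"))  # on the server with proper data
    run(slapadd_cmd("dc=example,dc=com", "dump.ldif"))  # on the server with missing entries
```

The dry-run default is deliberate: it lets the commands be reviewed before anything touches a database.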


Lepoutre Lionel

8:56 a.m.

Thank you for your answer and your advice, Aaron. Concerning the upgrade, I won't be able to use a non-stable version :( But I will try it in a test environment.


Francis Swasey

9:11 a.m.


When I had an issue with my replicas getting out of sync, I developed a process to slapcat each of the replicas, generate what was different from the master, and cause the master to make the changes again (i.e., reverse the master and then revert to what the master knew was correct), which caused the information to get pushed to the replicas again. In my case, the problem turned out to be that one of my replicas had too little memory and was triggering a bug in v2.3 that caused the changes for delta-syncrepl not to get logged in the accesslog database on the provider.

The gist of the process I developed was to ssh to each replica, slapcat the existing database, sftp that back to the master, slapcat the master's database, use an LDIF diff tool to generate the changes as if the replica were the master, apply those changes to the master, then do the LDIF diff in the other direction and apply those changes to the master. It was an atrocious hack, but it allowed me to re-sync the replicas without having to wait for the load balancers to take them out of the service pool and rebuild them by hand on a regular basis.
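The comparison step in a process like this can be sketched in Python. This is not the tool used above (which is not named in the thread), and it only reports which DNs differ between two slapcat dumps rather than generating change records. The function names are hypothetical, values are compared as raw text (base64 values are not decoded), and DNs are compared exactly as written.

```python
from typing import Dict, Iterable, List, Set, Tuple

Entry = Set[Tuple[str, str]]  # set of (lowercased attribute name, raw value)


def parse_ldif(text: str) -> Dict[str, Entry]:
    """Parse slapcat-style LDIF text into {dn: {(attr, value), ...}}."""
    # Unfold continuation lines: a line starting with one space
    # continues the previous line.
    unfolded: List[str] = []
    for line in text.splitlines():
        if line.startswith(" ") and unfolded and unfolded[-1]:
            unfolded[-1] += line[1:]
        else:
            unfolded.append(line)

    entries: Dict[str, Entry] = {}
    dn = None
    for line in unfolded:
        if not line.strip():
            dn = None  # a blank line ends the current entry
            continue
        if line.startswith("#"):
            continue  # comment line
        attr, _, value = line.partition(":")
        value = value.strip()  # base64 values keep their leading ':' marker
        if dn is None and attr.lower() == "dn":
            dn = value
            entries[dn] = set()
        elif dn is not None:
            entries[dn].add((attr.lower(), value))
    return entries


def diff_dumps(reference: Dict[str, Entry], other: Dict[str, Entry],
               ignore: Iterable[str] = ()) -> Dict[str, List[str]]:
    """List DNs missing from, extra in, or different in `other`."""
    skipped = {a.lower() for a in ignore}

    def visible(entry: Entry) -> Entry:
        return {(a, v) for a, v in entry if a not in skipped}

    return {
        "missing": sorted(dn for dn in reference if dn not in other),
        "extra": sorted(dn for dn in other if dn not in reference),
        "changed": sorted(dn for dn in reference
                          if dn in other and visible(reference[dn]) != visible(other[dn])),
    }
```

Calling `diff_dumps(parse_ldif(master_text), parse_ldif(replica_text))` gives the entries to look at; the `ignore` parameter can exclude attributes that are expected to differ.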

However, having log.xxxx files is not a sign that anything is wrong. Even with auto-removal of the log.xxxx files, you will always have at least one present and if you have a massive change happen, you may get a few before the checkpoint happens and makes them superfluous.
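To see how old the log.xxxx files in a database directory are, a small Python sketch like the following could be used. The function name and the one-day threshold are arbitrary assumptions for illustration. As noted above, an old log file is not by itself a sign that anything is wrong.

```python
import time
from pathlib import Path
from typing import List, Optional, Tuple


def old_log_files(db_dir: str, max_age_days: float = 1.0,
                  now: Optional[float] = None) -> List[Tuple[str, float]]:
    """Return (file name, age in days) for log.* files older than max_age_days."""
    now = time.time() if now is None else now
    found = []
    for path in sorted(Path(db_dir).glob("log.*")):
        if not path.is_file():
            continue
        # Age is measured from the last modification time of the file.
        age_days = (now - path.stat().st_mtime) / 86400.0
        if age_days > max_age_days:
            found.append((path.name, age_days))
    return found


if __name__ == "__main__":
    # Placeholder path: point this at the real database directory.
    for name, age in old_log_files("var/openldap-data"):
        print(f"{name}: last modified {age:.1f} days ago")
```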

How have you determined that your servers are not in sync?

--
Frank Swasey | http://www.uvm.edu/~fcs
Sr Systems Administrator | Always remember: You are UNIQUE,
University of Vermont | just like everyone else.
"I am not young enough to know everything." - Oscar Wilde (1854-1900)


Quanah Gibson-Mount

9:36 a.m.


Was this ever fixed in 2.3? Do you have an ITS#? It is interesting that a replica running out of memory would cause the provider not to log data into the accesslog. I'm curious because I'm seeing an issue right now where a ton of deletes are executed, and all the replicas of the master periodically go into refresh mode on the same entry during the deletes, which makes me think that the accesslog may be failing to write out some of the changes.

--Quanah

--

Quanah Gibson-Mount
Principal Software Engineer
Zimbra, Inc
--------------------
Zimbra :: the leader in open source messaging and collaboration


Francis Swasey

10:38 a.m.


I never filed an ITS for it. I discussed it on this list, and you and Howard gave me pointers. I theorize the root cause was the design problem that allowed a consumer to make the provider hold a thread on the accesslog (ITS#5985: replication lockout with syncrepl). Its interaction with the replica that needed more memory caused changes (during high-volume change periods) to get backed up so far on the provider that they fell off the end of the queue and were never written to the accesslog.

To find the thread about my issue -- search for the subject "delta-syncrepl missing changes" starting on January 30 and ending around March 20 of this year in the openldap-software list.

Since upgrading the memory on that replica (from 1GB to 5GB), I have not had the problem again.


Quanah Gibson-Mount

10:46 a.m.


Yeah, I remembered bits and pieces of the thread, but it's been a while. I'm not sure this is the same issue, because the replicas all have a ton of memory, but it could be similar, just because there are 6 replicas causing lockout (same people that got me to file ITS#5985 in the first place). Still, I think there should never be a case where the provider fails to write updates to the accesslog db, regardless of the load replicas are putting on it. Hopefully the ITS#5985 fix takes care of that.

Thanks!

--Quanah


Lepoutre Lionel

11:59 p.m.

Hi,

I know that having a log.xxxx file is normal, but these were 4 days old, so I was "suspicious". I diffed the dumps of my two servers to confirm it.
