Hi
We have a 3-way delta-syncrepl MMR setup (Debian Stretch with slapd 2.4.44+dfsg-5+deb9u1). Two of those three hosts were powered off for about 4 hours. After they booted and slapd started, the host that had been running the whole time began to log:
SEARCH RESULT tag=101 err=53 nentries=0 text=consumer state is newer than provider!
Purging the accesslog database fixed the issue.
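For reference, a minimal sketch of such a purge, assuming back-mdb with the accesslog database under /var/lib/ldap/accesslog (the path is an assumption, adjust to your layout; slapd must be stopped first so the MDB files are not in use):

  systemctl stop slapd
  # remove the accesslog database files; slapd recreates an empty accesslog on start
  rm /var/lib/ldap/accesslog/data.mdb /var/lib/ldap/accesslog/lock.mdb
  systemctl start slapd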
Could this have happened due to a timesync problem? We noticed that, right after boot, the ntpd time offset oscillated between 0.0192 and 0.0003 seconds for about 3 minutes.
Does anybody have experience with this?
Do we need to delay slapd, or force an `ntpdate` run before slapd starts during boot? Because slapd's init script only declares the following LSB header
# Required-Start: $remote_fs $network $syslog
it is started (via the systemd service file auto-generated from the init.d script) as soon as network.target is reached, simultaneously with ntpd. slapd takes only about 1 second to start, whereas ntpd takes about 10 seconds and may need much longer to actually get the time in sync.
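One idea, sketched here under the assumption that a systemd drop-in is acceptable: keep slapd from starting before time-sync.target. Note this only helps if something actually delays time-sync.target until the clock is synced (e.g. a waiter service ordered before it); with plain ntpd on Stretch that is not the case by default:

  # /etc/systemd/system/slapd.service.d/wait-for-time.conf
  [Unit]
  # do not start slapd until the clock-synchronization target is reached
  Wants=time-sync.target
  After=time-sync.target

After creating the drop-in, run `systemctl daemon-reload` to pick it up.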
Kind regards
-- Sven Mäder, IT Services Group, Physics Department, ETH Zurich
Sven Mäder maeder@phys.ethz.ch wrote on 30.08.2017 at 14:51 in message 2c527361-fd25-9002-1aa5-96ba00a69135@phys.ethz.ch:
Hi!
Some of the time ntpd needs to sync may be host name resolution (if you use names). Methods to speed up initial synchronization include "iburst", "minpoll" and adding a larger pool of servers. Note that reducing minpoll could reduce the final accuracy (just as increasing "maxpoll" does). Depending on your network and load I would not rely on a time offset of less than a few tens of milliseconds. How well LDAP can operate then is a different question.
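For concreteness, a commented example of a server line combining these options (the hostname is a placeholder; the poll values are log2 seconds):

  server ntp.example.org iburst minpoll 4 maxpoll 10
  # iburst     -> burst of packets at startup, speeds up initial synchronization
  # minpoll 4  -> minimum poll interval 2^4  = 16 seconds
  # maxpoll 10 -> maximum poll interval 2^10 = 1024 seconds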
Also note that with Linux (on most platforms) and NTP, one problem is that the frequency correction needed for the clock can vary significantly between boots; thus the time to reach "perfect sync" can be quite long. See attached image for an example.
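A standard partial mitigation, assuming stock ntpd: persist the last frequency estimate across restarts with a driftfile (Debian's default path shown), so ntpd does not start from a zero estimate. Of course this only helps as far as the required correction is actually stable between boots:

  # /etc/ntp.conf
  driftfile /var/lib/ntp/ntp.drift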
Updating one entry on different servers within a very short time (shorter than the time it takes to sync) will probably cause trouble. What real-life situation causes that?
Regards, Ulrich
Hi Ulrich
Thank you for your response.
On 08/31/2017 09:37 AM, Ulrich Windl wrote:
Hi!
Some of the time ntpd needs to sync may be host name resolution (if you use names). Methods to speed up initial synchronization include "iburst", "minpoll" and adding a larger pool of servers. Note that reducing minpoll could reduce the final accuracy (just as increasing "maxpoll" does). Depending on your network and load I would not rely on a time offset of less than a few tens of milliseconds. How well LDAP can operate then is a different question.
We have 2 timeservers (stratum 1) in our local net with a GPS clock source:

  server time1.phys.ethz.ch minpoll 4 maxpoll 10 iburst
  server time2.phys.ethz.ch minpoll 4 maxpoll 10 iburst
minpoll is already set at its lowest value, although I do not understand what this option does. I may increase its value, since better final accuracy sounds good.
  # ntptrace
  localhost: stratum 2, offset 0.000008, synch distance 0.031825
  time1.ethz.ch: stratum 1, offset -0.000001, synch distance 0.000298, refid 'PPS'

  # ntpq -c pe
       remote           refid      st t  when poll reach   delay   offset  jitter
  ==============================================================================
   LOCAL(0)        .LOCL.           5 l  149m   64    0    0.000    0.000   0.000
  *time1.ethz.ch   .PPS.            1 u   251  256  377    0.426    0.009   0.026
  +time2.ethz.ch   .PPS.            1 u    64  256  377    0.430   -0.003   0.019
Looks like the time offset is only a few microseconds once ntp is in sync.
Also note that with Linux (on most platforms) and NTP, one problem is that the frequency correction needed for the clock can vary significantly between boots; thus the time to reach "perfect sync" can be quite long. See attached image for an example.
This is very interesting, we will look further into this. I am thinking about waiting in the startup process until ntp is in "perfect sync" and starting slapd only after that. Maybe I can use the loopstats file to check/automate this.
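A rough sketch of such a check, assuming ntp.conf enables `statistics loopstats` with `statsdir /var/log/ntpstats/`, and that an offset below 1 ms counts as "in sync" (directory and threshold are assumptions):

  #!/bin/sh
  # Block until the newest loopstats sample reports an offset below 1 ms.
  # loopstats field 3 is the clock offset in seconds; convert to microseconds.
  STATSDIR=/var/log/ntpstats
  while :; do
      us=$(tail -n 1 "$STATSDIR/loopstats" 2>/dev/null \
          | awk '{ o = ($3 < 0) ? -$3 : $3; printf "%.0f", o * 1000000 }')
      [ -n "$us" ] && [ "$us" -lt 1000 ] && break
      sleep 5
  done

Alternatively, the ntp distribution ships an `ntp-wait` script that polls ntpd until it reports a synchronized state; either could run as an ExecStartPre for slapd.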
Updating one entry on different servers within a very short time (shorter than the time it takes to sync) will probably cause trouble. What real-life situation causes that?
Probably none. But we have a log parser that writes "last use" statistics for our users to LDAP in near real time. We also use OpenLDAP as the Kerberos KDC database backend, which writes on every successful/failed authentication attempt. The chance of a conflict is probably very low as long as the time offset is smaller than the network delay.
Kind regards
-- Sven Mäder, IT Services Group, Physics Department, ETH Zurich