syncrepl - when master restarted, slaves lose connection forever (and never reconnect)? - openldap-software


Tomasz Chmielewski

15 Dec 2006

3:15 a.m.

I decided to use syncrepl (refreshAndPersist) instead of slurpd to synchronize.

I made simple tests but something doesn't work right for me.

I started a totally empty OpenLDAP slave - it got replicated from the master. Any change made on the master is also replicated fine.

Unfortunately, when the connection between the slave and the master is broken (i.e., master restarted), the slave never reconnects. According to the fine slapd.conf manual, reconnection should be made quite fast:

"If the connection is lost, the consumer will attempt to reconnect at an interval time (specified by interval parameter; 60 seconds by default) until the session is re-established."

I waited for about 12 hours, and it didn't happen. Restarting slave helped, and all pending changes were transferred from the master.

Am I missing a setting or something?

The slave is running slapd from OpenLDAP 2.3.37, and here is its slapd.conf part concerning syncrepl:

syncrepl rid=0
          provider=ldap://master:389
#          interval=00:00:00:30 <- also didn't help if it's 30 secs
          type=refreshAndPersist
          searchbase="dc=some,dc=domain"
          bindmethod=simple
          binddn="cn=replicationuser,dc=some,dc=domain"
          credentials=secret
          schemachecking=off

-- Tomasz Chmielewski http://wpkg.org

Tomasz Chmielewski

15 Dec 2006

3:51 a.m.

Tomasz Chmielewski wrote:

...

I waited for about 12 hours, and it didn't happen. Restarting slave helped, and all pending changes were transferred from the master.

Am I missing a setting or something?

The slave is running slapd from OpenLDAP 2.3.37, and here is its slapd.conf part concerning syncrepl:

syncrepl rid=0
          provider=ldap://master:389
          type=refreshAndPersist
          searchbase="dc=some,dc=domain"
          bindmethod=simple
          binddn="cn=replicationuser,dc=some,dc=domain"
          credentials=secret
          schemachecking=off

All right, I read an old manual...

Adding

retry="60 +"

solved the issue...

-- Tomasz Chmielewski http://wpkg.org

Gavin Henry

7:33 a.m.

...

I decided to use syncrepl (refreshAndPersist) instead of slurpd to synchronize.

I made simple tests but something doesn't work right for me.

I started a totally empty OpenLDAP slave - it got replicated from the master. Any change made on the master is also replicated fine.

Unfortunately, when the connection between the slave and the master is broken (i.e., master restarted), the slave never reconnects. According to the fine slapd.conf manual, reconnection should be made quite fast:

"If the connection is lost, the consumer will attempt to reconnect at an interval time (specified by interval parameter; 60 seconds by default) until the session is re-established."

I waited for about 12 hours, and it didn't happen. Restarting slave helped, and all pending changes were transferred from the master.

Am I missing a setting or something?

Reading slapd.conf(5) correctly ;-)

"In the refreshAndPersist operation, a synchronization search remains persistent in the provider slapd.

Further updates to the master replica will generate searchResultEntry to the consumer slapd as the search responses to the persistent synchronization search.

If an error occurs during replication, the consumer will attempt to reconnect according to the retry parameter which is a list of the <retry interval> and <# of retries> pairs. For example, retry="60 10 300 3" lets the consumer retry every 60 seconds for the first 10 times and then retry every 300 seconds for the next 3 times before stop retrying.

The + in <# of retries> means indefinite number of retries until success."

Gavin.

-- Kind Regards, Gavin Henry. Managing Director. T +44 (0) 1224 279484 M +44 (0) 7930 323266 F +44 (0) 1224 824887 E ghenry@suretecsystems.com Open Source. Open Solutions(tm). http://www.suretecsystems.com/
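
As a rough illustration of the retry syntax quoted from slapd.conf(5) above, the following Python sketch expands a retry string into the sequence of waits between reconnect attempts. The function name `retry_delays` is hypothetical, and this is not how slapd implements it; it only mirrors the documented meaning of the <retry interval> <# of retries> pairs, with + standing for an indefinite number of retries.

```python
from itertools import islice
from typing import Iterator


def retry_delays(spec: str) -> Iterator[int]:
    """Yield the wait in seconds before each reconnect attempt.

    `spec` is the value of the retry parameter, e.g. "60 10 300 3".
    Hypothetical helper for illustration only.
    """
    fields = spec.split()
    if not fields or len(fields) % 2 != 0:
        raise ValueError(f"retry needs <interval> <count> pairs, got {spec!r}")
    for interval, count in zip(fields[0::2], fields[1::2]):
        seconds = int(interval)
        if count == "+":
            # "+" means keep retrying at this interval until success.
            while True:
                yield seconds
        for _ in range(int(count)):
            yield seconds


# Ten waits of 60 seconds, then three waits of 300 seconds, then give up.
print(list(retry_delays("60 10 300 3")))

# retry="60 +" never gives up, so only look at the first five waits.
print(list(islice(retry_delays("60 +"), 5)))  # [60, 60, 60, 60, 60]
```

With retry="60 10 300 3" the consumer stops trying after 13 attempts (25 minutes of waiting in total), which is why the "+" form is the one suggested in this thread for a slave that should always come back.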

Dmitriy Kirhlarov

7:37 a.m.

On Fri, Dec 15, 2006 at 12:15:19PM +0100, Tomasz Chmielewski wrote:

...

"If the connection is lost, the consumer will attempt to reconnect at an interval time (specified by interval parameter; 60 seconds by default) until the session is re-established."

...

syncrepl rid=0
          provider=ldap://master:389
#          interval=00:00:00:30 <- also didn't help if it's 30 secs

retry="500 +"

...

          type=refreshAndPersist
          searchbase="dc=some,dc=domain"
          bindmethod=simple
          binddn="cn=replicationuser,dc=some,dc=domain"
          credentials=secret
          schemachecking=off

WBR

-- Dmitriy Kirhlarov OILspace, 26 Leninskaya sloboda, bld. 2, 2nd floor, 115280 Moscow, Russia P:+7 495 105 7247 ext.208 F:+7 495 105 7246 E:DmitriyKirhlarov@oilspace.com Building Successful Supply Chains - One Solution At A Time. www.oilspace.com

Howard Chu

7:45 a.m.

This is ITS#4708, fixed in 2.3.28.

Tomasz Chmielewski wrote:

...

I decided to use syncrepl (refreshAndPersist) instead of slurpd to synchronize.

I made simple tests but something doesn't work right for me.

I started a totally empty OpenLDAP slave - it got replicated from the master. Any change made on the master is also replicated fine.

Unfortunately, when the connection between the slave and the master is broken (i.e., master restarted), the slave never reconnects. According to the fine slapd.conf manual, reconnection should be made quite fast:

"If the connection is lost, the consumer will attempt to reconnect at an interval time (specified by interval parameter; 60 seconds by default) until the session is re-established."

I waited for about 12 hours, and it didn't happen. Restarting slave helped, and all pending changes were transferred from the master.

Am I missing a setting or something?

The slave is running slapd from OpenLDAP 2.3.37, and here is its slapd.conf part concerning syncrepl:

-- Howard Chu Chief Architect, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc OpenLDAP Core Team http://www.openldap.org/project/

Tomasz Chmielewski

8:04 a.m.

Howard Chu wrote:

...

This is ITS#4708, fixed in 2.3.28.

All right. I upgraded to 2.3.30 in the meantime, and added retry="60 +" to slapd.conf.

-- Tomasz Chmielewski http://wpkg.org

Aaron Richton

7:50 a.m.

...

[syncrepl slave isn't reconnecting] Am I missing a setting or something?

First, I'm concerned that you might be confusing "interval" (used with refreshOnly) with "retry" (used with refreshAndPersist). This has been discussed previously on list, and is described in slapd.conf(5) man page.

With that said, your symptoms sound like ITS #4708, resolved in OpenLDAP 2.3.28. In short, set "retry", upgrade to 2.3.30, and try again.
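
The interval/retry mix-up described above is easy to check for mechanically. The Python sketch below is only an illustration: `parse_syncrepl` and `check_reconnect_settings` are hypothetical helpers, the parser assumes the whole directive is on one line with key=value parameters and no comments, and the two warnings simply restate the advice given in this thread rather than anything slapd itself reports.

```python
import shlex


def parse_syncrepl(directive: str) -> dict:
    """Split a one-line syncrepl directive into a {key: value} dict.

    Simplified on purpose: no continuation lines, no comments.
    """
    tokens = shlex.split(directive)
    if not tokens or tokens[0] != "syncrepl":
        raise ValueError("not a syncrepl directive")
    return dict(token.split("=", 1) for token in tokens[1:])


def check_reconnect_settings(params: dict) -> list:
    """Return warnings about reconnect settings, following the thread's advice."""
    warnings = []
    if params.get("type", "").lower() == "refreshandpersist":
        if "retry" not in params:
            warnings.append(
                "type=refreshAndPersist without retry: the consumer may "
                "not reconnect after the provider restarts"
            )
        if "interval" in params:
            warnings.append(
                "interval is described as a refreshOnly setting; "
                "use retry to control reconnects"
            )
    return warnings


# The stanza from the original post, with the interval line uncommented.
directive = (
    "syncrepl rid=0 provider=ldap://master:389 interval=00:00:00:30 "
    'type=refreshAndPersist searchbase="dc=some,dc=domain" '
    'bindmethod=simple binddn="cn=replicationuser,dc=some,dc=domain" '
    "credentials=secret schemachecking=off"
)

for warning in check_reconnect_settings(parse_syncrepl(directive)):
    print("warning:", warning)
```

Adding retry="60 +" and dropping interval makes both warnings go away in this sketch. It says nothing about ITS #4708, which still needs the upgrade.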

Tomasz Chmielewski

8:32 a.m.

Aaron Richton wrote:

...

[syncrepl slave isn't reconnecting] Am I missing a setting or something?

First, I'm concerned that you might be confusing "interval" (used with refreshOnly) with "retry" (used with refreshAndPersist). This has been discussed previously on list, and is described in slapd.conf(5) man page.

I looked at the slapd.conf man page, but on a machine running OpenLDAP 2.2.x. That's where the confusion came from.

...

With that said, your symptoms sound like ITS #4708, resolved in OpenLDAP 2.3.28. In short, set "retry", upgrade to 2.3.30, and try again.

That's what I did, and everything works properly now.

PS. does this list lag only for me (a couple of hours), or is it more fundamental? I noticed such lagging a couple of months ago as well.

-- Tomasz Chmielewski http://wpkg.org
