syncrepl: consumer state is newer than provider


Mahadevan, Venkatasubramanian

29 Jul 2011

2:03 p.m.

Hello,

I have 2 OpenLDAP servers with the following configuration:

-- OpenLDAP 2.4.26-Release running on Red Hat Enterprise 5.5
-- The two servers are set up in a mirrored multi-master configuration.

Below is the relevant portion of the slapd.conf:

server1
----------
syncrepl rid=002
    provider=ldaps://server2
    type=refreshAndPersist
    retry="5 5 300 +"
    searchbase="o=ourdomain.ca"
    attrs="*,+"
    bindmethod=simple
    binddn="cn=Replication Manager,o=ubc.ca"
    credentials=something

mirrormode TRUE
overlay syncprov
syncprov-checkpoint 100 10

server2
----------
syncrepl rid=001
    provider=ldaps://server1
    type=refreshAndPersist
    retry="5 5 300 +"
    searchbase="o=ourdomain.ca"
    attrs="*,+"
    bindmethod=simple
    binddn="cn=Replication Manager,o=ubc.ca"
    credentials=something

mirrormode TRUE
overlay syncprov
syncprov-checkpoint 100 10

The servers have their clocks synchronized using ntp. Below is the output of ntpq:

server1
----------
ntpq> peer
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+hub.ubc.ca      93.113.2.250     3 u  594 1024  377    1.252    1.110   1.520
*dns3.ubc.ca     192.53.103.108   2 u   92 1024  377    1.648    2.670   0.157

server2
----------
ntpq> peer
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+hub.ubc.ca      93.113.2.250     3 u  332 1024  377    0.706    3.487   0.900
*dns3.ubc.ca     192.53.103.108   2 u  325 1024  377    1.631    3.668   0.022

As far as I can tell the clocks appear to be in sync with each other, so hopefully this is not a cause of the replication issues I am having.

The problem is that the servers are now refusing to synchronize with each other. Replication was working before, but now it is not. The log files on the servers are filled with entries like:

server1
----------
Jul 29 13:48:54 ldapdev1 slapd[11989]: do_syncrep2: rid=002 LDAP_RES_SEARCH_RESULT
Jul 29 13:48:54 ldapdev1 slapd[11989]: do_syncrep2: rid=002 LDAP_RES_SEARCH_RESULT (53) Server is unwilling to perform
Jul 29 13:48:54 ldapdev1 slapd[11989]: do_syncrep2: rid=002 (53) Server is unwilling to perform
Jul 29 13:48:57 ldapdev1 slapd[11989]: conn=1081 op=1 SRCH base="o=ubc.ca" scope=2 deref=0 filter="(objectClass=*)"
Jul 29 13:48:57 ldapdev1 slapd[11989]: conn=1081 op=1 SRCH attr=* +
Jul 29 13:48:57 ldapdev1 slapd[11989]: conn=1081 op=1 SEARCH RESULT tag=101 err=53 nentries=0 text=consumer state is newer than provider!

server2
----------
Jul 29 13:50:52 ldapdev2 slapd[7996]: do_syncrep2: rid=001 LDAP_RES_SEARCH_RESULT
Jul 29 13:50:52 ldapdev2 slapd[7996]: do_syncrep2: rid=001 LDAP_RES_SEARCH_RESULT (53) Server is unwilling to perform
Jul 29 13:50:52 ldapdev2 slapd[7996]: do_syncrep2: rid=001 (53) Server is unwilling to perform
Jul 29 13:50:55 ldapdev2 slapd[7996]: conn=1102 op=1 SRCH base="o=ubc.ca" scope=2 deref=0 filter="(objectClass=*)"
Jul 29 13:50:55 ldapdev2 slapd[7996]: conn=1102 op=1 SRCH attr=* +
Jul 29 13:50:55 ldapdev2 slapd[7996]: conn=1102 op=1 SEARCH RESULT tag=101 err=53 nentries=0 text=consumer state is newer than provider!

So it is looking like the ContextCSN cookies on both servers are out of sync. Digging further into this, I did a search for the ContextCSN values on both servers and got the following values:

server1
----------
20110729165747.697237Z#000000#001#000000;20110726161604.535176Z#000000#002#000000

server2
----------
20110728220449.050499Z#000000#001#000000;20110728223211.933995Z#000000#002#000000
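[Editor's sketch] The two contextCSN values can be compared per server ID (SID) to see why each side refuses the other. The Python below is only an illustration, not slapd's actual code: it assumes each CSN has the form timestamp#count#sid#mod, and it assumes a provider rejects a consumer whenever the consumer holds a newer CSN than the provider for the same SID. The function names are made up for this example.

```python
from typing import Dict, List


def parse_context_csn(value: str) -> Dict[str, str]:
    """Map each SID to its CSN timestamp from a ';'-joined contextCSN string."""
    by_sid: Dict[str, str] = {}
    for csn in value.split(";"):
        # Assumed layout: timestamp#count#sid#mod
        timestamp, _count, sid, _mod = csn.strip().split("#")
        by_sid[sid] = timestamp
    return by_sid


def newer_on_consumer(consumer: str, provider: str) -> List[str]:
    """Return the SIDs for which the consumer's CSN is newer than the provider's."""
    c = parse_context_csn(consumer)
    p = parse_context_csn(provider)
    # Timestamps are fixed-width (YYYYmmddHHMMSS.ffffffZ), so comparing the
    # strings orders them chronologically.
    return sorted(sid for sid, ts in c.items() if sid in p and ts > p[sid])


# Values copied from the message above.
server1 = ("20110729165747.697237Z#000000#001#000000;"
           "20110726161604.535176Z#000000#002#000000")
server2 = ("20110728220449.050499Z#000000#001#000000;"
           "20110728223211.933995Z#000000#002#000000")

print("server1 as consumer of server2:", newer_on_consumer(server1, server2))
print("server2 as consumer of server1:", newer_on_consumer(server2, server1))
```

Under those assumptions, server1 is ahead of server2 for SID 001 and server2 is ahead of server1 for SID 002, so each server looks "newer than provider" to the other, which would match the err=53 messages in both logs.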

So my question is: how does one get the server synchronization cookies back into sync and ensure that replication restarts successfully? As of now, all I see is the log files filling up with messages as shown above and the sync cookies not being updated. Any help or pointers are appreciated. Thanks!

cheers,

Ven


Chris Jacobs

1 Aug 2011

8:33 a.m.

Apologies for top posting - blackberry.

Short term fix: Pick a server and take it offline (stop slapd). Clear its database, being careful not to delete any db config files. Start it back up.

If this happens again, then you'll want to turn up logging, etc. There's plenty of info on how to troubleshoot OpenLDAP.

Note: I'm a sysadmin, not a systems engineer. It's possible the actual reason this broke is clear in your current logs, but not to me.

- chris


Mahadevan, Venkatasubramanian

2 Aug 2011

2:18 p.m.

Hi David,

Thanks much for your response. That's what I did but when I do that it seems to take forever to recover using syncrepl as it goes through all the entries in the databases comparing CSNs. So what I did was stop slapd and rebuild the database using slapadd with the -w option to preserve syncrepl information. After that, replication started working again, but it's a less than ideal way to recover from a replication failure. Perhaps the inherent nature of 2 master servers being updated leads to replication conflicts whereby the 2 servers get stuck in an infinite loop because their contextCSN values are out of sync?

cheers,

Ven


Howard Chu

2:35 p.m.

Mahadevan, Venkatasubramanian wrote:

...

Hi David,

Thanks much for your response. That's what I did but when I do that it seems to take forever to recover using syncrepl as it goes through all the entries in the databases comparing CSNs. So what I did was stop slapd and rebuild the database using slapadd with the -w option to preserve syncrepl information. After that, replication started working again, but it's a less than ideal way to recover from a replication failure. Perhaps the inherent nature of 2 master servers being updated leads to replication conflicts whereby the 2 servers get stuck in an infinite loop because their contextCSN values are out of sync?

Next time try the slapd -c option.


--
  -- Howard Chu
  CTO, Symas Corp.           http://www.symas.com
  Director, Highland Sun     http://highlandsun.com/hyc/
  Chief Architect, OpenLDAP  http://www.openldap.org/project/

Mahadevan, Venkatasubramanian

3:47 p.m.

Hi Howard,

I have tried the slapd -c option with a rid value, and it also tries to resync the entire directory when doing that while comparing CSNs. There is also a cid value which can be passed to the -c option, but I was unable to find an example of what to pass in there. Is it just a contextCSN value? Thanks.

cheers,

Ven


Quanah Gibson-Mount

4:04 p.m.

--On Tuesday, August 02, 2011 3:47 PM -0700 "Mahadevan, Venkatasubramanian" Venkatasubramanian.Mahadevan@ubc.ca wrote:

...

Hi Howard,

I have tried the slapd -c option with a rid value, and it also tries to resync the entire directory when doing that while comparing CSNs. There is also a cid value which can be passed to the -c option, but I was unable to find an example of what to pass in there. Is it just a contextCSN value? Thanks.

Please don't top post. The values you can pass "-c" are clearly documented in the slapd(8C) man page. If you are using MMR and failed to provide a sid=X value as documented in the man page, then you clearly did things wrong.
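[Editor's sketch] For readers looking for the example Ven asked about: the slapd(8) man page describes the -c argument as a comma-separated list of name=value pairs with the fields rid, sid, and csn, where multiple csn values are joined with semicolons in a mirror-mode setup. The Python helper below only assembles such a string. The helper name and the sid value are hypothetical (the thread never shows a serverID), and the thread does not confirm whether passing a csn avoids the full resync, so check the man page for your release before relying on it.

```python
import shlex
from typing import List, Optional, Sequence


def build_sync_cookie(rid: str,
                      sid: Optional[str] = None,
                      csns: Sequence[str] = ()) -> str:
    """Assemble a cookie string of the form rid=...[,sid=...][,csn=...;...]."""
    parts: List[str] = [f"rid={rid}"]  # rid is required for the other fields to be used
    if sid is not None:
        parts.append(f"sid={sid}")
    if csns:
        # Multiple CSNs (one per server ID) are separated by semicolons.
        parts.append("csn=" + ";".join(csns))
    return ",".join(parts)


# rid only: this is what Ven tried, and it resynced the whole directory.
print(shlex.join(["slapd", "-c", build_sync_cookie("002")]))

# rid + sid + csn: the sid "001" is a placeholder, not taken from the thread;
# the CSNs are server2's contextCSN values from the first message.
cookie = build_sync_cookie(
    "002",
    sid="001",
    csns=[
        "20110728220449.050499Z#000000#001#000000",
        "20110728223211.933995Z#000000#002#000000",
    ],
)
print(shlex.join(["slapd", "-c", cookie]))
```

The other slapd options normally used to start the server (config file, listener URLs and so on) would still have to be added to the command line.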

--Quanah

Quanah Gibson-Mount
Sr. Member of Technical Staff
Zimbra, Inc
A Division of VMware, Inc.
--------------------
Zimbra :: the leader in open source messaging and collaboration


openldap-technical@openldap.org
