Hi Howard,
I tried the slapd -c option with a rid value, but it also tries to
resync the entire directory, comparing CSNs as it goes. There is
also a cid value which can be passed to the -c option, but I was
unable to find an example of what to pass in there. Is it just a
contextCSN value?
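
For reference, the slapd(8) man page describes the -c argument as a comma-separated list of name=value pairs with the fields rid, sid and csn, where csn may hold several semicolon-separated values in a mirror-mode or multi-master setup. The sketch below only assembles such a string; the helper name build_sync_cookie is hypothetical, and the thread does not confirm which CSN values are the right ones to pass.

```python
def build_sync_cookie(rid, sid=None, csns=()):
    """Assemble a cookie string of the form rid=...,sid=...,csn=a;b.

    rid is required; sid and csns are optional. Values are passed through
    unchanged, so the caller is responsible for using the same formatting
    that slapd itself uses (this helper does not validate them).
    """
    if not rid:
        raise ValueError("rid is required for the other fields to be used")
    parts = ["rid=" + str(rid)]
    if sid is not None:
        parts.append("sid=" + str(sid))
    if csns:
        # Multiple CSNs (one per server ID) are joined with semicolons.
        parts.append("csn=" + ";".join(csns))
    return ",".join(parts)


cookie = build_sync_cookie(
    "002",
    csns=[
        "20110728220449.050499Z#000000#001#000000",
        "20110728223211.933995Z#000000#002#000000",
    ],
)
print(cookie)
```

The resulting string would then be handed to slapd as the value of -c, quoted so that the shell does not interpret the semicolon.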
Thanks.
cheers,
Ven
-----Original Message-----
From: Howard Chu [mailto:hyc@symas.com]
Sent: August-02-11 2:35 PM
To: Mahadevan, Venkatasubramanian
Cc: Chris Jacobs; 'openldap-technical(a)openldap.org'
Subject: Re: syncrepl: consumer state is newer than provider
Mahadevan, Venkatasubramanian wrote:
Hi David,
Thanks much for your response.
That's what I did, but it seems to take forever to recover using
syncrepl, as it goes through all the entries in the databases
comparing CSNs. So I stopped slapd and rebuilt the database using
slapadd with the -w option to preserve the syncrepl information.
After that, replication started working again, but it's a less than
ideal way to recover from a replication failure. Perhaps having two
master servers that both accept updates inherently leads to
replication conflicts, where the two servers get stuck in an infinite
loop because their contextCSN values are out of sync?
Next time try the slapd -c option.
cheers,
Ven
________________________________________
From: Chris Jacobs [Chris.Jacobs(a)apollogrp.edu]
Sent: Monday, August 01, 2011 8:33 AM
To: Mahadevan, Venkatasubramanian; 'openldap-technical(a)openldap.org'
Subject: Re: syncrepl: consumer state is newer than provider
Apologies for top posting - blackberry.
Short term fix:
Pick a server, take it offline (stop slapd).
Clear its database - be careful not to delete any db config files.
Start it back up.
If this happens again, then you'll want to up logging, etc. There's plenty of
info on how to troubleshoot OpenLDAP.
Note: I'm a sysadmin, not a systems engineer. It's possible the actual reason
this broke is clear in your current logs, but not to me.
- chris
Chris Jacobs, Systems Administrator, Technology Services Group
Apollo Group | Apollo Marketing and Product Development | Aptimus, Inc.
2001 6th Ave | Suite 3200 | Seattle, WA 98121
direct 206.839.8245 | cell 206.601.3256 | fax 206.839.8106
email chris.jacobs(a)apollogrp.edu
________________________________
From: openldap-technical-bounces@OpenLDAP.org <openldap-technical-bounces@OpenLDAP.org>
To: openldap-technical@openldap.org<openldap-technical@openldap.org>
Sent: Fri Jul 29 14:03:06 2011
Subject: syncrepl: consumer state is newer than provider
Hello,
I have 2 OpenLDAP servers with the following configuration:
-- OpenLDAP 2.4.26-Release running on Red Hat Enterprise 5.5
-- The two servers are set up in a mirrored multi-master configuration.
Below is the relevant portion of the slapd.conf:
server1
----------
syncrepl rid=002
  provider=ldaps://server2
  type=refreshAndPersist
  retry="5 5 300 +"
  searchbase="o=ourdomain.ca"
  attrs="*,+"
  bindmethod=simple
  binddn="cn=Replication Manager,o=ubc.ca"
  credentials=something
mirrormode TRUE
overlay syncprov
syncprov-checkpoint 100 10
server2
----------
syncrepl rid=001
  provider=ldaps://server1
  type=refreshAndPersist
  retry="5 5 300 +"
  searchbase="o=ourdomain.ca"
  attrs="*,+"
  bindmethod=simple
  binddn="cn=Replication Manager,o=ubc.ca"
  credentials=something
mirrormode TRUE
overlay syncprov
syncprov-checkpoint 100 10
The servers have their clocks synchronized using ntp. Below is the output of ntpq:
server1
----------
ntpq> peer
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+hub.ubc.ca      93.113.2.250     3 u  594 1024  377    1.252    1.110   1.520
*dns3.ubc.ca     192.53.103.108   2 u   92 1024  377    1.648    2.670   0.157
server2
----------
ntpq> peer
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+hub.ubc.ca      93.113.2.250     3 u  332 1024  377    0.706    3.487   0.900
*dns3.ubc.ca     192.53.103.108   2 u  325 1024  377    1.631    3.668   0.022
As far as I can tell the clocks appear to be in sync with each other,
so hopefully this is not a cause of the replication issues I am having.
The problem is that the servers are now refusing to synchronize with
each other (replication was working before, but now it does not). The
log files on the servers are filled with entries like:
server1
----------
Jul 29 13:48:54 ldapdev1 slapd[11989]: do_syncrep2: rid=002 LDAP_RES_SEARCH_RESULT
Jul 29 13:48:54 ldapdev1 slapd[11989]: do_syncrep2: rid=002 LDAP_RES_SEARCH_RESULT (53) Server is unwilling to perform
Jul 29 13:48:54 ldapdev1 slapd[11989]: do_syncrep2: rid=002 (53) Server is unwilling to perform
Jul 29 13:48:57 ldapdev1 slapd[11989]: conn=1081 op=1 SRCH base="o=ubc.ca" scope=2 deref=0 filter="(objectClass=*)"
Jul 29 13:48:57 ldapdev1 slapd[11989]: conn=1081 op=1 SRCH attr=* +
Jul 29 13:48:57 ldapdev1 slapd[11989]: conn=1081 op=1 SEARCH RESULT tag=101 err=53 nentries=0 text=consumer state is newer than provider!
server2
----------
Jul 29 13:50:52 ldapdev2 slapd[7996]: do_syncrep2: rid=001 LDAP_RES_SEARCH_RESULT
Jul 29 13:50:52 ldapdev2 slapd[7996]: do_syncrep2: rid=001 LDAP_RES_SEARCH_RESULT (53) Server is unwilling to perform
Jul 29 13:50:52 ldapdev2 slapd[7996]: do_syncrep2: rid=001 (53) Server is unwilling to perform
Jul 29 13:50:55 ldapdev2 slapd[7996]: conn=1102 op=1 SRCH base="o=ubc.ca" scope=2 deref=0 filter="(objectClass=*)"
Jul 29 13:50:55 ldapdev2 slapd[7996]: conn=1102 op=1 SRCH attr=* +
Jul 29 13:50:55 ldapdev2 slapd[7996]: conn=1102 op=1 SEARCH RESULT tag=101 err=53 nentries=0 text=consumer state is newer than provider!
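
To see how often each server is refusing a sync search, the SEARCH RESULT lines can be picked out of the syslog output with a small script. This is only a sketch that assumes each log entry sits on a single line in the layout shown above; the function name refused_syncs and the sample list are made up for illustration.

```python
import re

# Matches slapd "SEARCH RESULT" lines as they appear in the log excerpts above.
RESULT_RE = re.compile(
    r"slapd\[\d+\]: conn=(?P<conn>\d+) op=(?P<op>\d+) SEARCH RESULT "
    r"tag=(?P<tag>\d+) err=(?P<err>\d+) nentries=(?P<nentries>\d+)"
    r"(?: text=(?P<text>.*))?$"
)


def refused_syncs(lines):
    """Return (conn, text) for every search that ended with err=53."""
    hits = []
    for line in lines:
        match = RESULT_RE.search(line.rstrip("\n"))
        if match and match.group("err") == "53":
            hits.append((int(match.group("conn")), match.group("text") or ""))
    return hits


sample = [
    "Jul 29 13:48:57 ldapdev1 slapd[11989]: conn=1081 op=1 SRCH attr=* +",
    "Jul 29 13:48:57 ldapdev1 slapd[11989]: conn=1081 op=1 SEARCH RESULT "
    "tag=101 err=53 nentries=0 text=consumer state is newer than provider!",
]
hits = refused_syncs(sample)
for conn, text in hits:
    print(conn, text)
```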
So it is looking like the ContextCSN cookies on both servers are out of sync. Digging
further into this, I did a search for the ContextCSN values on both servers and got the
following values:
server1
----------
20110729165747.697237Z#000000#001#000000;20110726161604.535176Z#000000#002#000000
server2
----------
20110728220449.050499Z#000000#001#000000;20110728223211.933995Z#000000#002#000000
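
Each contextCSN value has the form timestamp#count#sid#mod, so the two sets can be lined up by server ID to see which side holds the newer stamp. The sketch below does that for the values listed above; the function names are illustrative, and the comparison relies on the fields being fixed-width so that plain string ordering matches time ordering.

```python
def parse_context_csn(value):
    """Map each server ID (SID) to its CSN from a ';'-joined contextCSN value."""
    by_sid = {}
    for csn in value.split(";"):
        csn = csn.strip()
        timestamp, count, sid, mod = csn.split("#")
        by_sid[sid] = csn
    return by_sid


def newer_side(server1, server2):
    """For every SID, report which server holds the newer CSN."""
    result = {}
    for sid in sorted(set(server1) | set(server2)):
        a, b = server1.get(sid), server2.get(sid)
        if a == b:
            result[sid] = "equal"
        elif b is None or (a is not None and a > b):
            result[sid] = "server1"
        else:
            result[sid] = "server2"
    return result


server1 = parse_context_csn(
    "20110729165747.697237Z#000000#001#000000;"
    "20110726161604.535176Z#000000#002#000000"
)
server2 = parse_context_csn(
    "20110728220449.050499Z#000000#001#000000;"
    "20110728223211.933995Z#000000#002#000000"
)
comparison = newer_side(server1, server2)
print(comparison)
```

With these values, server1 is ahead for SID 001 and server2 is ahead for SID 002, so each server holds a CSN that is newer than the other's for one SID. That would be consistent with both servers logging "consumer state is newer than provider", although this is a reading of the values rather than a diagnosis confirmed in the thread.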
So my question is: how does one get the server synchronization cookies back into sync and
ensure that replication restarts successfully?
As of now, all I see is the log files filling up with messages as shown above and the
sync cookies not being updated. Any help or pointers are appreciated. Thanks!
cheers,
Ven
--
-- Howard Chu
CTO, Symas Corp.
http://www.symas.com
Director, Highland Sun
http://highlandsun.com/hyc/
Chief Architect, OpenLDAP
http://www.openldap.org/project/