Hello,

 

I have 2 OpenLDAP servers with the following configuration:

 

-- OpenLDAP 2.4.26-Release running on Red Hat Enterprise 5.5

-- The two servers are set up in a mirrored multi-master configuration. Below is the relevant portion of the slapd.conf on each server:

 

 

server1
----------
syncrepl rid=002
  provider=ldaps://server2
  type=refreshAndPersist
  retry="5 5 300 +"
  searchbase="o=ourdomain.ca"
  attrs="*,+"
  bindmethod=simple
  binddn="cn=Replication Manager,o=ubc.ca"
  credentials=something

mirrormode TRUE
overlay syncprov
syncprov-checkpoint 100 10

 

server2
----------
syncrepl rid=001
  provider=ldaps://server1
  type=refreshAndPersist
  retry="5 5 300 +"
  searchbase="o=ourdomain.ca"
  attrs="*,+"
  bindmethod=simple
  binddn="cn=Replication Manager,o=ubc.ca"
  credentials=something

mirrormode TRUE
overlay syncprov
syncprov-checkpoint 100 10

 

The servers have their clocks synchronized using ntp. Below is the output of ntpq:

 

server1
----------
ntpq> peer
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+hub.ubc.ca      93.113.2.250     3 u  594 1024  377    1.252    1.110   1.520
*dns3.ubc.ca     192.53.103.108   2 u   92 1024  377    1.648    2.670   0.157

 

server2
----------
ntpq> peer
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+hub.ubc.ca      93.113.2.250     3 u  332 1024  377    0.706    3.487   0.900
*dns3.ubc.ca     192.53.103.108   2 u  325 1024  377    1.631    3.668   0.022

 

 

As far as I can tell, the clocks are in sync with each other (both servers report offsets of only a few milliseconds against the same peers), so hopefully this is not the cause of the replication issues I am having.

 

The problem is that the servers are now refusing to synchronize with each other. Replication was working before, but now it does not. The log files on both servers are filled with entries like:

 

server1
----------
Jul 29 13:48:54 ldapdev1 slapd[11989]: do_syncrep2: rid=002 LDAP_RES_SEARCH_RESULT
Jul 29 13:48:54 ldapdev1 slapd[11989]: do_syncrep2: rid=002 LDAP_RES_SEARCH_RESULT (53) Server is unwilling to perform
Jul 29 13:48:54 ldapdev1 slapd[11989]: do_syncrep2: rid=002 (53) Server is unwilling to perform
Jul 29 13:48:57 ldapdev1 slapd[11989]: conn=1081 op=1 SRCH base="o=ubc.ca" scope=2 deref=0 filter="(objectClass=*)"
Jul 29 13:48:57 ldapdev1 slapd[11989]: conn=1081 op=1 SRCH attr=* +
Jul 29 13:48:57 ldapdev1 slapd[11989]: conn=1081 op=1 SEARCH RESULT tag=101 err=53 nentries=0 text=consumer state is newer than provider!

 

server2
----------
Jul 29 13:50:52 ldapdev2 slapd[7996]: do_syncrep2: rid=001 LDAP_RES_SEARCH_RESULT
Jul 29 13:50:52 ldapdev2 slapd[7996]: do_syncrep2: rid=001 LDAP_RES_SEARCH_RESULT (53) Server is unwilling to perform
Jul 29 13:50:52 ldapdev2 slapd[7996]: do_syncrep2: rid=001 (53) Server is unwilling to perform
Jul 29 13:50:55 ldapdev2 slapd[7996]: conn=1102 op=1 SRCH base="o=ubc.ca" scope=2 deref=0 filter="(objectClass=*)"
Jul 29 13:50:55 ldapdev2 slapd[7996]: conn=1102 op=1 SRCH attr=* +
Jul 29 13:50:55 ldapdev2 slapd[7996]: conn=1102 op=1 SEARCH RESULT tag=101 err=53 nentries=0 text=consumer state is newer than provider!

 

 

So it looks like the contextCSN cookies on the two servers are out of sync. Digging further into this, I searched for the contextCSN values on both servers and got the following:

 

server1

----------

20110729165747.697237Z#000000#001#000000;20110726161604.535176Z#000000#002#000000

 

server2

----------

20110728220449.050499Z#000000#001#000000;20110728223211.933995Z#000000#002#000000
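
To make the two values easier to compare, here is a rough Python sketch that splits each one into per-server-ID CSNs and reports which side holds the newer one. It assumes the usual timestamp#count#sid#mod layout of a CSN with hexadecimal trailing fields, and the helper names (parse_csn, csn_by_sid, newer_side) are my own, not part of any OpenLDAP tooling.

```python
from datetime import datetime, timezone


def parse_csn(csn):
    """Split one CSN into (timestamp, change count, server ID, mod number)."""
    stamp, count, sid, mod = csn.strip().split("#")
    when = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ").replace(tzinfo=timezone.utc)
    # Assumption: the three trailing fields are hexadecimal.
    return when, int(count, 16), int(sid, 16), int(mod, 16)


def csn_by_sid(value):
    """Map server ID -> (timestamp, count, mod) for a ';'-separated CSN list."""
    result = {}
    for csn in value.split(";"):
        when, count, sid, mod = parse_csn(csn)
        result[sid] = (when, count, mod)
    return result


def newer_side(value1, value2):
    """For each server ID, name the side holding the newer CSN."""
    a, b = csn_by_sid(value1), csn_by_sid(value2)
    verdict = {}
    for sid in sorted(set(a) | set(b)):
        left, right = a.get(sid), b.get(sid)
        if left == right:
            verdict[sid] = "equal"
        elif right is None or (left is not None and left > right):
            verdict[sid] = "server1"
        else:
            verdict[sid] = "server2"
    return verdict


# The contextCSN values exactly as returned by the two servers.
server1 = "20110729165747.697237Z#000000#001#000000;20110726161604.535176Z#000000#002#000000"
server2 = "20110728220449.050499Z#000000#001#000000;20110728223211.933995Z#000000#002#000000"

for sid, side in newer_side(server1, server2).items():
    print(f"SID {sid:03x}: newer on {side}")
```

Run on the values above, this reports that server1 holds the newer CSN for SID 001 while server2 holds the newer CSN for SID 002, so each server is ahead of the other for one of the two server IDs.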

 

 

So my question is: how does one get the synchronization cookies on the two servers back into sync and ensure that replication restarts successfully?

As of now, all I see is the log files filling up with the messages shown above, and the sync cookies are not being updated. Any help or pointers are appreciated. Thanks!

 

cheers,

 

Ven