On 18/08/2010 16:28, Elmar Marschke wrote:
On 18.08.2010 15:10, Oliver Liebel wrote:
the mmr config was tested many times and should work as expected, but your clockskew below may be to great for mmr to work as intended
are your servers are physical/paravirt vms or full emulated vms? if they are full emulated (eg vmware server), you will always run into major clock skews.
check out / set some of your ntp settings, e.g.
- tinker panic 0
- server <....> minpoll <value> maxpoll <value>
yes, the time thing looks suspect to me also. I will try to get that better working; thanks for your hints.
But now, here's some log stuff; as announced ;) :
logfile excerpts from MASTER (hostname: ldapmaster) and STANDBY (hostname: ldapslave). I tried to make it as clearly arranged as possible by inserting self-explaining (hopefully ;)) tags into the logfiles. Here's what i did; and where:
- started both machines in a well working state (replication fine in
every detail; freshly converted slapd.conf, freshly replicated). Both machines up.
- insert tag "===_BEGIN_CHANGES_WHILE_BOTH_UP_===" into logfiles on both
machines (script based)
- MASTER: deleted user hcallahan (using jxplorer)
- MASTER: added user mmouse (using ldapadd)
- MASTER: changed attribute telephone number user vcorleone to
"0800-mafia" (using jxplorer)
MASTER: insert tag "===_END_CHANGES_WHILE_BOTH_UP_==="
insert tag "===_END_CHANGES_WHILE_BOTH_UP_===" into logfiles on both
machines
STANDBY: rcldap stop
insert tag "===_BEGIN_CHANGES_WHILE_STANDBY_DOWN_==="
MASTER: delete user jmcclane
MASTER: add user ggoofy
MASTER: change telephone number user ckent to "0800-superman"
insert tag "===_END_CHANGES_WHILE_STANDBY_DOWN_==="
insert tag "===_BEGIN_STANDBY_COMING_BACK_==="
STANDBY: rcldap start
insert tag "===_END_STANDBY_COMING_BACK_==="
The tags are in MASTER's and STANDBY's logfile, so one can compare what's going on at the same point in time.
The result on STANDBY at the end again is as i wrote before: STANDBY:
- every operation while it was up was successfully replicated
- the operations while it was down are partially replicated after coming
up again: -- deletion user jmcclane: OK -- adding user ggoofy: obviously did not happen; not available on STANDBY -- changing phone number ckent: did not happen; on STANDBY it's the same as before (means it differs now from the value it has on MASTER).
Here's the logfile of MASTER: ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ ===_BEGIN_CHANGES_WHILE_BOTH_UP_=== Aug 18 15:30:04 ldapmaster slapd[8017]: slap_queue_csn: queing 0x7f00f317b580 20100818133004.663851Z#000000#000#000000 Aug 18 15:30:04 ldapmaster slapd[8017]: slap_graduate_commit_csn: removing 0x7f00fb4ee230 20100818133004.663851Z#000000#000#000000 Aug 18 15:30:04 ldapmaster slapd[8017]: syncprov_sendresp: cookie=rid=001,csn=20100818133004.663851Z#000000#000#000000 Aug 18 15:30:04 ldapmaster slapd[8017]: syncprov_sendresp: cookie=rid=001,csn=20100818133004.663851Z#000000#000#000000 Aug 18 15:30:04 ldapmaster slapd[8017]: do_syncrep2: rid=001 cookie=rid=001,csn=20100818133004.663851Z#000000#000#000000 Aug 18 15:30:04 ldapmaster slapd[8017]: do_syncrep2: rid=001 CSN too old, ignoring 20100818133004.663851Z#000000#000#000000 Aug 18 15:30:04 ldapmaster slapd[8017]: do_syncrep2: rid=002 cookie=rid=002,csn=20100818133004.663851Z#000000#000#000000 Aug 18 15:30:04 ldapmaster slapd[8017]: do_syncrep2: rid=002 CSN too old, ignoring 20100818133004.663851Z#000000#000#000000 Aug 18 15:30:25 ldapmaster slapd[8017]: slap_queue_csn: queing 0x7f00f397d060 20100818133025.570081Z#000000#000#000000 Aug 18 15:30:25 ldapmaster slapd[8017]: syncprov_sendresp: cookie=rid=001,csn=20100818133025.570081Z#000000#000#000000 Aug 18 15:30:25 ldapmaster slapd[8017]: slap_graduate_commit_csn: removing 0x7f00fb4ed9d0 20100818133025.570081Z#000000#000#000000 Aug 18 15:30:25 ldapmaster slapd[8017]: syncprov_sendresp: cookie=rid=001,csn=20100818133025.570081Z#000000#000#000000 Aug 18 15:30:25 ldapmaster slapd[8017]: do_syncrep2: rid=001 cookie=rid=001,csn=20100818133025.570081Z#000000#000#000000 Aug 18 15:30:25 ldapmaster slapd[8017]: do_syncrep2: rid=001 CSN too old, ignoring 20100818133025.570081Z#000000#000#000000 Aug 18 15:30:25 ldapmaster slapd[8017]: do_syncrep2: rid=002 cookie=rid=002,csn=20100818133025.570081Z#000000#000#000000 Aug 18 15:30:25 ldapmaster slapd[8017]: do_syncrep2: rid=002 CSN too old, ignoring 20100818133025.570081Z#000000#000#000000 Aug 18 15:30:45 ldapmaster slapd[8017]: slap_queue_csn: queing 0x7f00f497e0f0 20100818133045.686887Z#000000#000#000000 Aug 18 15:30:45 ldapmaster slapd[8017]: syncprov_sendresp: cookie=rid=001,csn=20100818133045.686887Z#000000#000#000000 Aug 18 15:30:45 ldapmaster slapd[8017]: slap_graduate_commit_csn: removing 0x7f00ec114bd0 20100818133045.686887Z#000000#000#000000 Aug 18 15:30:45 ldapmaster slapd[8017]: syncprov_sendresp: cookie=rid=001,csn=20100818133045.686887Z#000000#000#000000 Aug 18 15:30:45 ldapmaster slapd[8017]: do_syncrep2: rid=001 cookie=rid=001,csn=20100818133045.686887Z#000000#000#000000 Aug 18 15:30:45 ldapmaster slapd[8017]: do_syncrep2: rid=001 CSN too old, ignoring 20100818133045.686887Z#000000#000#000000 Aug 18 15:30:45 ldapmaster slapd[8017]: do_syncrep2: rid=002 cookie=rid=002,csn=20100818133045.686887Z#000000#000#000000 Aug 18 15:30:45 ldapmaster slapd[8017]: do_syncrep2: rid=002 CSN too old, ignoring 20100818133045.686887Z#000000#000#000000 ===_END_CHANGES_WHILE_BOTH_UP_===
Clearly, slapd has not taken into account the SID parameters. You can see this in these CSN, the 3rd fields (the 3 digits before the last #) is 000, which is the default SID.
The reason for this is most likely that slapd cannot recognize that "ldapmaster.local.site" refers to the machine it's running on. I can see a few options to solve this: - configure /etc/hosts so that host --fqdn returns this name - run slapd with -h ldap://ldapmaster.local.site
After changing it, I recommend a full re-sync of your backup instance.
This should help!
Jonathan