So it seems easy to do this monitoring via some external agent/program. Can I do something (short of writing an overlay) to get this information with an LDAP query? I.e., some query that would give me the difference between the current contextCSN of the machine I'm talking to and that of the master server. AFAICT, the existing overlays won't let me create this kind of synthesized value.
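For what it's worth, the external-agent version of that difference can be sketched as below. This is only a rough sketch under stated assumptions: the function names are made up for illustration, the values are read with the stock ldapsearch client using an anonymous simple bind, contextCSN is assumed to be readable on the suffix entry of both servers, and the fractional-seconds handling assumes microseconds. It compares only the timestamp part of the CSN, so a lag of zero means the newest timestamps match, not that every entry is identical.

```python
import subprocess
from datetime import datetime, timezone
from typing import List


def csn_time(csn: str) -> datetime:
    """Return the UTC timestamp encoded at the front of a CSN string."""
    stamp = csn.split("#", 1)[0].rstrip("Zz")
    whole, _, frac = stamp.partition(".")
    parsed = datetime.strptime(whole, "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)
    # Assumption: any fractional part is a fraction of a second (microseconds).
    if frac:
        parsed = parsed.replace(microsecond=int(frac.ljust(6, "0")[:6]))
    return parsed


def fetch_context_csns(uri: str, suffix: str) -> List[str]:
    """Read contextCSN from the suffix entry with the ldapsearch CLI.

    contextCSN is operational, so it has to be requested by name.
    """
    out = subprocess.run(
        ["ldapsearch", "-x", "-LLL", "-H", uri,
         "-s", "base", "-b", suffix, "contextCSN"],
        check=True, capture_output=True, text=True,
    ).stdout
    return [line.split(":", 1)[1].strip()
            for line in out.splitlines()
            if line.lower().startswith("contextcsn:")]


def lag_seconds(master_csns: List[str], replica_csns: List[str]) -> float:
    """Seconds by which the replica's newest contextCSN trails the master's."""
    newest_master = max(csn_time(c) for c in master_csns)
    newest_replica = max(csn_time(c) for c in replica_csns)
    return (newest_master - newest_replica).total_seconds()
```

A monitoring script would call fetch_context_csns() once per server (the URIs and suffix are site-specific) and alert when lag_seconds() stays above some threshold for longer than the expected replication delay.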
Alternatively, I think I'd be happy with a query to tell me if the server thinks it is having trouble talking to the master. Thanks for everything. Roy
-----Original Message----- From: Gavin Henry [mailto:ghenry@suretecsystems.com] Sent: Wednesday, March 05, 2008 3:28 AM To: Aaron Richton Cc: Marantz, Roy; openldap-software@openldap.org Subject: RE: Testing the state of replicates
<quote who="Aaron Richton">
[Gavin says]
Dig the main source. servers/slapd/syncrepl.c and servers/slapd/overlays/syncprov.c
Hmm, wrong source files. Try libraries/liblutil/csn.c, which sayeth:
- These routines are (loosly) based upon draft-ietf-ldup-model-03.txt,
- A WORK IN PROGRESS. The format will likely change.
- The format of a CSN string is: yyyymmddhhmmssz#s#r#c
- where s is a counter of operations within a timeslice, r is
- the replica id (normally zero), and c is a counter of
- modifications within this operation. s, r, and c are
- represented in hex and zero padded to lengths of 6, 3, and
- 6, respectively. (In previous implementations r was only 2 digits.)
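To make that format concrete, here is a small sketch that splits a CSN string into the four fields the comment describes. The CSN class and the parse_csn/format_csn names are hypothetical helpers, not anything from liblutil. The parser accepts hex fields of any width so that the older 2-digit replica id also parses, and it keeps the timestamp as text.

```python
from typing import NamedTuple


class CSN(NamedTuple):
    timestamp: str   # yyyymmddhhmmss (text before the trailing Z)
    count: int       # s: counter of operations within a timeslice
    replica_id: int  # r: replica id (normally zero)
    mod: int         # c: counter of modifications within this operation


def parse_csn(text: str) -> CSN:
    """Split 'yyyymmddhhmmssz#s#r#c' into its fields; s, r, c are hex."""
    fields = text.strip().split("#")
    if len(fields) != 4:
        raise ValueError("expected 4 '#'-separated fields: %r" % text)
    stamp, s, r, c = fields
    if not stamp.upper().endswith("Z"):
        raise ValueError("timestamp does not end in Z: %r" % stamp)
    return CSN(stamp[:-1], int(s, 16), int(r, 16), int(c, 16))


def format_csn(csn: CSN) -> str:
    """Render with the zero-padded widths 6, 3 and 6 from the comment."""
    return (f"{csn.timestamp}Z#{csn.count:06x}"
            f"#{csn.replica_id:03x}#{csn.mod:06x}")
```

For example, parse_csn("20080305032800Z#00000a#001#000000") yields a count of 10 and a replica id of 1. Tuples compare field by field, so two parsed values can be ordered with < and >; that this matches the server's own ordering is an assumption.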
Ah, many thanks.
We use http://www.openldap.org/lists/openldap-software/200602/msg00158.html, maybe with a small mod or two (I forget), to check that contextCSN isn't wedged. This only works when the syncrepl thread is completely borked. A better check would be something along the lines of the Net::LDAP ldifdiff to make sure that nothing's different. Of course this has race condition issues (not that we make writes all that often, but on paper at least). If anybody has something like that as a monitoring plugin, you'd erase one line off my perpetual todo list...
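A diff-style check along those lines might look roughly like the sketch below. It is not the Net::LDAP ldifdiff tool, just the same idea in a few lines: it takes two maps of DN to some per-entry fingerprint (entryCSN, or a hash of the entry's attributes, whichever the caller collected) and reports what differs. The function name and result keys are invented for illustration, lowercasing the DN is only a crude stand-in for real DN normalization, and the same race condition applies if writes land between the two dumps.

```python
from typing import Dict, List


def diff_entries(master: Dict[str, str], replica: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two {DN: fingerprint} maps taken from master and replica."""
    # Crude normalization (assumption): treat DNs as case-insensitive strings.
    m = {dn.lower(): fp for dn, fp in master.items()}
    r = {dn.lower(): fp for dn, fp in replica.items()}
    return {
        "missing_on_replica": sorted(m.keys() - r.keys()),
        "extra_on_replica": sorted(r.keys() - m.keys()),
        "mismatch": sorted(dn for dn in m.keys() & r.keys() if m[dn] != r[dn]),
    }


def in_sync(result: Dict[str, List[str]]) -> bool:
    """True when none of the three difference lists has any entries."""
    return not any(result.values())
```

A monitoring plugin could run this on a schedule and only alert when the same DN shows up in two consecutive runs, which would filter out most differences caused by writes that were simply still in flight.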
;-) Plugin for what?
(Yes, that would be of great interest to me. ~93% of syncrepl bugs we've seen involve very very very slight errors that only result in an entry or two being wrong. contextCSN being wrong...we pretty much only see that in the field when tcp keepalives fail to indicate the need for a reconnection.)
So the entryCSN would be wrong?