We use slurpd, and I have gone to some pains to make our home-grown service monitor software check the replication files, on the master hosts, so we have timely notification when replication has stalled.
How do sites that use syncrepl do this?
For example, my new replica is failing right away. I can see it in the master syslog: a bind, a search for * +, then a search result with err=3. On the replica side, however - not a peep.
After a little tinkering, I can get "do_syncrep2 result: rid=101 Timed out", but that requires changes to the code. This exercise convinced me that the syncrepl engine isn't supposed to syslog success or failure of its queries, presumably for some good reason, and that there must be a better way to diagnose problems.
The monitoring objective is to verify that the server is either synched, or is making satisfactory progress in that direction. Is there a good way to monitor the state of that syncrepl thread?
Thanks, Donn Cave, donn@u.washington.edu
--On Thursday, April 19, 2007 3:47 PM -0700 Donn Cave donn@u.washington.edu wrote:
> We use slurpd, and I have gone to some pains to make our home-grown service monitor software check the replication files, on the master hosts, so we have timely notification when replication has stalled.
> How do sites that use syncrepl do this?

Buchan Milne made a nice plugin to monitor replication status for syncrepl as a hobbit plugin. I bastardized his nice script to make it work for me with nagios.

> For example, my new replica is failing right away. I can see it in the master syslog: a bind, a search for * +, then a search result with err=3. On the replica side, however - not a peep.
> After a little tinkering, I can get "do_syncrep2 result: rid=101 Timed out", but that requires changes to the code. This exercise convinced me that the syncrepl engine isn't supposed to syslog success or failure of its queries, presumably for some good reason, and that there must be a better way to diagnose problems.
> The monitoring objective is to verify that the server is either synched, or is making satisfactory progress in that direction. Is there a good way to monitor the state of that syncrepl thread?

Yes, it is quite simple. One merely looks at the contextCSN values at the root of the database on both the master and slave. If they match, things are in sync. If they don't, they aren't.
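
The comparison described above could be scripted roughly as follows. This is only a sketch under stated assumptions, not the hobbit or nagios plugin mentioned earlier: it assumes the stock ldapsearch client is on the PATH, that contextCSN at the suffix entry is readable with an anonymous simple bind, and that a CSN begins with a GeneralizedTime stamp followed by '#'. The function names and the max_lag threshold are made up for illustration; the threshold is one possible way to approximate "making satisfactory progress" and does not come from this thread. Return codes follow the usual Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import subprocess
from datetime import datetime, timezone


def parse_context_csns(ldif):
    """Collect contextCSN values from `ldapsearch -LLL` output."""
    values = set()
    for line in ldif.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "contextcsn":
            values.add(value.strip())
    return values


def fetch_context_csns(uri, suffix):
    """Read contextCSN from the suffix entry of one server.

    Assumption: anonymous simple bind (-x) is allowed to read the attribute.
    contextCSN is operational, so it has to be requested by name.
    """
    result = subprocess.run(
        ["ldapsearch", "-x", "-LLL", "-H", uri,
         "-s", "base", "-b", suffix, "contextCSN"],
        capture_output=True, text=True, check=True, timeout=30,
    )
    return parse_context_csns(result.stdout)


def csn_time(csn):
    """Return the timestamp field of a CSN (text before the first '#') as UTC."""
    stamp = csn.split("#", 1)[0].rstrip("Z")
    fmt = "%Y%m%d%H%M%S.%f" if "." in stamp else "%Y%m%d%H%M%S"
    return datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)


def check_sync(master_csns, replica_csns, max_lag=300):
    """Compare two contextCSN sets; return (nagios_exit_code, message).

    max_lag (seconds) is a hypothetical tolerance: a replica whose newest
    CSN trails the master's by no more than this is reported as WARNING
    rather than CRITICAL.
    """
    if not master_csns or not replica_csns:
        return 3, "UNKNOWN: contextCSN missing on master or replica"
    if master_csns == replica_csns:
        return 0, "OK: contextCSN matches"
    newest_master = max(csn_time(c) for c in master_csns)
    newest_replica = max(csn_time(c) for c in replica_csns)
    lag = (newest_master - newest_replica).total_seconds()
    if lag <= max_lag:
        return 1, "WARNING: replica contextCSN differs, %.0fs behind" % lag
    return 2, "CRITICAL: replica contextCSN %.0fs behind master" % lag
```

A monitor would call fetch_context_csns() once against the master and once against the replica (for example with the placeholder values "ldap://master.example.com" and "dc=example,dc=com"), pass both results to check_sync(), and exit with the returned code. Note that the lag figure is only the gap between the two servers' last recorded writes, so on a quiet directory a stalled replica may not show up until the next change on the master.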
--Quanah
--
Quanah Gibson-Mount
Senior Systems Software Developer
ITS/Shared Application Services
Stanford University
GnuPG Public Key: http://www.stanford.edu/~quanah/pgp.html
openldap-software@openldap.org