On Mon, Jan 06, 2025 at 11:41:26AM -0500, Dave Macias wrote:
Thank you Ondřej for the reply!
- Read the olmMDBEntries from cn=monitor and make sure those stay in sync
The olmMDBEntries help me out a lot! Thank you for this pointer. Since we already gather cn=monitor metrics, the data was already there so we just need to adjust our alerting for that.
The former gives you more information and is my go-to, but its use in monitoring can be confusing: each serverID CSN has to be compared independently, you cannot do straight time arithmetic for alerting, ...
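
As a rough illustration of the per-serverID comparison mentioned above, here is a minimal Python sketch. It assumes the usual contextCSN layout of timestamp#count#SID#mod with hexadecimal count, SID and mod fields. The function names (parse_csn, csns_by_sid, sids_behind) are made up for this example and are not part of syncmonitor or any OpenLDAP tool.

```python
from datetime import datetime, timezone


def parse_csn(csn):
    """Split a contextCSN string into (timestamp, count, sid, mod).

    Assumes the layout YYYYmmddHHMMSS.uuuuuuZ#count#sid#mod.
    """
    stamp, count, sid, mod = csn.split("#")
    when = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ").replace(tzinfo=timezone.utc)
    return when, int(count, 16), int(sid, 16), int(mod, 16)


def csns_by_sid(values):
    """Map serverID -> newest (timestamp, count, mod) seen in one server's contextCSN values."""
    newest = {}
    for value in values:
        when, count, sid, mod = parse_csn(value)
        key = (when, count, mod)
        if sid not in newest or key > newest[sid]:
            newest[sid] = key
    return newest


def sids_behind(local, remote):
    """Return the serverIDs for which `local` is missing or older than `remote`."""
    return sorted(
        sid for sid, csn in remote.items()
        if sid not in local or local[sid] < csn
    )
```

Each SID is compared only against the same SID on the other server, which is why a single "newest timestamp" subtraction across the whole contextCSN set is not enough for alerting.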
Any tool recommendations for this?
In the syncmonitor repository, look at the synccheck tool and you might be able to use its output as input for whatever monitoring/alerting system you need. Or use the library if you want something that keeps running and feeding back status changes.
Let me know if there are any more questions when integrating it or if you encounter bugs etc., happy to help get this used more widely.
Some of that is abstracted away by syncmonitor[0] which should be easy to adapt for most monitoring solutions and can even do real-time monitoring + alerting. It is under active development, most recently in the textual branch to expose a TUI frontend and refactor the library to track the cookies on a per-SID basis for real-time replication delay measurement.
What is not "abstracted away by syncmonitor" ?
Not sure what the question is, but the thing the syncmonitor library doesn't give you is real-time replication delay measurement. The TUI version might eventually do some of this.
To do delay measurement you need a full history of all servers' contextCSNs over time and measure the lag from that (= when was the last time the originating server's CSN was lower than or equal to mine?). So the only thing you can get is a point-in-time replication delay measurement every time you run it, capped at e.g. 30s, after which it gives up and declares the server behind, since monitoring solutions might otherwise declare the script itself timed out.
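
To make the history-based idea concrete, here is a small Python sketch of how such a lag could be derived. It is not how syncmonitor does it. The replication_lag function and the (sample_time, csn) history layout are hypothetical, and the sketch assumes that CSN strings for a single serverID are fixed-width and so can be compared as plain strings.

```python
from bisect import bisect_right


def replication_lag(history, my_csn):
    """Estimate the lag for one serverID as of the newest sample.

    history: list of (sample_time_seconds, origin_csn) tuples, oldest first,
             holding the originating server's CSN for that serverID
             (assumed to never go backwards).
    my_csn:  the CSN this server currently holds for the same serverID.

    Returns the seconds between the newest sample and the last sample at
    which the origin's CSN was lower than or equal to my_csn, or None if
    the history does not reach back far enough to tell.
    """
    if not history:
        return None
    csns = [csn for _, csn in history]
    # Index of the first sample whose CSN is strictly newer than ours.
    idx = bisect_right(csns, my_csn)
    if idx == 0:
        return None  # we were already behind at the oldest sample
    return history[-1][0] - history[idx - 1][0]
```

The result is only as precise as the sampling interval: a change made just before a sample and one made just after the previous sample yield the same figure.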
Regards,