On Wed, Dec 25, 2024 at 04:24:49PM -0000, Dave wrote:
> Hello Everyone,
> Hope you are enjoying the day.
> Was curious what people are doing to monitor their MMR clusters being out of sync.
> At least the implementation we found and have been trying is using telegraf and the ldap_org plugin to gather all objects on each ldap master and compare their count.
> How is the community doing it? Any better way to go about this?

Hi,
I've seen two main approaches, sometimes combined:

1. Tracking contextCSN/cookie:
   a) read out the contextCSN from the DB's top-level entry (poll)
   b) have the server push any cookie changes on-line (push), you also
      get to discover the provider's serverID this way
2. Read the olmMDBEntries from cn=monitor and make sure those stay in
   sync

The former gives you more information and is my go-to, but its use in monitoring can be confusing: each serverID CSN has to be compared independently, you cannot do straight time arithmetic for alerting, ...
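
To illustrate the per-serverID comparison, here is a minimal Python sketch. It assumes you have already fetched the contextCSN values of two providers as plain strings (for example with ldapsearch), and that they follow the usual layout of timestamp#count#sid#mod with hexadecimal counters. The function names are mine and are not taken from syncmonitor. It deliberately reports only a state per SID rather than a time difference, since subtracting the timestamps tells you how far apart two writes were, not how long a server has been behind.

```python
from datetime import datetime, timezone


def parse_csn(csn):
    """Split one CSN string into (sid, (timestamp, change_count, mod_number)).

    Assumed layout: YYYYmmddHHMMSS.uuuuuuZ#cccccc#sss#mmmmmm
    """
    stamp, count, sid, mod = csn.split("#")
    ts = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ").replace(tzinfo=timezone.utc)
    return int(sid, 16), (ts, int(count, 16), int(mod, 16))


def newest_by_sid(context_csns):
    """Map each serverID to the newest CSN seen for it."""
    newest = {}
    for csn in context_csns:
        sid, key = parse_csn(csn)
        if sid not in newest or key > newest[sid]:
            newest[sid] = key
    return newest


def compare_csns(reference, other):
    """Compare two contextCSN sets SID by SID, never across SIDs."""
    ref = newest_by_sid(reference)
    oth = newest_by_sid(other)
    status = {}
    for sid in sorted(set(ref) | set(oth)):
        if sid not in oth:
            status[sid] = "missing"
        elif sid not in ref:
            status[sid] = "unknown to reference"
        elif oth[sid] < ref[sid]:
            status[sid] = "behind"
        elif oth[sid] > ref[sid]:
            status[sid] = "ahead"
        else:
            status[sid] = "in sync"
    return status
```

A monitoring check would call compare_csns() with the values read from each pair of providers and alert only when a SID stays "behind" or "missing" across several consecutive polls, since a single poll can catch a write that is still in flight.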
Some of that is abstracted away by syncmonitor[0] which should be easy to adapt for most monitoring solutions and can even do real-time monitoring + alerting. It is under active development, most recently in the textual branch to expose a TUI frontend and refactor the library to track the cookies on a per-SID basis for real-time replication delay measurement.
Both can be, and often are, run in tandem: entry count is useful for catching misconfigurations where ACLs do not give the replication identity the intended permissions (or give them only for the accesslog but not the main DB). With deltasync you also have to make sure the accesslog DB *never* runs out of space; the desyncs that result from such a failure are not recoverable, save by identifying a canonical provider and using it to reseed the cluster manually.
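
The entry-count check can be sketched in a few lines of Python. This assumes the olmMDBEntries value of each provider has already been read from cn=monitor and collected into a dict keyed by host name; the host names, the function name and the choice of the largest count as the reference are all illustrative assumptions, not something a tool prescribes.

```python
def entry_count_drift(counts, tolerance=0):
    """Return {host: shortfall} for hosts whose entry count trails the largest
    count by more than `tolerance` entries.

    `counts` maps a host name to its olmMDBEntries value. Using the largest
    count as the reference is an assumption made for this sketch.
    """
    if not counts:
        return {}
    expected = max(counts.values())
    return {
        host: expected - n
        for host, n in counts.items()
        if expected - n > tolerance
    }
```

A small tolerance, or requiring the drift to persist over a few polls, avoids alerting on writes that simply have not replicated yet.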
[0]. https://git.openldap.org/openldap/syncmonitor
Regards,