Hi!
I'm running a NRPE monitor that monitors openLDAP. To compute delta values, the NRPE plugin stores the "dn: cn=Uptime,cn=Time,cn=Monitor" reported by openLDAP to compare it with the current value (to build the time difference). If the time difference is negative, the plugin reports a "server restart" event.
So far, so good...
However, very rarely (once every few weeks) a false server restart is reported. Viewing the historic logs, I can only assume that openLDAP reports a bad uptime value now and then. Could this be?
Example plugin logs:
2016/04/28 10:25:14 LDAP: Operations Rate OK HARD 1 OK: [Lock.Delay=0.246], [3 searches with 12 entries in 0.057s], 2340577, 0.10084, 0.07563, 0.35294, 0, 0, 0, 0, 0, 0, 0.08403, 0.61345
2016/04/28 10:23:14 LDAP: Operations Rate OK HARD 1 OK: [Lock.Delay=0.288], [3 searches with 12 entries in 0.052s], 2340458, 0.09917, 0.08264, 0.32231, 0, 0, 0, 0, 0, 0, 0.08264, 0.58678
2016/04/28 10:21:14 LDAP: Operations Rate UNKNOWN HARD 1 UNKNOWN: [Lock.Delay=0.221], [3 searches with 12 entries in 0.052s], 2340337, [Operations.Bind.i: restart detected (2340337 - 2340341 == -4)], 284553, [Operations.Unbind.i: restart detected (2340337 - 2340341 == -4)], 205927, [Operations.Search.i: restart detected (2340337 - 2340341 == -4)], 985106, [Operations.Compare.i: restart detected (2340337 - 2340341 == -4)], 0, [Operations.Modify.i: restart detected (2340337 - 2340341 == -4)], 0, [Operations.Modrdn.i: restart detected (2340337 - 2340341 == -4)], 0, [Operations.Add.i: restart detected (2340337 - 2340341 == -4)], 0, [Operations.Delete.i: restart detected (2340337 - 2340341 == -4)], 0, [Operations.Abandon.i: restart detected (2340337 - 2340341 == -4)], 4, [Operations.Extended.i: restart detected (2340337 - 2340341 == -4)], 245199, [Operations.i: restart detected (2340337 - 2340341 == -4)], 1720790
2016/04/28 10:19:14 LDAP: Operations Rate OK HARD 1 OK: [Lock.Delay=0.501], [3 searches with 12 entries in 0.060s], 2340341, 0.13333, 0.09167, 0.65000, 0, 0, 0, 0, 0, 0, 0.11667, 0.99167
(The first number after the messages in square brackets is the uptime, then come the operations rate for several LDAP operations)
At 10:19:14 the uptime was 2340341, and at 10:25:14 the uptime was 2340577. That could be OK, but at 10:21:14 the uptime was reported as 2340337 which is less than reported at 10:19:14.
Has there ever been such an issue with uptime?
Regards,
Ulrich