On 3/22/19 7:54 AM, Angel L. Mateo wrote:
On 21/3/19 at 20:26, Michael Ströder wrote:
On 3/21/19 8:22 AM, Ángel L. Mateo wrote:
Now the problematic server works fine for days, but then it starts delaying syncs.
How do you detect this?
By checking the contextCSN attribute on all LDAP servers. I get something like this:
contextCSN: 20190322064915.077600Z#000000#01f#000000
contextCSN: 20190322065006.637604Z#000000#020#000000
contextCSN: 20190322065002.859879Z#000000#021#000000
contextCSN: 20190322065000.303715Z#000000#022#000000
contextCSN: 20190301102558.398349Z#000000#027#000000
contextCSN: 20190314080533.305657Z#000000#029#000000
There is one value per server. When everything is OK, these values are the same on all servers. But sometimes the values on the new server differ, being older than those on the others.
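A minimal Python sketch of the check described above, assuming the usual OpenLDAP CSN layout `YYYYmmddHHMMSS.ffffffZ#count#sid#mod` with the count, server ID (SID) and modifier fields written in hex. The function names `parse_csn` and `lagging_sids` are hypothetical, and fetching the values (e.g. reading contextCSN from the suffix entry of each server) is left out: the input is just a mapping of server name to its list of contextCSN values.

```python
from datetime import datetime


def parse_csn(csn: str) -> tuple[datetime, int, int, int]:
    """Split a CSN into (timestamp, count, sid, mod).

    Assumes the layout YYYYmmddHHMMSS.ffffffZ#count#sid#mod with the
    last three fields in hex, e.g. '...#000000#01f#000000' -> SID 31.
    """
    ts, count, sid, mod = csn.strip().split("#")
    when = datetime.strptime(ts, "%Y%m%d%H%M%S.%fZ")
    return when, int(count, 16), int(sid, 16), int(mod, 16)


def lagging_sids(context_csns: dict[str, list[str]]) -> dict[str, list[int]]:
    """For each server, list the SIDs whose contextCSN is older than the
    newest value any server reports for that SID, or is missing."""
    per_server: dict[str, dict[int, tuple]] = {}
    newest: dict[int, tuple] = {}
    for server, values in context_csns.items():
        by_sid = {}
        for value in values:
            csn = parse_csn(value)
            sid = csn[2]
            by_sid[sid] = csn
            # Tuples compare field by field: timestamp first, then change count.
            if sid not in newest or csn > newest[sid]:
                newest[sid] = csn
        per_server[server] = by_sid

    lagging: dict[str, list[int]] = {}
    for server, by_sid in per_server.items():
        behind = sorted(
            sid for sid, best in newest.items()
            if sid not in by_sid or by_sid[sid] < best
        )
        if behind:
            lagging[server] = behind
    return lagging
```

As the rest of the thread points out, a server reported here is not proof that replication failed; it only says its contextCSN for that SID is behind.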
In most cases this is caused by an OpenLDAP bug. I see this quite often with MMR providers, even though the entries have been correctly replicated to the other providers. Hence I asked for your detection method.
I've double-checked the code of my monitoring script many times!
And I'm not the only one seeing this false alarm in monitoring. For example, two guys approached me after my OpenLDAP monitoring lightning talk at FOSDEM and reported the same issue, and they use a different monitoring tool.
So please check whether changes were correctly replicated instead. Yes, that's nearly impossible if you have many changes on many entries.
I've considered searching for the highest entryCSN value per provider ID (server-side sorting on entryCSN, search limit 1) and comparing it against the corresponding contextCSN value. But during the first superficial tests I only got strange results. I have to investigate further before I can report detailed results.
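The comparison idea above can be sketched client-side in Python. This is not the server-side-sorting search described above: it assumes you already have the entryCSN values of all entries (for example from a plain search requesting the entryCSN attribute) plus the contextCSN values of the same server. It also assumes the CSN fields are fixed-width, so plain string comparison orders them chronologically. All function names and status labels are hypothetical.

```python
def sid_of(csn: str) -> str:
    """Return the server-ID field, the third '#'-separated field of a CSN."""
    return csn.split("#")[2]


def newest_entry_csn_per_sid(entry_csns: list[str]) -> dict[str, str]:
    """Highest entryCSN per provider SID (client-side stand-in for a
    sorted search with limit 1). Fixed-width CSN fields make plain
    string comparison chronological."""
    newest: dict[str, str] = {}
    for csn in entry_csns:
        sid = sid_of(csn)
        if sid not in newest or csn > newest[sid]:
            newest[sid] = csn
    return newest


def compare_csns(context_csns: list[str], entry_csns: list[str]) -> dict[str, str]:
    """Classify each SID by comparing its contextCSN with its newest entryCSN."""
    context = {sid_of(c): c for c in context_csns}
    newest = newest_entry_csn_per_sid(entry_csns)
    status: dict[str, str] = {}
    for sid, ctx in context.items():
        entry = newest.get(sid)
        if entry is None:
            status[sid] = "no-entries"   # no entry currently carries this SID
        elif entry == ctx:
            status[sid] = "ok"
        elif entry > ctx:
            status[sid] = "stale"        # an entry is newer than contextCSN
        else:
            status[sid] = "ahead"        # contextCSN is newer than every entry
    # SIDs that appear on entries but have no contextCSN value at all.
    for sid in newest.keys() - context.keys():
        status[sid] = "missing-contextCSN"
    return status
```

Only "stale" clearly points at an outdated contextCSN. "ahead" is not necessarily a problem, since a delete presumably advances contextCSN without leaving behind an entry whose entryCSN reflects it.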
Ciao, Michael.