Good morning,
I am writing from IT Services at the Universidad de Navarra. We have recently upgraded our openldap servers from openldap 2.4.34 with BDB 5.3.21 to openldap 2.4.44 with MDB databases.
We have configured replication from the master server [1] to several slave servers [2] (syncrepl refreshAndPersist), and it is working ok.
Usually, when a change is made on the master server, I can see how it is propagated and applied on the slave server. Using the auditlog overlay, I can see the following on the slave server:
# modify 1470723918 dc=base,dc=com cn=Admin,dc=base,dc=com conn=-1
dn: ...
changetype: modify
replace: [..]
# end modify 1470723918
And just after that, the contextCSN gets updated too:
# modify 1470723918 dc=base,dc=com cn=Admin,dc=base,dc=com conn=-1
dn: dc=base,dc=com
changetype: modify
replace: contextCSN
contextCSN: 20160809062518.877725Z#000000#000#000000
-
# end modify 1470723918
Is this the normal behaviour?
I do not see the contextCSN update in the accesslog database on the master server, nor in its auditlog. So I do not know whether the contextCSN has been replicated from the master server or whether the slave database is updating it itself.
But I am seeing some weird things from time to time: sometimes, somehow, the contextCSN attribute does not get updated after the modification. Checking its value in the master server, I can see that it has been updated correctly, but not on the slave server.
The strange thing is that it happens only about once every few dozen changes.
Could it be caused by some kind of misconfiguration?
On the previous openldap version, we compared the contextCSN values on the master and slave servers in order to check the replication status. But right now, although replication is working ok, sometimes the contextCSN does not get updated on the slave server, so we cannot use it to check the replication status.
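
For illustration, the comparison described above could look roughly like the following. This is only a sketch: the names `CSN`, `parse_csn` and `replication_lag_seconds` are hypothetical, and it assumes the CSN layout seen in the auditlog output above (timestamp, then change count, server ID and modification number as hexadecimal fields separated by `#`).

```python
from datetime import datetime, timezone
from typing import NamedTuple


class CSN(NamedTuple):
    """Fields of a CSN string such as 20160809062518.877725Z#000000#000#000000."""
    timestamp: datetime
    count: int
    sid: int
    mod: int


def parse_csn(value: str) -> CSN:
    # Assumed layout: <timestamp>#<count>#<sid>#<mod>, the last three in hex.
    ts, count, sid, mod = value.strip().split("#")
    when = datetime.strptime(ts, "%Y%m%d%H%M%S.%fZ").replace(tzinfo=timezone.utc)
    return CSN(when, int(count, 16), int(sid, 16), int(mod, 16))


def replication_lag_seconds(master_csn: str, slave_csn: str) -> float:
    """Seconds by which the slave's contextCSN trails the master's."""
    delta = parse_csn(master_csn).timestamp - parse_csn(slave_csn).timestamp
    return delta.total_seconds()
```

A monitoring check could alert when the lag stays above a threshold for several runs, instead of alerting on any difference.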
Thank you so much for your help.
Regards,
[1] Master:

* Accesslog Database:

database mdb
maxsize 1073741824
suffix cn=log
directory /../openldap/var/accesslog
rootdn "cn=Admin,dc=base,dc=com"
index objectClass eq
index entryCSN eq
index reqEnd eq
index reqResult eq
index reqStart eq
index reqDN eq
index default eq

overlay syncprov
syncprov-reloadhint true
syncprov-nopresent true

* Main Database overlays:

overlay syncprov
syncprov-checkpoint 1000 60

overlay accesslog
logdb cn=log
logops writes
logsuccess true
logpurge 14+00:00 01+00:00

[2] Slave:

syncrepl rid=1
  provider="ldap://ldap-master.base.com:389/"
  type=refreshAndPersist
  retry="60 10 300 +"
  searchbase="dc=base,dc=com"
  logbase="cn=log"
  syncdata=accesslog
  logfilter="(&(objectClass=auditWriteObject)(reqResult=0))"
  scope=sub
  schemachecking=off
  binddn=...
*Oscar Remírez de Ganuza Satrústegui* IT Services Universidad de Navarra Tel. +34 948425600 x803130 http://www.unav.edu/web/it/
Óscar Remírez de Ganuza Satrústegui wrote:
> But I am seeing some weird things from time to time: sometimes, somehow, the contextCSN attribute does not get updated after the modification. Checking its value in the master server, I can see that it has been updated correctly, but not on the slave server.
My monitor checks are also sometimes flapping. The contextCSN values differ, but the modifications were correctly replicated. This has happened with LTB builds of OpenLDAP 2.4.44 and back-mdb, but also with prior versions.
Ciao, Michael.
--On Tuesday, August 09, 2016 11:43 PM +0200 Michael Ströder michael@stroeder.com wrote:
> Óscar Remírez de Ganuza Satrústegui wrote:
>> But I am seeing some weird things from time to time: sometimes, somehow, the contextCSN attribute does not get updated after the modification. Checking its value in the master server, I can see that it has been updated correctly, but not on the slave server.
> My monitor checks are also sometimes flapping. contextCSN values differ but the modifications were correctly replicated. Also happened with LTB builds of OpenLDAP 2.4.44 and back-mdb but also prior versions.
Didn't we have a discussion about why one should run the syncprov overlay on all nodes a while back?
--quanah
--
Quanah Gibson-Mount Platform Architect Manager, Systems Team Zimbra, Inc. -------------------- Zimbra :: the leader in open source messaging and collaboration A division of Synacor, Inc
Quanah Gibson-Mount wrote:
> --On Tuesday, August 09, 2016 11:43 PM +0200 Michael Ströder michael@stroeder.com wrote:
>> Óscar Remírez de Ganuza Satrústegui wrote:
>>> But I am seeing some weird things from time to time: sometimes, somehow, the contextCSN attribute does not get updated after the modification. Checking its value in the master server, I can see that it has been updated correctly, but not on the slave server.
>> My monitor checks are also sometimes flapping. contextCSN values differ but the modifications were correctly replicated. Also happened with LTB builds of OpenLDAP 2.4.44 and back-mdb but also prior versions.
> Didn't we have a discussion about why one should run the syncprov overlay on all nodes a while back?
Yes, and since then it is enabled on all replicas. Still this issue happens.
Ciao, Michael.
--On Wednesday, August 10, 2016 12:45 AM +0200 Michael Ströder michael@stroeder.com wrote:
>> Didn't we have a discussion about why one should run the syncprov overlay on all nodes a while back?
> Yes, and since then it is enabled on all replicas. Still this issue happens.
> Ciao, Michael.
Interesting... I haven't run replica only nodes in a while. Maybe someday I'll be able to work on OpenLDAP again.
--Quanah
--
Quanah Gibson-Mount
Quanah Gibson-Mount wrote:
> --On Wednesday, August 10, 2016 12:45 AM +0200 Michael Ströder michael@stroeder.com wrote:
>>> Didn't we have a discussion about why one should run the syncprov overlay on all nodes a while back?
>> Yes, and since then it is enabled on all replicas. Still this issue happens.
> Interesting... I haven't run replica only nodes in a while. Maybe someday I'll be able to work on OpenLDAP again.
The point is that it is not the read-only consumers that have the issue with contextCSN. Rather, it is one or another of the MMR providers.
Ciao, Michael.
On Wed, Aug 10, 2016 at 9:33 PM, Michael Ströder michael@stroeder.com wrote:
> Quanah Gibson-Mount wrote:
>> --On Wednesday, August 10, 2016 12:45 AM +0200 Michael Ströder michael@stroeder.com wrote:
>>>> Didn't we have a discussion about why one should run the syncprov overlay on all nodes a while back?
>>> Yes, and since then it is enabled on all replicas. Still this issue happens.
>> Interesting... I haven't run replica only nodes in a while. Maybe someday I'll be able to work on OpenLDAP again.
> The point is that not read-only consumers have the issue with contextCSN. It's rather one or another of the MMR providers.
> Ciao, Michael.
Thank you so much for your answers. We have the syncprov overlay running on the slave servers too.
We will have to live with this issue then:
* We have adapted our nagios script so that it now checks both the contextCSN and the last modified entry's entryCSN values in order to know whether slave replication is working ok.
* We are also checking in cn=Tasklist,cn=Threads,cn=Monitor whether the replication thread (do_syncrepl) is running on the slaves.
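
The two checks listed above could be combined along the following lines. This is a hedged sketch and not the actual nagios script from this thread: the function names are hypothetical, the CSN values are assumed to be fixed-width strings that can be compared directly, and the Monitor values are assumed to be plain strings in which a running replication task mentions `do_syncrepl`.

```python
from typing import Iterable, Optional


def replica_in_sync(master_context_csn: str,
                    slave_context_csn: str,
                    slave_last_entry_csn: Optional[str] = None) -> bool:
    """Decide whether a slave has caught up with the master."""
    # Cheap check first: identical contextCSN values mean the slave is current.
    if slave_context_csn == master_context_csn:
        return True
    # Fallback for the case discussed in this thread: the slave applied the
    # change but did not update its contextCSN. Assumption: same-format CSN
    # strings sort chronologically when compared as plain strings.
    if slave_last_entry_csn is None:
        return False
    return slave_last_entry_csn >= master_context_csn


def syncrepl_task_running(tasklist_values: Iterable[str]) -> bool:
    """True if any value read from cn=Tasklist,cn=Threads,cn=Monitor
    mentions do_syncrepl (the exact value format is an assumption)."""
    return any("do_syncrepl" in value for value in tasklist_values)
```

The more expensive entryCSN lookup only has to run when the contextCSN values differ, so the fallback argument can stay `None` in the common case.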
Thanks again for your help.
Regards,
*Oscar Remírez de Ganuza Satrústegui* IT Services Universidad de Navarra Tel. +34 948425600 x803130 http://www.unav.edu/web/it/
Óscar Remírez de Ganuza Satrústegui wrote:
> - We have adapted our nagios script so that it now checks both contextCSN and last modified entry's entryCSN values in order to know if slave replication is working ok.
How do you determine the "last modified entry's entryCSN values"?
Ciao, Michael.
On Fri, Aug 19, 2016 at 7:52 PM, Michael Ströder michael@stroeder.com wrote:
> Óscar Remírez de Ganuza Satrústegui wrote:
>> - We have adapted our nagios script so that it now checks both contextCSN and last modified entry's entryCSN values in order to know if slave replication is working ok.
> How do you determine the "last modified entry's entryCSN values"?
We are using the auditlog overlay [1], so we check that log to find the last entryCSN written to the database. This is not very efficient, so we only check it when the contextCSN attributes on master and slave are not equal.
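
As a rough illustration of this lookup, the sketch below scans auditlog text for the most recently written entryCSN. The function name `last_entry_csn` is hypothetical, and it assumes that replicated changes appear in the auditlog as LDIF with `entryCSN: <value>` lines, in the same style as the contextCSN record shown at the start of the thread.

```python
import re
from typing import Iterable, Optional

# Matches an LDIF attribute line such as "entryCSN: 2016...#000000#000#000000".
_ENTRY_CSN = re.compile(r"^entryCSN:\s*(\S+)\s*$")


def last_entry_csn(lines: Iterable[str]) -> Optional[str]:
    """Return the last entryCSN value found in the given auditlog lines."""
    last = None
    for line in lines:
        match = _ENTRY_CSN.match(line)
        if match:
            last = match.group(1)  # later records overwrite earlier ones
    return last
```

Passing an open file object works because files iterate line by line. For a large auditlog, reading only the tail of the file would avoid scanning everything.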
Regards,
[1] http://www.openldap.org/software/man.cgi?query=slapo-auditlog
*Oscar Remírez de Ganuza Satrústegui* IT Services Universidad de Navarra Tel. +34 948425600 x803130 http://www.unav.edu/web/it/
Óscar Remírez de Ganuza Satrústegui wrote:
> On Fri, Aug 19, 2016 at 7:52 PM, Michael Ströder michael@stroeder.com wrote:
>> Óscar Remírez de Ganuza Satrústegui wrote:
>>> - We have adapted our nagios script so that it now checks both contextCSN and last modified entry's entryCSN values in order to know if slave replication is working ok.
>> How do you determine the "last modified entry's entryCSN values"?
> We are using auditlog overlay [1], so we check that log to find the last entryCSN written on the database. Not very efficient, so we just check it when contextCSN attributes on master and slave are not equal.
Hmm, I'm using slapo-accesslog everywhere. Querying the accesslog DB might be more efficient. I will try that on the replicas.
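
A sketch of what the accesslog variant might involve, under assumptions: given the reqMod values of the newest write entry under cn=log (how that entry is fetched is left out here), the entryCSN can be extracted from values of the form `attribute:<op> value`. The function name is hypothetical, and the value layout should be checked against slapo-accesslog(5) before relying on it.

```python
from typing import Iterable, Optional


def entry_csn_from_reqmod(req_mod_values: Iterable[str]) -> Optional[str]:
    """Extract the entryCSN from reqMod values such as 'entryCSN:= <csn>'."""
    for value in req_mod_values:
        attr, sep, rest = value.partition(":")
        # Assumed layout: the character after the colon is the operation
        # ('=' for replace, '+' for add), followed by the attribute value.
        if sep and attr.lower() == "entrycsn" and rest[:1] in ("+", "="):
            return rest[1:].strip()
    return None
```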
Ciao, Michael.
openldap-technical@openldap.org