Folks,
I recently upgraded my LDAP servers (RHEL6, locally built RPM) from 2.4.42 to 2.4.44. After several days of chasing why our Nagios checks for syncrepl performance were reporting long delays, I gave up and rolled the primary server back to 2.4.42. This week I built a primary and a replica on RHEL7 with 2.4.44 (again locally built). Through some intensive testing and log crawling, I have just discovered that roughly every hour the syncprov log entries (with loglevel sync) show up lacking the csn. If there are outstanding changes to be replicated when that happens, the syncrepl check on the replica server starts reporting that it has fallen behind. This behavior does not occur with the 2.4.42 code base.
Is this the new normal? Is there something (that has not yet made it into the guide) that I need to change in my syncprov/syncrepl configuration to get around this?
This is a “normal” syncprov log entry:
Apr 1 18:18:40 ldap7p slapd[10061]: syncprov_sendresp: cookie=rid=100,csn=20160401221840.842942Z#000000#000#000000
This is one of the new ones:
Apr 1 18:14:28 ldap7p slapd[10061]: syncprov_sendresp: cookie=rid=100
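A minimal sketch, in Python, of how log lines like the two quoted here could be scanned for cookies that lack a csn. The function names are illustrative and not part of OpenLDAP, and the pattern assumes the syslog layout seen in these two entries; other logging setups may need a different expression.

```python
import re
from datetime import datetime, timezone

# Matches the cookie portion of a syncprov_sendresp log line.
# Assumption: the cookie contains no whitespace, as in the entries quoted here.
COOKIE_RE = re.compile(r"syncprov_sendresp: cookie=(?P<cookie>\S+)")


def parse_cookie(line):
    """Return the cookie fields as a dict, or None if the line is not a
    syncprov_sendresp cookie line."""
    match = COOKIE_RE.search(line)
    if match is None:
        return None
    fields = {}
    for part in match.group("cookie").split(","):
        # Split on the first "=" only, so the value is kept intact.
        key, _, value = part.partition("=")
        fields[key] = value
    return fields


def csn_time(csn):
    """Parse the leading timestamp of a CSN into an aware UTC datetime."""
    stamp = csn.split("#", 1)[0]  # e.g. 20160401221840.842942Z
    parsed = datetime.strptime(stamp, "%Y%m%d%H%M%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def lines_missing_csn(lines):
    """Yield syncprov_sendresp lines whose cookie carries no csn field."""
    for line in lines:
        fields = parse_cookie(line)
        if fields is not None and "csn" not in fields:
            yield line.rstrip("\n")
```

Passing an open log file to `lines_missing_csn` would list the csn-less entries with their syslog timestamps, which makes the roughly hourly pattern easy to check. `csn_time` turns the csn of a normal entry into a UTC datetime for comparison against those timestamps.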
-- Frank Swasey Sr Systems Administrator Systems Architecture & Administration University of Vermont
--On Friday, April 01, 2016 11:41 PM +0000 Frank Swasey Frank.Swasey@uvm.edu wrote:
Is this the new normal? Is there something (that has not yet made it into the guide) that I need to change in my syncprov/syncrepl configuration to get around this?
No, this would be a serious bug. Can you provide related configs and test data that reproduce the issue? I've not seen anything like this w/ delta-syncrepl MMR.
--Quanah
--
Quanah Gibson-Mount Platform Architect Zimbra, Inc.
Zimbra :: the leader in open source messaging and collaboration A division of Synacor, Inc
Quanah,
The config and data are not something I can release to you in their current form. Since this has not been seen before, I’ll do more digging and do my utmost to reproduce the failure with a configuration and data that I can share.
Before I go a lot further: is any one of the changes made to syncprov in 2.4.43 and 2.4.44 more likely than the others to be at the root of this issue?
— Frank Swasey Systems Architecture & Administration
I have been able to reproduce the failure.
Do you want an ITS with the configuration files, database and instructions to reproduce?
— Frank Swasey Systems Architecture & Administration
--On Monday, April 04, 2016 6:24 PM +0000 Frank Swasey Frank.Swasey@uvm.edu wrote:
I have been able to reproduce the failure.
Do you want an ITS with the configuration files, database and instructions to reproduce?
Yes, please.
--Quanah