https://bugs.openldap.org/show_bug.cgi?id=10136
Issue ID: 10136
Summary: Sync replication causing glue entries.
Product: OpenLDAP
Version: 2.5.13
Hardware: x86_64
OS: Windows
Status: UNCONFIRMED
Keywords: needs_review
Severity: normal
Priority: ---
Component: slapd
Assignee: bugs@openldap.org
Reporter: mbalakri@opentext.com
Target Milestone: ---
Created attachment 991
  --> https://bugs.openldap.org/attachment.cgi?id=991&action=edit
Node1 and Node2 sync replication logs
We have configured mirror mode replication with two nodes.
Node1 syncrepl
{0}rid=1 provider=ldaps://AWPCISQL22.otxlab.net:6366 type=refreshAndPersist searchbase="o=otxlab.net" schemachecking=off bindmethod=simple binddn="cn=Directory Manager,o=otxlab.net" credentials=d retry="120 10 300 +" timeout=60 tls_reqcert=never tls_cacert="C:\Program Files\OpenText\CARS\defaultInst\certificates\AWPCISQL22.otxlab.net-cert.cer" tls_cert="C:\Program Files\OpenText\CARS\defaultInst\certificates\AWPCISQL22.otxlab.net-cert.cer" tls_key="C:\Program Files\OpenText\CARS\defaultInst\certificates\AWPCISQL22.otxlab.net-key.pvk"
Node2 syncrepl
{0}rid=2 provider=ldaps://AWPCTHA1.otxlab.net:6366 type=refreshAndPersist searchbase="o=otxlab.net" schemachecking=off bindmethod=simple binddn="cn=Directory Manager,o=otxlab.net" credentials=d retry="120 10 300 +" timeout=60 tls_reqcert=never tls_cacert="C:\Program Files\OpenText\CARS\defaultInst\certificates\AWPCTHA1.otxlab.net-cert.cer" tls_cert="C:\Program Files\OpenText\CARS\defaultInst\certificates\AWPCTHA1.otxlab.net-cert.cer" tls_key="C:\Program Files\OpenText\CARS\defaultInst\certificates\AWPCTHA1.otxlab.net-key.pvk"
olcMultiProvider is ON.
Now when records are inserted into node1, they replicate to node2, but after some time glue entries are created on node2 and from then on replication stops working. The sync logs from both nodes are attached. The two entries below are in glue state and do not recover from it:
cn=Method Set CAPackage,cn=Cordys CAPConnector,cn=cordys,cn=defaultInst,o=otxlab.net
cn=Cordys CAPConnector,cn=cordys,cn=defaultInst,o=otxlab.net
Any clue as to what is going wrong here? Is this due to the 'retry' configuration?
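As a reading aid for the retry question: slapd.conf(5) describes the syncrepl retry value as a list of "<retry interval> <number of retries>" pairs, where "+" as the number of retries means retrying indefinitely, and the setting applies to reconnection attempts after an error. The Python sketch below expands such a string into the sequence of waits it describes. The helper name retry_schedule is hypothetical, and the code only illustrates how the string is read, not how slapd schedules reconnects internally.

```python
from itertools import islice
from typing import Iterator


def retry_schedule(spec: str) -> Iterator[int]:
    """Yield successive wait times in seconds for a retry string.

    The string is read as "<interval> <count>" pairs; a count of "+"
    means the interval repeats indefinitely.
    """
    tokens = spec.split()
    if len(tokens) % 2 != 0:
        raise ValueError("retry spec must consist of <interval> <count> pairs")
    for interval, count in zip(tokens[0::2], tokens[1::2]):
        seconds = int(interval)
        if count == "+":
            # Indefinite retries: this generator never moves past this pair.
            while True:
                yield seconds
        for _ in range(int(count)):
            yield seconds


# First 12 waits for the value used in this report.
waits = list(islice(retry_schedule("120 10 300 +"), 12))
print(waits)
```

Read this way, retry="120 10 300 +" means ten attempts 120 seconds apart, followed by attempts every 300 seconds without limit.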
--- Comment #1 from Mini mbalakri@opentext.com ---
The setup was working with OpenLDAP 2.4.58; the issue appeared after upgrading to OpenLDAP 2.5.13.
--- Comment #2 from Ondřej Kuzník ondra@mistotebe.net ---
On Fri, Nov 24, 2023 at 06:19:24PM +0000, openldap-its@openldap.org wrote:
> We have configured mirror mode replication with two nodes.
> Node1 syncrepl
> {0}rid=1 provider=ldaps://AWPCISQL22.otxlab.net:6366 type=refreshAndPersist searchbase="o=otxlab.net" schemachecking=off bindmethod=simple binddn="cn=Directory Manager,o=otxlab.net" credentials=d retry="120 10 300 +" timeout=60 tls_reqcert=never tls_cacert="C:\Program Files\OpenText\CARS\defaultInst\certificates\AWPCISQL22.otxlab.net-cert.cer" tls_cert="C:\Program Files\OpenText\CARS\defaultInst\certificates\AWPCISQL22.otxlab.net-cert.cer" tls_key="C:\Program Files\OpenText\CARS\defaultInst\certificates\AWPCISQL22.otxlab.net-key.pvk"
> Node2 syncrepl
> {0}rid=2 provider=ldaps://AWPCTHA1.otxlab.net:6366 type=refreshAndPersist searchbase="o=otxlab.net" schemachecking=off bindmethod=simple binddn="cn=Directory Manager,o=otxlab.net" credentials=d retry="120 10 300 +" timeout=60 tls_reqcert=never tls_cacert="C:\Program Files\OpenText\CARS\defaultInst\certificates\AWPCTHA1.otxlab.net-cert.cer" tls_cert="C:\Program Files\OpenText\CARS\defaultInst\certificates\AWPCTHA1.otxlab.net-cert.cer" tls_key="C:\Program Files\OpenText\CARS\defaultInst\certificates\AWPCTHA1.otxlab.net-key.pvk"
> olcMultiProvider is ON.
> Now when records are inserted into node1, they replicate to node2, but after some time glue entries are created on node2 and from then on replication stops working. The sync logs from both nodes are attached. The two entries below are in glue state and do not recover from it:
> cn=Method Set CAPackage,cn=Cordys CAPConnector,cn=cordys,cn=defaultInst,o=otxlab.net
> cn=Cordys CAPConnector,cn=cordys,cn=defaultInst,o=otxlab.net
> Any clue as to what is going wrong here? Is this due to the 'retry' configuration?
Hi, you're not showing the other side of the replication (Node1 replicating from Node2). Also, your logs suggest that Node2 considers cn=Cordys CAPConnector,cn=cordys,cn=defaultInst,o=otxlab.net to have been intentionally deleted since the last time Node1 updated it.
In general, unless you can reproduce a desync and have a (semi-)reliable way of doing so that you can share here, please post to openldap-technical, as 99% of the time an issue comes from operational problems, not code. Closing this issue; please follow up there if you have further questions.
Regards,
Ondřej Kuzník ondra@mistotebe.net changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |INVALID
           Keywords|needs_review                |
Quanah Gibson-Mount quanah@openldap.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |VERIFIED
--- Comment #3 from Mini mbalakri@opentext.com ---
Ondřej Kuzník, thank you for checking the issue. I want to reopen this ticket, as we have identified the code that is causing the issue for us. The fix for https://bugs.openldap.org/show_bug.cgi?id=9282 is causing the problem.
Replication does not happen, with log messages like:
6560dc3c.54467 00000000000006B0 syncrepl_entry: rid=002 cn=GetCollection,cn=Method Set ISVPackage,cn=Cordys ESBServer,cn=cordys,cn=defaultInst,o=otxlab.net
6560dc3c.54476 00000000000006B0 syncrepl_entry: rid=002 entry 'cn=GetCollection,cn=Method Set ISVPackage,cn=Cordys ESBServer,cn=cordys,cn=defaultInst,o=otxlab.net' csn=20231124172411.004202Z#000000#001#000000 not new enough, ignored
None of the entries logged with the message "not new enough, ignored" are replicated to the second LDAP server.
In our scenario, both LDAP servers are running and we are adding entries to server1; server2 is not stopped or restarted in between. This is reproducible without our application, by importing an LDIF file using JXplorer.
I have reverted this fix locally https://git.openldap.org/openldap/openldap/-/commit/8d428f3163e56f90cb84cddf..., built OpenLDAP and tested; with the revert, our scenario works.
In our testing we could not reproduce the issue on Linux, but we will test more there to see whether it is reproducible. On Windows the issue does not always occur; replication works sometimes.
Could you please check and advise us on how to proceed?
--- Comment #4 from Ondřej Kuzník ondra@mistotebe.net ---
On Fri, Feb 09, 2024 at 05:09:22AM +0000, openldap-its@openldap.org wrote:
> Ondřej Kuzník, thank you for checking the issue. I want to reopen this ticket, as we have identified the code that is causing the issue for us. The fix for https://bugs.openldap.org/show_bug.cgi?id=9282 is causing the problem.
> Replication does not happen, with log messages like:
> 6560dc3c.54467 00000000000006B0 syncrepl_entry: rid=002 cn=GetCollection,cn=Method Set ISVPackage,cn=Cordys ESBServer,cn=cordys,cn=defaultInst,o=otxlab.net
> 6560dc3c.54476 00000000000006B0 syncrepl_entry: rid=002 entry 'cn=GetCollection,cn=Method Set ISVPackage,cn=Cordys ESBServer,cn=cordys,cn=defaultInst,o=otxlab.net' csn=20231124172411.004202Z#000000#001#000000 not new enough, ignored
> None of the entries logged with the message "not new enough, ignored" are replicated to the second LDAP server.
> In our scenario, both LDAP servers are running and we are adding entries to server1; server2 is not stopped or restarted in between. This is reproducible without our application, by importing an LDIF file using JXplorer.
> I have reverted this fix locally https://git.openldap.org/openldap/openldap/-/commit/8d428f3163e56f90cb84cddf..., built OpenLDAP and tested; with the revert, our scenario works.
That code very much has to remain: if the consumer was told it is up to date to CSN x, then any further messages tagged with x or older *have to* be ignored. The issue is likely to be elsewhere. Can you provide logs of the previous replication sessions up to the offending CSN?
Regards,
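The rule stated in the comment above can be illustrated with a small Python sketch. It is not slapd's implementation: the names parse_csn and is_new_enough are hypothetical, the CSN layout (timestamp#count#sid#mod, with hexadecimal count, sid and mod) is taken from the log line in this ticket, and the assumption that the consumer tracks one newest CSN per server ID is a simplification.

```python
from typing import Dict, NamedTuple


class CSN(NamedTuple):
    timestamp: str  # fixed-width, e.g. "20231124172411.004202Z"
    count: int
    sid: int
    mod: int


def parse_csn(text: str) -> CSN:
    """Split a CSN string of the form timestamp#count#sid#mod."""
    timestamp, count, sid, mod = text.split("#")
    return CSN(timestamp, int(count, 16), int(sid, 16), int(mod, 16))


def is_new_enough(entry_csn: str, known: Dict[int, str]) -> bool:
    """Return True if an incoming entry should be applied.

    `known` maps a server ID to the newest CSN the consumer has been told
    it is up to date with (assumption: one CSN per server ID).
    """
    entry = parse_csn(entry_csn)
    seen_text = known.get(entry.sid)
    if seen_text is None:
        return True  # nothing recorded yet for this server ID
    seen = parse_csn(seen_text)
    # Fixed-width timestamps compare correctly as strings.
    return (entry.timestamp, entry.count, entry.mod) > (
        seen.timestamp, seen.count, seen.mod)


# CSN from the log in this ticket, recorded as already seen for SID 1.
known = {1: "20231124172411.004202Z#000000#001#000000"}
same = is_new_enough("20231124172411.004202Z#000000#001#000000", known)
# The later CSN below is made up for illustration.
later = is_new_enough("20231124172412.000000Z#000000#001#000000", known)
print(same, later)
```

Under these assumptions, an entry carrying exactly the CSN the consumer already recorded is rejected, which corresponds to the "not new enough, ignored" message.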
--- Comment #5 from Ondřej Kuzník ondra@mistotebe.net ---
On Fri, Feb 09, 2024 at 11:57:48AM +0000, openldap-its@openldap.org wrote:
> That code very much has to remain: if the consumer was told it is up to date to CSN x, then any further messages tagged with x or older *have to* be ignored. The issue is likely to be elsewhere. Can you provide logs of the previous replication sessions up to the offending CSN?
Also still waiting for the provider side of things:
- configuration
- logs
Without that it will be hard to tell what happened.
BTW, I think you might have done that already but if deltasync is in effect, make sure ACLs give the replica user full access to both accesslog and the replicated DB.
Thanks,
--- Comment #6 from Mini mbalakri@opentext.com ---
Created attachment 1008
  --> https://bugs.openldap.org/attachment.cgi?id=1008&action=edit
node1 configuration
node1 configuration ldif
--- Comment #7 from Mini mbalakri@opentext.com ---
Created attachment 1009
  --> https://bugs.openldap.org/attachment.cgi?id=1009&action=edit
Node2 configuration ldif file
Node2 configuration ldif file
--- Comment #8 from Mini mbalakri@opentext.com ---
Ondřej Kuzník, I have attached the node1 and node2 configurations.
The provider does not have many logs (the same as in the previous attachment); I will reproduce the issue again and try to get fresh logs.
Delta-sync replication is not configured.
Thank you.
Regards,
Mini
--- Comment #9 from Mini mbalakri@opentext.com ---
Created attachment 1010
  --> https://bugs.openldap.org/attachment.cgi?id=1010&action=edit
Node1 log
Node1 log
--- Comment #10 from Mini mbalakri@opentext.com ---
Created attachment 1011
  --> https://bugs.openldap.org/attachment.cgi?id=1011&action=edit
Node2 logs
node2 logs
--- Comment #11 from Mini mbalakri@opentext.com ---
Ondřej Kuzník, I have attached fresh logs from both the provider and the consumer.
Node1 has 3134 entries, but Node2 has only 3069 entries.
The Node2 logs contain 69 occurrences of the "not new enough, ignored" message.
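A count like the one above can be reproduced from a log file with a short Python sketch. The regular expression is modelled on the single log line quoted earlier in this ticket; the function name ignored_entries is hypothetical, and other slapd versions or log levels may format the message differently.

```python
import re
from typing import Dict, List

# Pattern modelled on the "not new enough, ignored" line quoted in this ticket.
IGNORED = re.compile(
    r"syncrepl_entry: rid=(?P<rid>\d+) entry '(?P<dn>[^']*)' "
    r"csn=(?P<csn>\S+) not new enough, ignored"
)


def ignored_entries(log_text: str) -> List[Dict[str, str]]:
    """Return rid, DN and CSN for every ignored entry found in the log text."""
    return [match.groupdict() for match in IGNORED.finditer(log_text)]


sample = (
    "6560dc3c.54476 00000000000006B0 syncrepl_entry: rid=002 entry "
    "'cn=GetCollection,cn=Method Set ISVPackage,cn=Cordys ESBServer,"
    "cn=cordys,cn=defaultInst,o=otxlab.net' "
    "csn=20231124172411.004202Z#000000#001#000000 not new enough, ignored\n"
)
hits = ignored_entries(sample)
print(len(hits), hits[0]["csn"])
```

Collecting the DNs and CSNs rather than only a count makes it easier to compare the ignored entries with the entries missing on the consumer.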
--- Comment #12 from Quanah Gibson-Mount quanah@openldap.org ---
I notice that you're running 2.5.13 and on Windows. Are we sure this isn't ITS#10100 that's fixed in 2.5.17?
--- Comment #13 from Narayana narayana.reddy@yahoo.co.in ---
Quanah Gibson-Mount,
Thanks for your response. Could you please confirm whether 2.5.17 has been released? If not, please let us know when it will be released.
--- Comment #14 from Quanah Gibson-Mount quanah@openldap.org ---
(In reply to Narayana from comment #13)
> Quanah Gibson-Mount,
> Thanks for your response. Could you please confirm whether 2.5.17 has been released? If not, please let us know when it will be released.
I suggest you:
a) Subscribe to the OpenLDAP announce list, where all new releases are posted.
b) Open a web browser, and navigate to https://www.openldap.org where your question will be immediately answered.
--- Comment #15 from Mini mbalakri@opentext.com ---
(In reply to Narayana from comment #13)
> Quanah Gibson-Mount,
> Thanks for your response. Could you please confirm whether 2.5.17 has been released? If not, please let us know when it will be released.
(In reply to Quanah Gibson-Mount from comment #12)
> I notice that you're running 2.5.13 and on Windows. Are we sure this isn't ITS#10100 that's fixed in 2.5.17?
Quanah Gibson-Mount, thank you so much for the pointer; it looks like ITS#10100 resolves the replication issue. I will test thoroughly and update the ticket. Thanks again.
Quanah Gibson-Mount quanah@openldap.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|INVALID                     |---
             Status|VERIFIED                    |UNCONFIRMED
Quanah Gibson-Mount quanah@openldap.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|UNCONFIRMED                 |RESOLVED
--- Comment #16 from Quanah Gibson-Mount quanah@openldap.org ---
*** This issue has been marked as a duplicate of issue 10100 ***
Quanah Gibson-Mount quanah@openldap.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |VERIFIED