https://bugs.openldap.org/show_bug.cgi?id=9705
Issue ID: 9705 Summary: synprov put add info into wrong cookie while performing test059-consumer-config Product: OpenLDAP Version: 2.4.59 Hardware: All OS: All Status: UNCONFIRMED Keywords: needs_review Severity: normal Priority: --- Component: overlays Assignee: bugs@openldap.org Reporter: gil77for@gmail.com Target Milestone: ---
There is an issue with syncrepl and syncprov working together in parallel. The issue can be seen in provider LDAP server logs of test059-consumer-config. For linux this problem does not cause the test to malfunction, but it may cause replication to malfunction on other systems (or possibly on linux in a different situation from the test). In case of OpenVMS we have this issue.
The problem is detected under the following circumstances, followed by a step-by-step description of the actions in test 059 that lead to malfunction:
1. Adding replication configuration for the dn: olcDatabase={1}ldif,cn=config
dn: olcDatabase={1}ldif,cn=config changetype: modify add: olcSyncrepl olcSyncrepl: {0}rid=001 provider=ldap://localhost:9001/ binddn="cn=config" bindmethod=simple credentials=******** searchbase="cn=schema,cn=config" type=refreshAndPersist retry="3 5 300 5" timeout=3 suffixmassage="cn=schema,cn=config,cn=consumer"
this registers a syncInfo structure with the parameter rid=001 inside the syncrepl engine. the corresponding syncCookie and cookieState are also created inside this structure
2. adding includes by ldapadd to the configuration, this causes {1}ldif,cn=config to be filled on the provider and register this adding in rid=001 syncInfo cookies with sid=001:
include: file:///LDAP$SCHEMA:core.ldif include: file:///LDAP$SCHEMA:cosine.ldif include: file:///LDAP$SCHEMA:inetorgperson.ldif include: file:///LDAP$SCHEMA:openldap.ldif include: file:///LDAP$SCHEMA:nis.ldif
3. Adding replication configuration for the dn: olcDatabase={1}mdb,cn=config,cn=consumer:
dn: olcDatabase={1}mdb,cn=config,cn=consumer objectClass: olcDatabaseConfig objectClass: olcmdbConfig olcDatabase: {1}mdb olcSuffix: dc=example,dc=com olcDbDirectory: [.testdir.db_2_a] olcRootDN: cn=Manager,dc=example,dc=com olcRootPW: secret olcSyncRepl: rid=002 provider=ldap://localhost:9001/ binddn="cn=Manager,dc=example,dc=com" bindmethod=simple credentials=secret searchbase="dc=example,dc=com" type=refreshAndPersist retry="3 5 300 5" timeout=3 olcUpdateRef: ldap://localhost:9001/
this registers new syncInfo structure with the parameter rid=002 inside the syncrepl engine. Should also be added info into cookies of this structure but this is the issue. The info about {1}mdb,cn=config,cn=consumer is added to the cookie of structure with rid=001 (!!!). Thus the cookie about {1}ldif is overrides by this new cookie data. It can be seen in provider server logs (was run on linux):
…… 61275c7d ldif_back_add: "olcDatabase={1}mdb,cn=config,cn=consumer" 61275c7d oc_check_required entry (olcDatabase={1}mdb,cn=config,cn=consumer), objectClass "olcMdbConfig" …… 61275c7d slap_get_csn: conn=1007 op=3 generated new csn=20210826091853.649104Z#000000#001#000000 manage=1 61275c7d slap_queue_csn: queueing 0x7fcdf4106bc0 20210826091853.649104Z#000000#001#000000 61275c7d ldif_write_entry: wrote entry "olcDatabase={1}mdb,cn=config,cn=consumer" 61275c7d ldif_back_add: err: 0 text: 61275c7d send_ldap_result: conn=1007 op=3 p=3 61275c7d send_ldap_result: err=0 matched="" text="" 61275c7d conn=1007 op=3 syncprov_matchops: recording uuid for dn=olcDatabase={1}mdb,cn=config,cn=consumer on opc=0x7fcdf4001608 …… 61275c7d slap_graduate_commit_csn: removing 0x7fcdf4106bc0 20210826091853.649104Z#000000#001#000000 61275c7d conn=1004 op=1 syncprov_sendresp: cookie=rid=001,sid=001,csn=20210826091853.649104Z#000000#001#000000 61275c7d conn=1004 op=1 syncprov_sendresp: sending LDAP_SYNC_ADD, dn=olcDatabase={1}mdb,cn=config,cn=consumer ……
rid=002 should be there!
When running on linux, this does not cause a problem for the test, because syncprov task works later than the ldif database replication on consumer by syncrepl task. And the overlapped cookie entry does not matter anymore.
In our case (OpenVMS) the order of asynchronous tasks (syncrepl and syncprov) is different and overwriting the cookie leads to loss of ldif database replication and failure of the test. The consumer does not receive scheme data. The differences in the order of tasks are caused by the features of pthreads library implementation for the VMS. But it should not matter for LDAP operation.
https://bugs.openldap.org/show_bug.cgi?id=9705
--- Comment #1 from Howard Chu hyc@openldap.org --- Eugeny Gil wrote:
Hi guys,
There is an issue with syncrepl and syncprov working together in parallel. The issue can be seen in provider LDAP server logs of test059-consumer-config. For linux this problem does not cause the test to malfunction, but it may cause replication to malfunction on other systems (or possibly on linux in a different situation from the test). In case of OpenVMS we have this issue.
The problem is detected under the following circumstances, followed by a step-by-step description of the actions in test 059 that lead to malfunction:
I cannot comment on your actual problem, but your analysis is incorrect.
this registers new syncInfo structure with the parameter rid=002 inside the syncrepl engine. Should also be added info into cookies of this structure but this is the issue. The info about {1}mdb,cn=config,cn=consumer is added to the cookie of structure with rid=001 (!!!). Thus the cookie about {1}ldif is overrides by this new cookie data. It can be seen in provider server logs (was run on linux):
That is not what the logs say.
……
61275c7d ldif_back_add: "olcDatabase={1}mdb,cn=config,cn=consumer"
61275c7d oc_check_required entry (olcDatabase={1}mdb,cn=config,cn=consumer), objectClass "olcMdbConfig"
……
61275c7d slap_get_csn: conn=1007 op=3 generated new csn=20210826091853.649104Z#000000#001#000000 manage=1
61275c7d slap_queue_csn: queueing 0x7fcdf4106bc0 20210826091853.649104Z#000000#001#000000
61275c7d ldif_write_entry: wrote entry "olcDatabase={1}mdb,cn=config,cn=consumer"
61275c7d ldif_back_add: err: 0 text:
61275c7d send_ldap_result: conn=1007 op=3 p=3
61275c7d send_ldap_result: err=0 matched="" text=""
61275c7d conn=1007 op=3 syncprov_matchops: recording uuid for dn=olcDatabase={1}mdb,cn=config,cn=consumer on opc=0x7fcdf4001608
……
61275c7d slap_graduate_commit_csn: removing 0x7fcdf4106bc0 20210826091853.649104Z#000000#001#000000
61275c7d conn=1004 op=1 syncprov_sendresp: cookie=*rid=001*,sid=001,csn=20210826091853.649104Z#000000#001#000000
61275c7d conn=1004 op=1 syncprov_sendresp: sending LDAP_SYNC_ADD, dn=olcDatabase={1}mdb,cn=config,cn=consumer
……
rid=002 should be there!
No. The rid identifies a consumer. If you search back in your log for the operations on conn=1004 in this case, you'll see that the consumer on that connection used rid=001, therefore rid=001 is what must be sent back to it.
Whatever other problem you may be seeing, this log snippet is not evidence of any bug.
https://bugs.openldap.org/show_bug.cgi?id=9705
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Keywords|needs_review | Status|UNCONFIRMED |RESOLVED Resolution|--- |FEEDBACK
https://bugs.openldap.org/show_bug.cgi?id=9705
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |VERIFIED
https://bugs.openldap.org/show_bug.cgi?id=9705
--- Comment #2 from gil77for@gmail.com --- If you enable DEBUG_MERGE_STATE you can see the following in provider logs:
..... 61515c46 slap_get_csn: conn=1002 op=7 generated new csn=20210927055310.094047Z#000000#001#000000 manage=1 61515c46 slap_queue_csn: queueing 801CBFE0 20210927055310.094047Z#000000#001#000000 61515c46 oc_check_required entry (olcDatabase={1}ldif,cn=config), objectClass "olcLdifConfig" ..... 61515c58 slap_get_csn: conn=1006 op=3 generated new csn=20210927055328.891143Z#000000#001#000000 manage=1 61515c58 slap_queue_csn: queueing 80C53840 20210927055328.891143Z#000000#001#000000 61515c58 ldif_write_entry: wrote entry "olcDatabase={1}mdb,cn=config,cn=consumer" ..... merge_state: rid=001 si_syncCookie [0] 0 20210927055310.002047Z#000000#000#000000 merge_state: rid=001 si_syncCookie [1] 1 20210927055310.094047Z#000000#001#000000 merge_state: rid=001 si_cookieState [0] 0 20210927055310.002047Z#000000#000#000000 merge_state: rid=001 si_cookieState [1] 1 20210927055328.891143Z#000000#001#000000 .....
The next step is the si_cookieState[1] will override si_syncCookie[1]!
https://bugs.openldap.org/show_bug.cgi?id=9705
--- Comment #3 from Quanah Gibson-Mount quanah@openldap.org --- To investigate further, we would need access to an openvms environment. This issue does not appear under Linux, FreeBSD, Solaris, etc.