The root cause of the problem: On syncrepl retry, code in slap_client_connect (servers/slapd/config.c) initialized ld (LDAP) object with default values, rather than values saved in sb (slap_bindconf). This was because sb->sb_tls_do_init was true and bindconf_tls_set(sb, ld) was not called. This issue is not related to any specific TLS library. All syncrepl retries are executed with wrong TLS settings in ld->ld_options, including ldo_tls_require_cert, ldo_tls_keyfile, ldo_tls_certfile, ldo_tls_cacertfile, and other (ldo_tls_require_cert has default value 2 (demand), while string fields have nulls). However, the bug is usually masked, because sb->sb_tls_ctx contains TLS context correctly initialized during initial connection attempt, and this context is assigned to ld as LDAP_OPT_X_TLS_CTX, and then used by the TLS library, while incorrect values in ld->ld_options are ignored in most places. But the host-checking code in ldap_int_tls_start (libraries/libldap/tls2.c) uses ld->ld_options and triggers the bug.
Explanation of the patch: The patch modifies function bindconf_tls_set that copies values from slap_bindconf to LDAP object. Code that calls bindconf_tls_set is also modified. 1. If slap_bindconf object has TLS context, assign it to LDAP object. 2. if slap_bindconf does not have a TLS context, create new context and assign it it to both slap_bindconf and LDAP objects. 3. Removed context replacement functionality and related conditions (newctx). The intent of this functionality was unclear, but did not appear to be compatible with how binconf_tls_set was used. 4. The result of the above code changes is that TLS context is always present and correctly assigned, while previously it sometimes was, and sometimes wasn't. 5. Simplified calls of bindconf_tls_set in the following files: servers/slapd/config.c servers/slapd/back-ldap/bind.c servers/slapd/back-meta/conn.c servers/slapd/back-asyncmeta/conn.c
Things to do: I tested the modified slap_client_connect (servers/slapd/config.c), but the modifications to files in back-{ldap,meta,asyncmeta} were done without proper code and functionality analysis, and without any testing. I am hoping for a review from someone knowledgeable.
Thank you