Full_Name: Maciej Puzio
Version: 2.4.44 and git head
OS: Ubuntu 16.04 amd64
URL: ftp://ftp.openldap.org/incoming/
Submission from: (NULL) (129.112.109.41)
Incorrect handling of TLS options causes syncrepl to use different values for the first connection attempt and for subsequent retries. In some circumstances this may leave syncrepl unable to recover from a temporary network outage.
Note: This bug has very similar symptoms to ITS# 8385 (which has already been fixed) and occurs in similar circumstances. However, my investigation indicates that this is a separate issue.
Steps to reproduce the problem:
1. Configure two LDAP servers in dual master replication setup using slapd.conf config file as described below.
2. Provide the servers with TLS certificates that are correct but do not include the address used in the syncrepl provider setting. (Note: SECURE256 requires 4096-bit RSA key)
3. Set tls_reqcert to allow in both slapd.conf and ldap.conf.
4. Start slapd on both servers.
5. Stop and restart slapd on server A.
6. Server B will write errors to syslog:
slapd: do_syncrep2: rid=001 (-1) Can't contact LDAP server
slapd: do_syncrepl: rid=001 rc -1 retrying (9 retries left)

Expected result: After the predefined time server B will retry replication, and we will see messages:
slapd: do_syncrep1: rid=001 starting refresh
slapd: do_syncrep1: rid=001 finished refresh

Observed result: Server B produces the following messages in a loop:
slapd: do_syncrepl: rid=001 rc -1 retrying (8 retries left)
slapd: slap_client_connect: URI=ldaps://10.0.0.1 DN="cn=root,dc=test" ldap_sasl_bind_s failed (-1)
The relevant parts of slapd.conf: (for server A at 10.0.0.1)
loglevel 1
serverID 001
moduleload syncprov
TLSCipherSuite SECURE256:-VERS-SSL3.0
TLSCACertificateFile /etc/ldap/ssl/ca.pem
TLSCertificateFile /etc/ldap/ssl/srvA.pem
TLSCertificateKeyFile /etc/ldap/ssl/srvA.key
syncrepl rid=001
  provider=ldaps://10.0.0.2
  type=refreshAndPersist
  retry="30 10 300 +"
  searchbase="dc=test"
  attrs="*,+"
  bindmethod=simple
  binddn="cn=root,dc=test"
  credentials="plaintext-password"
  tls_reqcert=allow
  keepalive="240:5:10"
mirrormode TRUE
overlay syncprov
syncprov-checkpoint 10 1440
OpenLDAP was built with this configuration:
./configure --prefix=/opt/openldap --enable-debug --enable-dynamic --enable-syslog --enable-local --enable-slapd --enable-crypt --enable-modules --enable-backends --enable-ndb=no --enable-shell=no --enable-perl=no --enable-sql=no --enable-wt=no --enable-overlays --with-tls=gnutls CPPFLAGS="-Wno-format-extra-args"
Note that GnuTLS was used, following Debian and Ubuntu practice. I did not test this issue with other TLS libraries, and I do not know if this issue is GnuTLS-related or not.
Debug findings:
The above symptoms are caused by the following lines located at the end of ldap_int_tls_start in file libraries/libldap/tls2.c:
	if (ld->ld_options.ldo_tls_require_cert != LDAP_OPT_X_TLS_NEVER &&
	    ld->ld_options.ldo_tls_require_cert != LDAP_OPT_X_TLS_ALLOW) {
		ld->ld_errno = ldap_pvt_tls_check_hostname( ld, ssl, host );
		if (ld->ld_errno != LDAP_SUCCESS) {
			return ld->ld_errno;
		}
	}
The value of ld->ld_options.ldo_tls_require_cert is correct during the first syncrepl connection attempt, but incorrect during retries. During my tests it tended to equal 2 (demand), even when tls_reqcert was set to allow or never. On other occasions the variable took the value of TLS_REQCERT from ldap.conf, giving the peculiar result that the first syncrepl connection used the setting from slapd.conf, and subsequent ones the setting from ldap.conf. It is possible that the value is uninitialized.
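To make the effect of that condition concrete, here is a minimal standalone sketch of the gating logic. It does not link against libldap: the REQCERT_* names and both functions are hypothetical stand-ins, with the enum values chosen to mirror the LDAP_OPT_X_TLS_* levels in ldap.h (2 being demand, as observed above). It only models the quoted condition, not the rest of the handshake.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical local names; the values mirror LDAP_OPT_X_TLS_* in ldap.h. */
enum reqcert_level {
	REQCERT_NEVER  = 0,
	REQCERT_HARD   = 1,
	REQCERT_DEMAND = 2,
	REQCERT_ALLOW  = 3,
	REQCERT_TRY    = 4
};

/* True when the quoted condition would call the hostname check. */
bool hostname_check_runs(int require_cert)
{
	assert(require_cert >= REQCERT_NEVER && require_cert <= REQCERT_TRY);
	return require_cert != REQCERT_NEVER &&
	       require_cert != REQCERT_ALLOW;
}

/* Models only the final step of the TLS start: the connection passes if
 * the hostname check is skipped, or if the certificate lists the host. */
bool handshake_passes(int require_cert, bool cert_lists_host)
{
	return !hostname_check_runs(require_cert) || cert_lists_host;
}
```

With a certificate that does not list the provider address, handshake_passes(REQCERT_ALLOW, false) is true while handshake_passes(REQCERT_DEMAND, false) is false, which matches the first attempt succeeding and the retries failing.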
Further debugging showed that this situation results from slap_client_connect (servers/slapd/config.c) not calling bindconf_tls_set on retries. The relevant code is:
	if ( sb->sb_tls_do_init ) {
		rc = bindconf_tls_set( sb, ld );
	} else if ( sb->sb_tls_ctx ) {
		rc = ldap_set_option( ld, LDAP_OPT_X_TLS_CTX, sb->sb_tls_ctx );
	}
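The following standalone sketch models how this branch could produce the observed behaviour. It is not OpenLDAP code: all model_* names are hypothetical, and it rests on two assumptions that I have not verified in the source, namely that the init flag is cleared after the first successful setup, and that each connection attempt starts from a fresh handle carrying the library-wide default (for example TLS_REQCERT from ldap.conf).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical local names; the values mirror LDAP_OPT_X_TLS_* in ldap.h. */
enum { MODEL_REQCERT_DEMAND = 2, MODEL_REQCERT_ALLOW = 3 };

/* Stand-in for a freshly created LDAP handle. */
struct model_handle {
	int require_cert;      /* models ld->ld_options.ldo_tls_require_cert */
	const void *tls_ctx;
};

/* Stand-in for the per-consumer bind configuration. */
struct model_bindconf {
	int reqcert;           /* tls_reqcert= from the syncrepl statement */
	bool tls_do_init;      /* models sb->sb_tls_do_init */
	const void *tls_ctx;   /* models sb->sb_tls_ctx */
};

static int dummy_ctx;          /* placeholder for a real TLS context */

/* Models bindconf_tls_set. ASSUMPTION: it applies the per-consumer
 * options, caches the context, and clears the init flag. */
static void model_tls_set(struct model_bindconf *sb, struct model_handle *ld)
{
	ld->require_cert = sb->reqcert;
	ld->tls_ctx = &dummy_ctx;
	sb->tls_ctx = ld->tls_ctx;
	sb->tls_do_init = false;
}

/* One connection attempt. Returns the require_cert value that would be
 * in effect when the hostname check is evaluated. */
int model_connect(struct model_bindconf *sb, int library_default)
{
	assert(sb != NULL);
	struct model_handle ld = { library_default, NULL };

	if (sb->tls_do_init) {
		model_tls_set(sb, &ld);
	} else if (sb->tls_ctx) {
		/* Context is restored, but require_cert is not reapplied. */
		ld.tls_ctx = sb->tls_ctx;
	}
	return ld.require_cert;
}
```

In this model, the first call to model_connect with reqcert set to allow returns allow, and every later call returns whatever default was passed in, which is the pattern seen in the debugger. Whether the real code path matches these assumptions is exactly what remains to be confirmed.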
Due to my limited familiarity with OpenLDAP design, I did not debug the issue further and did not find the root cause of this problem.
Discussion about the relationship of this bug to ITS# 8385:
On the surface these bugs appear similar to each other like twins, both causing failures of syncrepl retries due to an incorrect value of tls_reqcert. In fact, I found this bug while testing Ubuntu slapd and libldap packages with the patch for #8385 backported. There are however some subtle differences:
1. A TLS certificate incorrect in a different way is needed to reproduce #8385: in my tests it listed a correct host, but had a key too weak for the specified cipher strength.
2. Bug #8385 has much lower reproducibility (as expected for an unallocated memory issue).
3. Symptoms of #8385 are a result of code in tls_g.c (tlsg_session_accept does wrong tests), while this bug results from host tests in tls2.c (ldap_int_tls_start). Both are triggered by an incorrect value of tls_reqcert.
4. I suspect a different cause leading to the wrong tls_reqcert value in each bug.
Please let me know if you need further information. Thank you very much.
Maciej Puzio