All,
We have deployed OpenLDAP 2.4 in production (with replication), mainly serving saslauthd and sendmail on Linux with client library 2.4.49. We are seeing sporadic errors: "ldap_simple_bind() failed -1 (Can't contact LDAP server)". This happens both in saslauthd and in the OpenLDAP server when it replicates to another service. On investigation, we found that the error is not disruptive: both services (saslauthd and the OpenLDAP server) have built-in "retry" options, and the request issued immediately after a failure succeeds. We also noticed that the issue appears only when traffic goes through an AWS ELB (Elastic Load Balancer); there were no connection problems when clients were pointed directly at the OpenLDAP server (all connections use TLS). This makes me suspect that the OpenLDAP client library is caching the connection (or using TCP keepalive, etc.) in a way that does not work well with the ELB. Is the connection really cached, and if so, is there a configuration parameter I can use to tune that behavior? I tried to chase this down in the client library source code but could not locate it. Perhaps I need to look further, but I am wondering whether anyone else in the community has experience with this.
Thank You.
Connection caching in libldap was a feature about 20 years ago, but it was removed because it was difficult to configure and use properly. So no, there's no such feature in any recent libldap. When you do an ldap_initialize() / ldap_bind() sequence, you get a new TCP connection. Sounds like your load balancer is buggy.
Thanks for the response, Howard. This is helpful. Upon further investigation, it appears that the applications' keepalive settings were left at the libldap defaults, and those defaults did not work well with the ELB's default 60-second idle timeout. Some applications provide configuration for tuning LDAP keepalive settings. However, I am wondering whether there is an option to configure keepalive settings system-wide (perhaps in /etc/ldap.conf?) for applications that use libldap.
Thanks.
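To illustrate the keepalive tuning discussed above, here is a minimal sketch of the socket-level settings on Linux. libldap exposes comparable knobs through ldap_set_option() as LDAP_OPT_X_KEEPALIVE_IDLE, LDAP_OPT_X_KEEPALIVE_PROBES and LDAP_OPT_X_KEEPALIVE_INTERVAL; the sketch uses plain sockets so it does not depend on libldap being installed. The helper name enable_keepalive and the values used with it are assumptions for illustration, not part of any library. Whether keepalive probes actually reset the idle timer depends on the load balancer type, so that should be checked against the AWS documentation for the ELB in use.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper (not part of libldap): enable TCP keepalive on fd.
 *   idle     - seconds of inactivity before the first probe is sent
 *   probes   - unanswered probes before the kernel drops the connection
 *   interval - seconds between successive probes
 * TCP_KEEPIDLE, TCP_KEEPCNT and TCP_KEEPINTVL are Linux socket options.
 * Returns 0 on success, -1 if any setsockopt() call fails. */
int enable_keepalive(int fd, int idle, int probes, int interval)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof idle) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof probes) < 0)
        return -1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval, sizeof interval) < 0)
        return -1;
    return 0;
}
```

Calling enable_keepalive(fd, 30, 3, 10) on a connected socket would start probing after 30 seconds of silence, which is inside the 60-second idle timeout mentioned above. The numbers are illustrative only.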
On Feb 27, 2021, at 1:42 PM, Howard Chu hyc@symas.com wrote:
-- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
--On Wednesday, March 3, 2021 3:36 PM -0800 Rallavagu Kon rallavagu@gmail.com wrote:
Thanks for the response, Howard. This is helpful. Upon further investigation, it appears that the applications' keepalive settings were left at the libldap defaults, and those defaults did not work well with the ELB's default 60-second idle timeout. Some applications provide configuration for tuning LDAP keepalive settings. However, I am wondering whether there is an option to configure keepalive settings system-wide (perhaps in /etc/ldap.conf?) for applications that use libldap.
This is not possible with OpenLDAP 2.4 but will be part of OpenLDAP 2.5. However, it may take a few years for OpenLDAP 2.5's libldap to make its way into widely deployed Linux distributions.
Regards, Quanah
--
Quanah Gibson-Mount Product Architect Symas Corporation Packaged, certified, and supported LDAP solutions powered by OpenLDAP: http://www.symas.com
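For readers on OpenLDAP 2.5 or later, a system-wide setting along the lines described above might look like the following ldap.conf fragment. This is a sketch under assumptions: the option names should be confirmed against the ldap.conf(5) man page shipped with the installed release, the location of the file varies by distribution, and the values are illustrative, chosen only so that probing begins inside the 60-second idle timeout mentioned earlier in the thread.

```
# ldap.conf fragment (assumes OpenLDAP 2.5+ client libraries on Linux)
# Illustrative values only; tune them for the load balancer in use.
KEEPALIVE_IDLE      30
KEEPALIVE_PROBES    3
KEEPALIVE_INTERVAL  10
```

Applications that set their own keepalive options through ldap_set_option() would presumably still override these defaults.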
openldap-technical@openldap.org