https://bugs.openldap.org/show_bug.cgi?id=9262
Bug ID: 9262
Summary: Segmentation Fault on ldap_chain_op during Network Issues
Product: OpenLDAP
Version: 2.4.49
Hardware: x86_64
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: ---
Component: overlays
Assignee: bugs@openldap.org
Reporter: jeremy.diaz@rexconsulting.net
Target Milestone: ---
Created attachment 733
  --> https://bugs.openldap.org/attachment.cgi?id=733&action=edit
Full backtrace
This is a follow up to the following OpenLDAP mailing list post: https://www.openldap.org/lists/openldap-technical/202001/msg00045.html
The issue is as follows:
Slapd periodically crashes with a segmentation fault, most often during periods of high load. The issue appears to be related to the chain overlay.
Some background info:
This environment was originally made up of 3 producers and 6 consumers, all on OpenLDAP 2.4.48.
Initially the issue did not appear to be caused directly by high load. We noticed an uptick in saslauthd errors around the times crashes occurred:
Dec 11 07:23:20 {REDACTED} saslauthd[2720]: user ldap_search_st() failed: Timed out
Dec 11 07:23:30 {REDACTED} saslauthd[2718]: user ldap_search_st() failed: Timed out
Dec 11 07:23:40 {REDACTED} saslauthd[2718]: user ldap_search_st() failed: Timed out
Dec 11 07:23:56 {REDACTED} saslauthd[2717]: user ldap_search_st() failed: Timed out
Dec 11 07:23:57 {REDACTED} saslauthd[2720]: user ldap_search_st() failed: Timed out
Dec 11 07:23:57 {REDACTED} saslauthd[2719]: user ldap_search_st() failed: Timed out
Dec 11 07:24:05 {REDACTED} saslauthd[2718]: user ldap_search_st() failed: Timed out
Dec 11 07:24:06 {REDACTED} saslauthd[2717]: user ldap_search_st() failed: Timed out
Dec 11 07:24:54 {REDACTED} saslauthd[2719]: : write failure
Dec 11 07:24:54 {REDACTED} saslauthd[2717]: : write failure
Dec 11 07:24:54 {REDACTED} saslauthd[2719]: : write: Broken pipe
Dec 11 07:24:54 {REDACTED} saslauthd[2717]: : write: Broken pipe
Dec 11 07:24:54 {REDACTED} saslauthd[2718]: : write failure
Dec 11 07:24:54 {REDACTED} saslauthd[2720]: : write failure
Dec 11 07:24:54 {REDACTED} saslauthd[2718]: : write: Broken pipe
Dec 11 07:24:54 {REDACTED} saslauthd[2720]: : write: Broken pipe
Dec 11 07:25:04 {REDACTED} saslauthd[2721]: : write failure
Dec 11 07:25:04 {REDACTED} saslauthd[2721]: : write: Broken pipe
This seemed to indicate an issue with either AD, saslauthd, or the ldap meta instance. Running slapd at log level -1 did not reveal any obvious error messages.
One consumer server was particularly problematic, since it was in a data center with a slower network connection. This server was also handling a subset of traffic for another one of our applications, so overall it was receiving more traffic than some of the other servers.
After some network improvements, segfault occurrences dropped to about one every couple of days, mostly confined to the problematic server.
The environment was eventually updated to 2.4.49, but there was no obvious reduction in the number of segfaults.
A while later, an incident occurred that increased network latency, causing most of the slapd instances to crash constantly, every few minutes. The crashing stopped only once the network latency was reduced.
Ultimately, what seemed to significantly reduce the number of segfaults was adding another three consumers, further reducing the load on each server.
The following backtrace that we captured seems to indicate an issue in the chain overlay:
#0  ldap_chain_op (op=op@entry=0x7ffbe41f0870, rs=rs@entry=0x7ffbf09bbb20, op_f=0x4a845e <ldap_back_search>, ref=ref@entry=0x0, depth=depth@entry=0) at chain.c:422
#1  0x00000000004e61da in ldap_chain_response (op=0x7ffbe41f0870, rs=0x7ffbf09bbb20) at chain.c:1090
#2  0x000000000049d323 in over_back_response (op=0x7ffbe41f0870, rs=0x7ffbf09bbb20) at backover.c:237
#3  0x0000000000447c1a in slap_response_play (op=op@entry=0x7ffbe41f0870, rs=rs@entry=0x7ffbf09bbb20) at result.c:508
#4  0x00000000004480f5 in send_ldap_response (op=op@entry=0x7ffbe41f0870, rs=rs@entry=0x7ffbf09bbb20) at result.c:583
The issue ultimately seems to stem from slapd being very sensitive to network latency. Note that in frame #0 above, ldap_chain_op is entered with ref=0x0. The chain overlay appears to be dereferencing a referral list that never gets set, presumably because a timeout is exceeded under high load and/or high network latency, as the loop at chain.c:422 suggests:
for ( ; !BER_BVISNULL( ref ); ref++ ) {
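To illustrate the failure mode, here is a minimal, self-contained sketch (not the actual slapd source; the struct, the macro, and walk_refs() are simplified stand-ins) of why that loop crashes when the BerVarray itself is NULL: BER_BVISNULL( ref ) expands to ( ref )->bv_val == NULL, so the very first iteration dereferences ref, which the backtrace shows arriving as 0x0. The NULL guard below is only a hypothetical mitigation, not a proposed patch:

#include <stdio.h>

/* Simplified stand-ins for the definitions in <lber.h>. */
struct berval {
    unsigned long bv_len;
    char *bv_val;
};
typedef struct berval *BerVarray;

/* As in lber.h: this tests the berval's contents, not the pointer
 * itself, so BER_BVISNULL( NULL ) is a NULL-pointer dereference. */
#define BER_BVISNULL(bv) ((bv)->bv_val == NULL)

/* Hypothetical, simplified version of the loop at chain.c:422, with a
 * defensive guard that this code path does not appear to have. */
static void walk_refs( BerVarray ref )
{
    if ( ref == NULL ) { /* guard added for illustration only */
        fprintf( stderr, "no referrals to chase\n" );
        return;
    }
    for ( ; !BER_BVISNULL( ref ); ref++ ) {
        printf( "chasing referral: %s\n", ref->bv_val );
    }
}

int main( void )
{
    struct berval refs[] = {
        { sizeof( "ldap://a.example.com" ) - 1, "ldap://a.example.com" },
        { 0, NULL }, /* arrays are NULL-berval terminated, as the loop expects */
    };

    walk_refs( refs ); /* normal case: prints one referral */
    walk_refs( NULL ); /* the ref=0x0 case from frame #0; segfaults without the guard */
    return 0;
}

Without the ref == NULL guard, walk_refs( NULL ) reproduces the same class of crash as frame #0; whether the guard belongs in the loop, or whether the caller should never pass a NULL referral list down to ldap_chain_op in the first place, is a question for the maintainers.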
Attached is the updated full backtrace of all the threads, and below are the relevant portions of our olc configuration for the chain overlay:
dn: olcOverlay={1}chain,olcDatabase={-1}frontend,cn=config
objectClass: olcOverlayConfig
objectClass: olcChainConfig
olcOverlay: {1}chain
olcChainCacheURI: FALSE
olcChainMaxReferralDepth: 1
olcChainReturnError: TRUE
dn: olcDatabase={0}ldap,olcOverlay={1}chain,olcDatabase={-1}frontend,cn=config
objectClass: olcLDAPConfig
objectClass: olcChainDatabase
olcDatabase: {0}ldap
olcDbStartTLS: ldaps starttls=no
olcDbRebindAsUser: FALSE
olcDbChaseReferrals: TRUE
olcDbTFSupport: no
olcDbProxyWhoAmI: TRUE
olcDbProtocolVersion: 3
olcDbSingleConn: FALSE
olcDbCancel: abandon
olcDbUseTemporaryConn: FALSE
olcDbConnectionPoolMax: 16
olcDbSessionTrackingRequest: FALSE
olcDbNoRefs: FALSE
olcDbNoUndefFilter: FALSE
olcDbOnErr: continue
olcDbKeepalive: 0:0:0
dn: olcDatabase={1}ldap,olcOverlay={1}chain,olcDatabase={-1}frontend,cn=config
objectClass: olcLDAPConfig
objectClass: olcChainDatabase
olcDatabase: {1}ldap
olcDbURI: "{REDACTED URI 1}"
olcDbStartTLS: ldaps starttls=no tls_cacert="/{REDACTED}/openldap/tls/cacerts.cer" tls_reqcert=demand tls_crlcheck=none
olcDbIDAssertBind: mode=self flags=prescriptive,proxy-authz-non-critical bindmethod=simple timeout=0 network-timeout=0 binddn={REDACTED} credentials={REDACTED} keepalive=0:0:0
olcDbRebindAsUser: FALSE
olcDbChaseReferrals: TRUE
olcDbTFSupport: no
olcDbProxyWhoAmI: TRUE
olcDbProtocolVersion: 3
olcDbSingleConn: FALSE
olcDbCancel: abandon
olcDbUseTemporaryConn: FALSE
olcDbConnectionPoolMax: 16
olcDbSessionTrackingRequest: FALSE
olcDbNoRefs: FALSE
olcDbNoUndefFilter: FALSE
olcDbOnErr: continue
olcDbKeepalive: 0:0:0
dn: olcDatabase={2}ldap,olcOverlay={1}chain,olcDatabase={-1}frontend,cn=config
objectClass: olcLDAPConfig
objectClass: olcChainDatabase
olcDatabase: {2}ldap
olcDbURI: "{REDACTED URI 2}"
olcDbStartTLS: ldaps starttls=no tls_cacert="/{REDACTED}/openldap/tls/cacerts.cer" tls_reqcert=demand tls_crlcheck=none
olcDbIDAssertBind: mode=self flags=prescriptive,proxy-authz-non-critical bindmethod=simple timeout=0 network-timeout=0 binddn={REDACTED} credentials={REDACTED} keepalive=0:0:0
olcDbRebindAsUser: FALSE
olcDbChaseReferrals: TRUE
olcDbTFSupport: no
olcDbProxyWhoAmI: TRUE
olcDbProtocolVersion: 3
olcDbSingleConn: FALSE
olcDbCancel: abandon
olcDbUseTemporaryConn: FALSE
olcDbConnectionPoolMax: 16
olcDbSessionTrackingRequest: FALSE
olcDbNoRefs: FALSE
olcDbNoUndefFilter: FALSE
olcDbOnErr: continue
olcDbKeepalive: 0:0:0
dn: olcDatabase={3}ldap,olcOverlay={1}chain,olcDatabase={-1}frontend,cn=config
objectClass: olcLDAPConfig
objectClass: olcChainDatabase
olcDatabase: {3}ldap
olcDbURI: "{REDACTED URI 3}"
olcDbStartTLS: ldaps starttls=no tls_cacert="/{REDACTED}/openldap/tls/cacerts.cer" tls_reqcert=demand tls_crlcheck=none
olcDbIDAssertBind: mode=self flags=prescriptive,proxy-authz-non-critical bindmethod=simple timeout=0 network-timeout=0 binddn={REDACTED} credentials={REDACTED} keepalive=0:0:0
olcDbRebindAsUser: FALSE
olcDbChaseReferrals: TRUE
olcDbTFSupport: no
olcDbProxyWhoAmI: TRUE
olcDbProtocolVersion: 3
olcDbSingleConn: FALSE
olcDbCancel: abandon
olcDbUseTemporaryConn: FALSE
olcDbConnectionPoolMax: 16
olcDbSessionTrackingRequest: FALSE
olcDbNoRefs: FALSE
olcDbNoUndefFilter: FALSE
olcDbOnErr: continue
olcDbKeepalive: 0:0:0