https://bugs.openldap.org/show_bug.cgi?id=9288
Issue ID: 9288 Summary: slapd service stops suddenly but generates crash file Product: OpenLDAP Version: 2.4.49 Hardware: x86_64 OS: Linux Status: UNCONFIRMED Severity: normal Priority: --- Component: backends Assignee: bugs@openldap.org Reporter: tekkitan@gmail.com Target Milestone: ---
System: t3.small AWS instance 2VCPU 2GB RAM OS: Ubuntu 20.04 (was happening on 16.04 as well) Package Version: 2.4.49+dfsg-2ubuntu1.2
We've been having this weird problem with slapd between Ubuntu 16.04 and Ubuntu 20.04 where slapd will suddenly stop according to syslog but will also generate a crash file within apport. We thought it was due to some weird stuff we were doing in the 16.04 version (2.4.42+dfsg-2ubuntu3.8), but we deployed a new LDAP proxy in 20.04 (2.4.49+dfsg-2ubuntu1.2) which appears to have the same issue so I am reporting it and hoping for guidance as we use this proxy for our corporate VPN.
Below is the backtrace which was the same from both proxy systems, but this one is from the 20.04 version as that is our current system in use:
root@useldap02:~/_usr_sbin_slapd.0.crash# apport-retrace --stdout /var/crash/_usr_sbin_slapd.0.crash --- stack trace --- #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 set = {__val = {0, 140488162916432, 140488307736576, 140487911640016, 140487911640117, 140487911640016, 140487911640016, 140487911640147, 140487911640316, 140487911640016, 140487911640316, 0, 0, 0, 0, 0}} pid = <optimized out> tid = <optimized out> ret = <optimized out> #1 0x00007fc5f3043859 in __GI_abort () at abort.c:79 save_stage = 1 act = {__sigaction_handler = {sa_handler = 0x2, sa_sigaction = 0x2}, sa_mask = {__val = {131, 4, 99, 0, 0, 140488164070405, 0, 21474836480, 140488121483504, 0, 140488164102160, 0, 6703301646461552640, 140488164070405, 140488165695488, 140488164087176}}, sa_flags = -235332728, sa_restorer = 0xbf} sigs = {__val = {32, 0 <repeats 15 times>}} #2 0x00007fc5f3043729 in __assert_fail_base (fmt=0x7fc5f31d9588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x7fc5f1f91953 "!LDAP_BACK_CONN_TAINTED( lc )", file=0x7fc5f1f91b88 "../../../../../servers/slapd/back-ldap/bind.c", line=191, function=<optimized out>) at assert.c:92 str = 0x7fc5c410a280 "@Q\021\344\305\177" total = 4096 #3 0x00007fc5f3054f36 in __GI___assert_fail (assertion=0x7fc5f1f91953 "!LDAP_BACK_CONN_TAINTED( lc )", file=0x7fc5f1f91b88 "../../../../../servers/slapd/back-ldap/bind.c", line=191, function=0x7fc5f1f921a0 "ldap_back_conn_delete") at assert.c:101 No locals. #4 0x00007fc5f1f807c7 in ldap_back_conn_delete () from /usr/lib/ldap/back_ldap-2.4.so.2 No symbol table info available. #5 0x00007fc5f1f814a4 in ?? () from /usr/lib/ldap/back_ldap-2.4.so.2 No symbol table info available. #6 0x00007fc5f1f815ad in ldap_back_release_conn_lock () from /usr/lib/ldap/back_ldap-2.4.so.2 No symbol table info available. #7 0x00007fc5f1f833bc in ldap_back_retry () from /usr/lib/ldap/back_ldap-2.4.so.2 No symbol table info available. #8 0x00007fc5f1f7ec0b in ldap_back_search () from /usr/lib/ldap/back_ldap-2.4.so.2 No symbol table info available. #9 0x000055b87105cc88 in overlay_op_walk () No symbol table info available. #10 0x000055b87105cdb7 in ?? () No symbol table info available. #11 0x000055b870fef80d in fe_op_search () No symbol table info available. #12 0x000055b870fef034 in do_search () No symbol table info available. #13 0x000055b870fec6ed in ?? () No symbol table info available. #14 0x000055b870fed22c in ?? () No symbol table info available. #15 0x00007fc5f32e7a03 in ?? () from /lib/x86_64-linux-gnu/libldap_r-2.4.so.2 No symbol table info available. #16 0x00007fc5f3219609 in start_thread (arg=<optimized out>) at pthread_create.c:477 ret = <optimized out> pd = <optimized out> unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140488121493248, -8220430158823943899, 140488129874654, 140488129874655, 140488129874784, 140488121490880, 8241811018110734629, 8241811687867814181}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}} not_first_call = 0 #17 0x00007fc5f3140103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95 No locals.
This is usually what we see within syslog when slapd stops:
Jul 11 00:49:32 useldap02 slapd[5257]: conn=1035 op=239 SRCH base="ou=people,dc=company,dc=com" scope=2 deref=0 filter="(&(|(objectClass=person)(objectClass=organizationalPerson)(objectClass=inetOrgPerson)(?objectClass=fw1Person))(uid=exampleusername))" Jul 11 00:49:32 useldap02 slapd[5257]: conn=1035 op=239 SRCH attr=cn uid sn mail proxyAddresses userPrincipalName fullName displayName description objectclass fw1hour-range-from fw1hour-range-to fw1expiration-date fw1day fw1allowed-dst fw1allowed-src fw1auth-method userAccountControl fw1userPwdPolicy mobile fw1BadPwdCount fw1lastLoginFailure fw1pwdLastMod fw1auth-server fw1auth-server fw1groupTemplate fw1sr-auth-track fw1enc-methods fw1ISAKMP-EncMethod fw1ISAKMP-AuthMethods fw1ISAKMP-HashMethods fw1ISAKMP-Transform fw1ISAKMP-DataIntegrityMethod fw1ISAKMP-SharedSecret fw1ISAKMP-DataEncMethod givenName surname Jul 11 00:49:32 useldap02 slapd[5257]: conn=1035 op=240 SRCH base="ou=groups,dc=company,dc=com" scope=2 deref=0 filter="(&(|(objectClass=person)(objectClass=organizationalPerson)(objectClass=inetOrgPerson)(?objectClass=fw1Person))(uid=exampleusername))" Jul 11 00:49:32 useldap02 slapd[5257]: conn=1035 op=240 SRCH attr=cn uid sn mail proxyAddresses userPrincipalName fullName displayName description objectclass fw1hour-range-from fw1hour-range-to fw1expiration-date fw1day fw1allowed-dst fw1allowed-src fw1auth-method userAccountControl fw1userPwdPolicy mobile fw1BadPwdCount fw1lastLoginFailure fw1pwdLastMod fw1auth-server fw1auth-server fw1groupTemplate fw1sr-auth-track fw1enc-methods fw1ISAKMP-EncMethod fw1ISAKMP-AuthMethods fw1ISAKMP-HashMethods fw1ISAKMP-Transform fw1ISAKMP-DataIntegrityMethod fw1ISAKMP-SharedSecret fw1ISAKMP-DataEncMethod givenName surname Jul 11 00:49:34 useldap02 slapd[28099]: * Stopping OpenLDAP slapd Jul 11 00:49:34 useldap02 slapd[28099]: ...done. Jul 11 00:49:34 useldap02 systemd[1]: slapd.service: Succeeded.
Note that a lot of these attributes being searched for (all the fw1* attributes) do not exist within our LDAP. Not sure if that would be the cause.
Thank you for any help that can be provided.
https://bugs.openldap.org/show_bug.cgi?id=9288
--- Comment #1 from Quanah Gibson-Mount quanah@openldap.org --- The backtrace shows it hit an assert:
assertion=0x7fc5f1f91953 "!LDAP_BACK_CONN_TAINTED( lc )"
which as the backtrace notes, is from line 191 in bind.c for the ldap backend:
190 if ( LDAP_BACK_CONN_CACHED( lc ) ) { 191 assert( !LDAP_BACK_CONN_TAINTED( lc ) ); 192 tmplc = avl_delete( &li->li_conninfo.lai_tree, (caddr_t)lc, 193 ldap_back_conndnlc_cmp ); 194 assert( tmplc == lc ); 195 LDAP_BACK_CONN_CACHED_CLEAR( lc ); 196 }
Interestingly, this entire function is unchanged since 2007.
https://bugs.openldap.org/show_bug.cgi?id=9288
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Summary|slapd service stops |back-ldap triggers assert |suddenly but generates |in bind.c |crash file |
https://bugs.openldap.org/show_bug.cgi?id=9288
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Target Milestone|--- |2.5.1 Keywords| |OL_2_5_REQ
https://bugs.openldap.org/show_bug.cgi?id=9288
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- See Also| |https://bugs.openldap.org/s | |how_bug.cgi?id=9400 Target Milestone|2.5.1 |2.6.0 Keywords|OL_2_5_REQ |
--- Comment #2 from Quanah Gibson-Mount quanah@openldap.org --- May be duplicate of ITS#9400. Please see if this still occurs with OpenLDAP 2.4.57, thanks.
https://bugs.openldap.org/show_bug.cgi?id=9288
--- Comment #3 from tero.saarni@est.tech --- I might have bumped into a variant of this problem. In my case, I have a reproducible test for this, and it reproduces crash even after #9400 is applied.
The assert can be triggered by sending InvalidCredentials response from the remote server after back-ldap retries the connection and binds again.
This bug might possibly related to another bug that I'm observing: back-ldap retries bind as anonymous even if `rebind-as-user` is set to yes. I did not find existing issue for this in bugs.openldap.org yet.
Log file has:
slapd: bind.c:191: ldap_back_conn_delete: Assertion `!LDAP_BACK_CONN_TAINTED( lc )' failed.
And backtrace from gdb:
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007f587d350859 in __GI_abort () at abort.c:79 #2 0x00007f587d350729 in __assert_fail_base (fmt=0x7f587d4e6588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55740debf839 "!LDAP_BACK_CONN_TAINTED( lc )", file=0x55740debf7a0 "bind.c", line=191, function=<optimized out>) at assert.c:92 #3 0x00007f587d361f36 in __GI___assert_fail (assertion=0x55740debf839 "!LDAP_BACK_CONN_TAINTED( lc )", file=0x55740debf7a0 "bind.c", line=191, function=0x55740debfff0 <__PRETTY_FUNCTION__.12907> "ldap_back_conn_delete") at assert.c:101 #4 0x000055740dd857a1 in ldap_back_conn_delete (li=0x55740e2a4c90, lc=0x7f586c102fa0) at bind.c:191 #5 0x000055740dd862ad in ldap_back_freeconn (li=0x55740e2a4c90, lc=0x7f586c102fa0, dolock=0) at bind.c:510 #6 0x000055740dd8a795 in ldap_back_retry (lcp=0x7f587ae77810, op=0x7f5870000bb0, rs=0x7f587ae78a30, sendok=LDAP_BACK_DONTSEND) at bind.c:2035 #7 0x000055740dd48efc in ldap_back_search (op=0x7f5870000bb0, rs=0x7f587ae78a30) at search.c:579 #8 0x000055740dc91a31 in fe_op_search (op=0x7f5870000bb0, rs=0x7f587ae78a30) at search.c:406 #9 0x000055740dc91229 in do_search (op=0x7f5870000bb0, rs=0x7f587ae78a30) at search.c:247 #10 0x000055740dc8d6a7 in connection_operation (ctx=0x7f587ae78b90, arg_v=0x7f5870000bb0) at connection.c:1168 #11 0x000055740dc8ddd4 in connection_read_thread (ctx=0x7f587ae78b90, argv=0xb) at connection.c:1319 #12 0x000055740de82e10 in ldap_int_thread_pool_wrapper (xpool=0x55740e243540) at tpool.c:1051 #13 0x00007f587d526609 in start_thread (arg=<optimized out>) at pthread_create.c:477 #14 0x00007f587d44d293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
https://bugs.openldap.org/show_bug.cgi?id=9288
--- Comment #4 from Ondřej Kuzník ondra@mistotebe.net --- On Tue, Feb 09, 2021 at 07:38:23AM +0000, openldap-its@openldap.org wrote:
I might have bumped into a variant of this problem. In my case, I have a reproducible test for this, and it reproduces crash even after #9400 is applied.
Hi Tero, thanks for investigating this, would you be able to provide the test script here or propose it as an addition to tests/data/regressions?
The assert can be triggered by sending InvalidCredentials response from the remote server after back-ldap retries the connection and binds again.
This bug might possibly related to another bug that I'm observing: back-ldap retries bind as anonymous even if `rebind-as-user` is set to yes. I did not find existing issue for this in bugs.openldap.org yet.
I haven't been able to find a bug referencing this directly, but seem to remember there was a suggestion there were issues reconnecting, maybe a part of a larger issue that manifested as ITS#9205 and ITS#8427?
Thanks,
https://bugs.openldap.org/show_bug.cgi?id=9288
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Keywords| |OL_2_5_REQ Target Milestone|2.6.0 |2.5.2 Severity|normal |blocker
https://bugs.openldap.org/show_bug.cgi?id=9288
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|UNCONFIRMED |CONFIRMED Ever confirmed|0 |1
--- Comment #5 from Quanah Gibson-Mount quanah@openldap.org --- Test case MR: https://git.openldap.org/openldap/openldap/-/merge_requests/223
https://bugs.openldap.org/show_bug.cgi?id=9288
--- Comment #6 from tero.saarni@est.tech --- Here is a notice of origin and rights statement for the merge request
The attached patch file is derived from OpenLDAP Software. All of the modifications to OpenLDAP Software represented in the following patch(es) were developed by Tero Saarni tero.saarni@est.tech. I have not assigned rights and/or interest in this work to any party.
Ericsson Software Technology AB hereby place the following modifications to OpenLDAP Software (and only these modifications) into the public domain. Hence, these modifications may be freely used and/or redistributed for any purpose with or without attribution and/or other notice.
https://bugs.openldap.org/show_bug.cgi?id=9288
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Assignee|bugs@openldap.org |ondra@mistotebe.net
https://bugs.openldap.org/show_bug.cgi?id=9288
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Target Milestone|2.5.2 |2.5.3
https://bugs.openldap.org/show_bug.cgi?id=9288
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Resolution|--- |FIXED Keywords|OL_2_5_REQ | Status|CONFIRMED |RESOLVED
--- Comment #7 from Quanah Gibson-Mount quanah@openldap.org --- Commits: • 1e3e6cdd by Tero Saarni at 2021-03-02T20:00:55+00:00 ITS#9288 crash if back-ldap rebind fails
• 99efeda0 by Ondřej Kuzník at 2021-03-02T20:00:55+00:00 ITS#9288 Do no change tainted status on failed retry
It seems refcnt == 0 connections are tainted only if they were just removed from cache.
https://bugs.openldap.org/show_bug.cgi?id=9288
Quanah Gibson-Mount quanah@openldap.org changed:
What |Removed |Added ---------------------------------------------------------------------------- Status|RESOLVED |VERIFIED