https://bugs.openldap.org/show_bug.cgi?id=8650
Quanah Gibson-Mount <quanah(a)openldap.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|TEST |FIXED
Keywords|OL_2_5_REQ |
Target Milestone|2.5.0 |2.4.50
--
You are receiving this mail because:
You are on the CC list for the bug.
https://bugs.openldap.org/show_bug.cgi?id=9210
Bug ID: 9210
Summary: [with patch] Infinite retry-loop (and thus 100%
CPU-Usage) when lots of requests are issued
Product: OpenLDAP
Version: 2.4.47
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: normal
Priority: ---
Component: libraries
Assignee: bugs(a)openldap.org
Reporter: lukas.juhrich(a)agdsn.de
Target Milestone: ---
Created attachment 706
--> https://bugs.openldap.org/attachment.cgi?id=706&action=edit
Patch adding errno resets
*tl;dr* Single-stepping revealed a missing `errno` reset in the retry loop of
`ber_int_sb_write`.
An SSSD setup of ours, which we use for basic auth on one of our services,
issues LDAP calls. Under load, i.e. when many incoming requests caused many
`ldap_search_ext` calls, we observed that the corresponding process/thread went
up to 100% CPU usage and stayed there.
- Here is the
[flamegraph](https://helios.wh2.tu-dresden.de/~shreyder/sssd_be%20--domain%20dom-http-wiki.svg);
it shows the process stuck below `ber_int_sb_write`.
- Single-stepping with GDB revealed that we were stuck in the `for(;;)` retry
loop. Indeed, the `sbi_write` call succeeded, but `errno` was still `EINTR`
every time I hit that breakpoint.
- Patching `sockbuf.c` as attached and rebuilding resolved the issue.
I also noticed other loops of the same kind in `sockbuf.c` and added
`errno = 0;` at the beginning of each iteration there as well; in principle,
they suffer from the same problem.
Our explanation for why this happened under load: with many requests being
issued, a write is much more likely to coincide with an _actual_ interrupt, and
once that happens, we're stuck in the infinite loop.
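The retry pattern discussed above can be sketched in C. This is a minimal, hypothetical sketch, not the `sockbuf.c` code: `writer_fn`, `write_with_retry`, `struct fake_writer` and `fake_write` are invented names. It clears `errno` at the start of each iteration, as the attached patch does, and it also consults `errno` only when the call reports failure (`ret < 0`), since `errno` is only meaningful after a failed call. The fake writer reproduces the state seen in GDB: a successful write that still leaves `errno` set to `EINTR`.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical writer interface, loosely modelled on a Sockbuf's sbi_write. */
typedef ssize_t (*writer_fn)(void *ctx, const void *buf, size_t len);

/* Retry a write only when it genuinely failed with EINTR. */
static ssize_t write_with_retry(writer_fn w, void *ctx, const void *buf, size_t len)
{
    ssize_t ret;

    for (;;) {
        errno = 0;                /* drop any stale value (defensive, as in the patch) */
        ret = w(ctx, buf, len);
        if (ret < 0 && errno == EINTR)
            continue;             /* interrupted before anything was written: retry */
        break;                    /* success, or a real error for the caller */
    }
    return ret;
}

/* Fake writer for demonstration: fails with EINTR `interrupts` times, then
 * succeeds while still leaving errno == EINTR, the state observed in GDB. */
struct fake_writer {
    int interrupts;               /* remaining simulated interruptions */
    int calls;                    /* how many times the writer was invoked */
};

static ssize_t fake_write(void *ctx, const void *buf, size_t len)
{
    struct fake_writer *f = ctx;

    (void)buf;
    f->calls++;
    errno = EINTR;                /* set on every call, success or not */
    if (f->interrupts > 0) {
        f->interrupts--;
        return -1;
    }
    return (ssize_t)len;
}
```

With the `ret < 0` guard, a stale `EINTR` after a successful write cannot trigger another iteration; in this sketch the `errno = 0` reset is a defensive extra rather than the only safeguard.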
https://bugs.openldap.org/show_bug.cgi?id=8650
Howard Chu <hyc(a)openldap.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|CONFIRMED |RESOLVED
Resolution|--- |TEST
--- Comment #18 from Howard Chu <hyc(a)openldap.org> ---
Commits:
• 735e1ab
by Howard Chu at 2020-04-12T22:18:51+00:00
ITS#8650 loop on incomplete TLS handshake
https://bugs.openldap.org/show_bug.cgi?id=8650
--- Comment #17 from Ryan Tandy <ryan(a)openldap.org> ---
Created attachment 708
--> https://bugs.openldap.org/attachment.cgi?id=708&action=edit
test program with non-blocking socket
Here's a test program that exercises the scenario with a non-blocking socket,
similar to the case described in bug 9210. Currently it fails on 2.4 with
LDAP_SERVER_DOWN and on 2.5 with LDAP_TIMEOUT, but succeeds if you comment out
the fcntl(). Any patch needs to correct that as well as the scenario described
here with a blocking socket.
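For background on the `fcntl()` call mentioned above, here is a sketch assuming a POSIX system. `set_nonblocking` and `fill_until_failure` are hypothetical helpers, not taken from attachment 708. The sketch switches a socket to non-blocking mode and shows that a write that cannot make progress then fails with `EAGAIN`/`EWOULDBLOCK` instead of blocking. A write loop that only retries on `EINTR` treats that result as a hard error.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Switch fd to non-blocking mode, keeping its other status flags.
 * Returns 0 on success, -1 on error. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Write repeatedly while nobody reads the other end, and return the errno of
 * the first failed write. On a blocking socket this would block forever once
 * the buffer is full; on a non-blocking one it fails with EAGAIN/EWOULDBLOCK. */
static int fill_until_failure(int fd)
{
    char chunk[4096];

    memset(chunk, 'x', sizeof chunk);
    for (;;) {
        if (write(fd, chunk, sizeof chunk) < 0)
            return errno;
    }
}
```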
https://bugs.openldap.org/show_bug.cgi?id=8650
--- Comment #16 from Ryan Tandy <ryan(a)openldap.org> ---
*** Bug 9210 has been marked as a duplicate of this bug. ***
https://bugs.openldap.org/show_bug.cgi?id=8650
Ryan Tandy <ryan(a)openldap.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|UNCONFIRMED |CONFIRMED
Ever confirmed|0 |1
--- Comment #15 from Ryan Tandy <ryan(a)openldap.org> ---
The other way we can get a non-blocking socket is if the client sets one up
itself and passes it to us via ldap_init_fd(). sssd does this, or used to: bug
9210.
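When the descriptor comes from the caller (for example via `ldap_init_fd()`), the library cannot assume blocking semantics. One general technique is to wait for writability with `poll()` whenever a write reports `EAGAIN`/`EWOULDBLOCK`, so the same loop works for blocking and non-blocking descriptors. This sketch is an assumption, not what OpenLDAP does and not the fix that was eventually committed. `write_all` and its `timeout_ms` parameter are hypothetical.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write all of buf to fd, whether fd is blocking or not.
 * When a non-blocking write would block, wait up to timeout_ms for the
 * descriptor to become writable. Returns 0 on success, -1 on error/timeout. */
static int write_all(int fd, const char *buf, size_t len, int timeout_ms)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);

        if (n > 0) {              /* progress, possibly a partial write */
            buf += n;
            len -= (size_t)n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;             /* interrupted: just retry */
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            struct pollfd p = { .fd = fd, .events = POLLOUT };
            int r = poll(&p, 1, timeout_ms);

            if (r > 0 || (r < 0 && errno == EINTR))
                continue;         /* writable (or interrupted): try again */
            return -1;            /* timed out (r == 0) or poll failed */
        }
        return -1;                /* real write error, or no progress */
    }
    return 0;
}
```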
https://bugs.openldap.org/show_bug.cgi?id=8650
Ryan Tandy <ryan(a)openldap.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
CC| |lukas.juhrich(a)agdsn.de
--- Comment #14 from Ryan Tandy <ryan(a)openldap.org> ---
*** Bug 9210 has been marked as a duplicate of this bug. ***
https://bugs.openldap.org/show_bug.cgi?id=8650
Ryan Tandy <ryan(a)openldap.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
See Also| |https://bugs.openldap.org/show_bug.cgi?id=9210
https://bugs.openldap.org/show_bug.cgi?id=8847
--- Comment #31 from Ryan Tandy <ryan(a)openldap.org> ---
Hello, I have been reviewing and testing this patch, and I think there are a
number of issues, some more severe than others, that should still be addressed.
In general, the patch does not seem well adapted to the surrounding code. For
example, entries have been added at random positions in lists that were
previously sorted, and the whitespace style (and code style generally) is quite
different from the existing code. Also, the new code does not seem to respect
the configure option (and #ifdefs etc.) for disabling IPv6 support.
doc/man/man3/ldap_get_option.3:
- LDAP_OPT_SOCKET_BIND_ADDRESSES added at the wrong place
doc/man/man5/ldap.conf.5:
- SOCKET_BIND_ADDRESSES added at the wrong place
- typo (seperated -> separated)
libraries/libldap/ldap-int.h:
- /* pull in netinet/in */ is a useless comment
- fails to compile under MinGW (there is no netinet/in.h header)
-> I could be wrong, but 'struct in_addr' feels rather low-level for this
file? But I'm not sure what a better design would look like...
- should not include IPv6 bits if IPv6 disabled
- LDAP_LDO_NULLARG has not been updated (gcc generates a warning)
- if ITS#6567 is finished before this one, MAX_LDAP_ADDR_LEN will probably need
an update ("GSSAPI_ALLOW_REMOTE_PRINCIPAL" is longer than
"SOCKET_BIND_ADDRESSES", which in turn is longer than "TLS_CIPHER_SUITE")
libraries/libldap/options.c:
- in ldap_set_option: other options reset to their default when
invalue == NULL; it would be nice if this one did the same
- ldap_validate_and_fill_sourceip also feels a bit out of place: there are no
other similar functions in this file... maybe it belongs in os-ip.c or
util-int.c?
- in the existing code, inet_pton is only used if LDAP_PF_INET6 is defined;
this should probably follow that pattern (there is also HAVE_INET_NTOP...)
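The two options.c items could look roughly like the sketch below. It is hypothetical: `struct bind_addr` and `parse_bind_addr` are invented, `LDAP_PF_INET6` is assumed to be defined only in IPv6-enabled builds, and the IPv4-only fallback to `inet_addr()` is our assumption rather than something taken from the OpenLDAP source. Passing `NULL` resets the value to its default (unset), as the review suggests.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical parsed bind address; family 0 means "unset" (the default). */
struct bind_addr {
    int family;
    struct in_addr v4;
#ifdef LDAP_PF_INET6
    struct in6_addr v6;           /* only present in IPv6-enabled builds */
#endif
};

/* Parse one address. NULL resets *out to the default, the way other options
 * treat invalue == NULL. Returns 0 on success, -1 if s is not an address. */
static int parse_bind_addr(const char *s, struct bind_addr *out)
{
    if (s == NULL) {
        memset(out, 0, sizeof *out);
        return 0;
    }
#ifdef LDAP_PF_INET6
    /* IPv6-enabled build: inet_pton handles both families. */
    if (inet_pton(AF_INET, s, &out->v4) == 1) {
        out->family = AF_INET;
        return 0;
    }
    if (inet_pton(AF_INET6, s, &out->v6) == 1) {
        out->family = AF_INET6;
        return 0;
    }
#else
    /* IPv4-only build (assumed fallback): inet_addr, no inet_pton. Note that
     * inet_addr cannot distinguish "255.255.255.255" from an error. */
    in_addr_t a = inet_addr(s);
    if (a != (in_addr_t)-1) {
        out->v4.s_addr = a;
        out->family = AF_INET;
        return 0;
    }
#endif
    return -1;
}
```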
libraries/libldap/os-ip.c:
- possibly the new code should be in ldap_int_prepare_socket()? not sure...
- an address family mismatch (only one bind address specified, and the socket
uses the other family) is ignored; should we try to catch it?
-> the MS implementation returns LDAP_SERVER_DOWN when this happens
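Catching the address family mismatch could be as simple as looking for a configured bind address whose family matches the socket, and failing the connection when there is none. The MS behaviour mentioned above is to return LDAP_SERVER_DOWN in that case. This is a hypothetical sketch: `pick_bind_addr` is an invented name, and mapping `-1` to an LDAP error code is left to the caller.

```c
#include <stddef.h>
#include <sys/socket.h>

/* Hypothetical: return the index of the first configured bind address whose
 * family matches the socket, or -1 on a family mismatch so the caller can
 * fail the connection instead of silently skipping bind().
 * Only call this when at least one bind address is configured. */
static int pick_bind_addr(const struct sockaddr_storage *addrs, size_t n,
                          int sock_family)
{
    for (size_t i = 0; i < n; i++) {
        if (addrs[i].ss_family == sock_family)
            return (int)i;
    }
    return -1;
}
```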
https://bugs.openldap.org/show_bug.cgi?id=6567
Quanah Gibson-Mount <quanah(a)openldap.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Status|CONFIRMED |IN_PROGRESS