https://bugs.openldap.org/show_bug.cgi?id=9210
Bug ID: 9210
Summary: [with patch] Infinite retry-loop (and thus 100% CPU-Usage) when lots of requests are issued
Product: OpenLDAP
Version: 2.4.47
Hardware: All
OS: All
Status: UNCONFIRMED
Severity: normal
Priority: ---
Component: libraries
Assignee: bugs@openldap.org
Reporter: lukas.juhrich@agdsn.de
Target Milestone: ---
Created attachment 706
  --> https://bugs.openldap.org/attachment.cgi?id=706&action=edit
Patch adding errno resets
*tl;dr* Single-stepping revealed a missing `errno` reset in the retry loop of `ber_int_sb_write`.
An sssd setup of ours, which we use for basic auth on one of our services, issues LDAP calls. Under load, i.e. when many incoming requests caused many `ldap_search_ext` calls, we observed that the corresponding process/thread went up to 100% CPU usage and stayed there.
- This is the [flamegraph](https://helios.wh2.tu-dresden.de/~shreyder/sssd_be%20--domain%20dom-http-wik...), which shows the process stuck below `ber_int_sb_write`.
- Single-stepping with GDB revealed that it was stuck in the `for(;;)` retry loop. Indeed, the `sbi_write` call was successful, but `errno` was still `EINTR` every time I hit that breakpoint.
- Patching `sockbuf.c` as attached and rebuilding resolved the issue.
I also noticed similar loops elsewhere in `sockbuf.c` and added `errno = 0;` at the beginning of each iteration there as well. In principle, they should suffer from the same problem.
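To illustrate the failure mode, here is a minimal, self-contained sketch. It is not the OpenLDAP source: `fake_write`, `retry_loop`, and `MAX_ITERATIONS` are hypothetical names made up for this example, and the loop is simplified so that the retry decision depends on `errno` alone. `fake_write` fails once with `EINTR` and then succeeds without touching `errno`, which a successful call is allowed to do.

```c
#include <errno.h>
#include <stddef.h>

/* Safety cap so the stale-errno variant terminates in this demo. */
#define MAX_ITERATIONS 1000

static int calls_made = 0;

/* Hypothetical stand-in for sbi_write: the first call is "interrupted",
 * every later call succeeds and leaves errno as it was. */
static long fake_write(const void *buf, size_t len)
{
    (void)buf;
    if (calls_made++ == 0) {
        errno = EINTR;
        return -1;
    }
    return (long)len; /* success: errno is NOT cleared */
}

/* Simplified retry loop. With reset_errno == 0, a stale EINTR from an
 * earlier failure keeps triggering retries even though the write succeeded.
 * Returns the number of iterations executed. */
static int retry_loop(int reset_errno, long *written)
{
    int iterations = 0;

    calls_made = 0;
    errno = 0;
    for (;;) {
        if (iterations == MAX_ITERATIONS)
            break; /* demo only: a real loop would spin forever here */
        iterations++;

        if (reset_errno)
            errno = 0; /* the reset added at the top of each iteration */

        *written = fake_write("x", 1);
        if (errno == EINTR)
            continue; /* retry: only meaningful if errno is fresh */
        break;
    }
    return iterations;
}
```

Calling `retry_loop(0, &n)` runs into the iteration cap although the second write already succeeded, while `retry_loop(1, &n)` finishes after two iterations. Checking the return value before consulting `errno` would be another way to avoid acting on a stale value.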
The reason this happens under load is that, with many requests being issued, the probability that a write coincides with an _actual_ interrupt is much higher, and once that happens, we are stuck in the infinite loop.