[Issue 10141] New: 100% CPU consumption with ldap_int_tls_connect

openldap-its＠openldap.org

11 Dec 2023

1:52 p.m.

https://bugs.openldap.org/show_bug.cgi?id=10141

Issue ID: 10141
Summary: 100% CPU consumption with ldap_int_tls_connect
Product: OpenLDAP
Version: 2.6.3
Hardware: Other
OS: Linux
Status: UNCONFIRMED
Keywords: needs_review
Severity: normal
Priority: ---
Component: libraries
Assignee: bugs@openldap.org
Reporter: vivekanand754@gmail.com
Target Milestone: ---

While making a secure LDAP connection, I sometimes see the connection get stuck in a read loop when it is unable to connect to Active Directory:

~ # strace -p 15049
strace: Process 15049 attached
read(3, 0x55ef720bda53, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(3, 0x55ef720bda53, 5) = -1 EAGAIN (Resource temporarily unavailable)
.. .. .. ..

After adding some logging, I can see that the "ldap_int_tls_start" function in "openldap-2.6.3/libraries/libldap/tls2.c" calls "ldap_int_tls_connect" in a while loop. It behaves like a blocking call: it retries continuously until the connection is established (ti_session_connect returns 0), and so it consumes 100% CPU during that time.
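The repeated EAGAIN lines in the strace output are what an unthrottled retry loop over a nonblocking socket looks like. The following is a minimal sketch in C, not OpenLDAP's code: `open_silent_nonblocking_fd` and `count_eagain_reads` are hypothetical helpers, and a local `socketpair` stands in for a TLS connection whose peer never answers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for a connection whose peer never sends anything:
 * returns one end of a socketpair, switched to nonblocking mode.
 * The other end is stored in *peer so the caller can close it later. */
static int open_silent_nonblocking_fd(int *peer)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    int flags = fcntl(sv[0], F_GETFL, 0);
    if (flags == -1 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    *peer = sv[1];
    return sv[0];
}

/* Retry read() `attempts` times with no wait in between and count how many
 * calls fail with EAGAIN/EWOULDBLOCK. Each such call returns immediately,
 * so an unbounded version of this loop keeps one CPU core busy. */
static int count_eagain_reads(int fd, int attempts)
{
    char buf[5];
    int eagain = 0;
    assert(fd >= 0);
    for (int i = 0; i < attempts; i++) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
            eagain++;
    }
    return eagain;
}
```

Every one of the attempts fails the same way, so the loop makes no progress while still burning CPU.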

Is this a known issue?

-- You are receiving this mail because: You are on the CC list for the issue.

openldap-its＠openldap.org

12 Dec

5:39 p.m.

New subject: [Issue 10141] 100% CPU consumption with ldap_int_tls_connect

--- Comment #1 from Quanah Gibson-Mount quanah@openldap.org --- There is not enough detail here to be actionable. Please provide further details. Is this sync or async? What TLS library is in use? etc.

openldap-its＠openldap.org

5:39 p.m.

Quanah Gibson-Mount quanah@openldap.org changed:

What    |Removed      |Added
----------------------------------------------------------------------------
Keywords|needs_review |

openldap-its＠openldap.org

13 Dec

1:05 p.m.

--- Comment #2 from Vivek Anand vivekanand754@gmail.com --- This issue occurs in sync mode, using the OpenSSL library (built with "--with-tls=openssl").

openldap-its＠openldap.org

10 Jan

7:26 a.m.

--- Comment #3 from Vivek Anand vivekanand754@gmail.com --- Created attachment 999 --> https://bugs.openldap.org/attachment.cgi?id=999&action=edit wait for 5ms and then retry after ldap_int_tls_connect failure

If ldap_int_tls_connect fails in the sync flow, I added a sleep of 5ms before retrying. This resolves the issue: I'm no longer seeing high CPU consumption, since the loop frequency is reduced. The patch is attached (tls-reconnect-delay.patch).

Let me know if I can go ahead with this minor change. I hope it will not impact anything else. If this change is not recommended, please let me know whether there is another way to resolve this issue.

openldap-its＠openldap.org

18 Jan

5:50 p.m.

--- Comment #4 from Quanah Gibson-Mount quanah@openldap.org --- (In reply to Vivek Anand from comment #3)

> Created attachment 999 [details] wait for 5ms and then retry after ldap_int_tls_connect failure

You need to provide concrete details and code examples of what you are doing. This patch cannot be accepted since it is just a workaround, not a solution to an actual problem.

--- Comment #5 from Quanah Gibson-Mount quanah@openldap.org --- (In reply to Quanah Gibson-Mount from comment #4)

> (In reply to Vivek Anand from comment #3)
> > Created attachment 999 [details] wait for 5ms and then retry after ldap_int_tls_connect failure
>
> You need to provide concrete details and code examples of what you are doing. This patch cannot be accepted since it is just a workaround, not a solution to an actual problem.

This didn't go out via email due to a bug in Bugzilla exposed by today's maintenance. Please note the above.

openldap-its＠openldap.org

24 Jan

7:39 a.m.

--- Comment #6 from Vivek Anand vivekanand754@gmail.com --- Created attachment 1002 --> https://bugs.openldap.org/attachment.cgi?id=1002&action=edit set async mode

Basically, I'm running a script that continuously hits a GET API, which authenticates via LDAP. The flow is: MyApp --> Linux-PAM --> nss_ldap --> openldap.

Sometimes, if the LDAP server is not reachable, the script hangs because it gets stuck in the while loop that continuously calls "ldap_int_tls_connect" (in the "ldap_int_tls_start" function of "openldap-2.6.3/libraries/libldap/tls2.c"). During this time MyApp consumes 100% CPU and stays there for about 16~17 min. After that the connection gets terminated and CPU usage returns to normal.

To fix this problem, I tried the two approaches below:

1) Introduced a sleep of 50ms to reduce the while loop frequency (sync mode of operation): this reduced CPU consumption, but the process remained stuck for 16~17 min and was released after that.
2) Set async mode of operation (using LDAP_BOOL_SET in openldap-2.6.3/libraries/libldap/init.c): got a similar result to approach 1.

Both approaches reduced the CPU consumption of MyApp from ~25% (ideal scenario) to ~1.3%, and with that I'm not able to hit 100% CPU with API load.

I have the following queries:

1) How can this issue be handled in sync mode of operation? Is there a timeout parameter we can configure so that, if it is unable to connect to the LDAP server, it exits the while loop after the configured timeout?
2) Is there any way to set async mode via configuration?
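The effect of the first approach (sleeping between retries) can be illustrated with a small sketch. This is an assumption-laden illustration rather than the attached patch: `set_nonblocking`, `now_ms` and `retry_read` are hypothetical names, and a fixed time budget stands in for whatever eventually terminates the real connection.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <time.h>
#include <unistd.h>

/* Put a descriptor into nonblocking mode; returns 0 on success. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return (flags == -1) ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Milliseconds on the monotonic clock. */
static long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

/* Retry a nonblocking read() until data arrives or `budget_ms` elapses,
 * sleeping `sleep_ms` between attempts (0 means spin). Returns the number
 * of read() calls made; *elapsed_ms receives the time spent in the loop. */
static long retry_read(int fd, long budget_ms, long sleep_ms, long *elapsed_ms)
{
    char buf[5];
    long attempts = 0;
    long start = now_ms();
    assert(fd >= 0 && budget_ms >= 0 && sleep_ms >= 0);
    while (now_ms() - start < budget_ms) {
        attempts++;
        ssize_t n = read(fd, buf, sizeof buf);
        if (n >= 0 || (errno != EAGAIN && errno != EWOULDBLOCK))
            break; /* data, EOF, or a real error: stop retrying */
        if (sleep_ms > 0) {
            struct timespec nap = { sleep_ms / 1000, (sleep_ms % 1000) * 1000000L };
            nanosleep(&nap, NULL);
        }
    }
    *elapsed_ms = now_ms() - start;
    return attempts;
}
```

With a silent peer, both the spinning and the sleeping variant stay in the loop for the whole budget; the sleep only cuts the number of wakeups. That matches the observation above that CPU usage dropped while the process still remained stuck.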

openldap-its＠openldap.org

2 Feb

5:20 a.m.

--- Comment #7 from Vivek Anand vivekanand754@gmail.com --- Hi Team,

Is there any update on this? Do let me know if there is any other faster channel for communication.

-Thanks

openldap-its＠openldap.org

6 Feb

5:37 p.m.

--- Comment #8 from Quanah Gibson-Mount quanah@openldap.org --- (In reply to Vivek Anand from comment #7)

> Hi Team,
>
> Is there any update on this? Do let me know if there is any other faster channel for communication.

Hello,

Based on your description, you're opening a massive number of connections, which is causing OpenSSL to run out of entropy and is why adding the sleep works for you. This appears to be an abuse of the library. If the former is correct, then the question is why you are opening a massive number of connections instead of simply using a small number of connections to do multiple queries.

openldap-its＠openldap.org

8 Feb

10:52 a.m.

--- Comment #9 from Vivek Anand vivekanand754@gmail.com --- Just to clarify, I'm not opening a massive number of connections. The API load is sequential, as below:

```
while [ 1 ]
do
    curl -v -k -u user:"password" --request GET "https://x.x.x.x/api/xyz"
    sleep 1
done
```

So, only one application thread is spawned at a time. It is then processed and released. At a certain point, if the LDAP server is not reachable, the application thread running at that moment gets stuck and consumes 100% CPU.

Please let me know your recommendation regarding this issue.

Also, it would be great if you could help with my previous query: "Is there any way to set async mode via any configuration?"

openldap-its＠openldap.org

20 Feb

5 a.m.

--- Comment #10 from Vivek Anand vivekanand754@gmail.com --- Hi Team,

Any help with above query?

-Thanks

openldap-its＠openldap.org

17 Oct

9:21 p.m.

--- Comment #11 from maxime.besson@worteks.com maxime.besson@worteks.com --- Hi Quanah, I was able to reproduce this issue (or a very similar one) easily while investigating a production outage.

* Start slapd somewhere (any version should work)
* pkill -STOP slapd (freezes the slapd service, simulating an unresponsive LDAP server while still allowing TCP connections to succeed)
* strace ldapsearch -H ldaps://slapd -d 1
  (reports a single read() syscall hanging because the server is stuck)
* strace ldapsearch -H ldaps://slapd -d 1 -o network_timeout=5
  (reports rapid-fire read() syscalls on the nonblocking socket)

I was able to reproduce this on the 2.6 branch with OpenSSL, as well as on a recent Debian system (2.5 branch + GnuTLS).
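The frozen-server condition in these steps can be approximated without slapd. The sketch below is an assumption, not part of the reproduction: it uses a loopback listener that never calls accept() in place of a stopped slapd, and the hypothetical helpers `open_stalled_listener` and `connect_nonblocking_client` in place of ldapsearch. The kernel still completes the TCP handshake, so connect() succeeds while nothing is ever sent back.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Listen on an ephemeral loopback port and never call accept().
 * On success, *addr holds the address (including the chosen port). */
static int open_stalled_listener(struct sockaddr_in *addr)
{
    socklen_t len = sizeof *addr;
    assert(addr != NULL);
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr->sin_port = 0; /* let the kernel pick a free port */
    if (bind(fd, (struct sockaddr *)addr, sizeof *addr) != 0 ||
        listen(fd, 8) != 0 ||
        getsockname(fd, (struct sockaddr *)addr, &len) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Connect to the stalled listener, then switch the socket to nonblocking
 * mode to mimic a client with a network timeout set. Returns the client fd. */
static int connect_nonblocking_client(const struct sockaddr_in *addr)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;
    if (connect(fd, (const struct sockaddr *)addr, sizeof *addr) != 0 ||
        fcntl(fd, F_SETFL, O_NONBLOCK) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

A read() on the returned client descriptor fails immediately with EAGAIN, which is the call that strace shows repeating in the network_timeout case.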

openldap-its＠openldap.org

22 Oct

11:20 a.m.

--- Comment #12 from maxime.besson@worteks.com maxime.besson@worteks.com --- Created attachment 1035 --> https://bugs.openldap.org/attachment.cgi?id=1035&action=edit prevent busyloop when socket was set to nonblocking by network_timeout option

I believe this patch might fix the issue by polling the socket before attempting a read. This was previously only performed in async mode, but setting a timeout also causes the socket to be nonblocking.

I renamed the async variable for clarity.
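The general idea of polling before reading can be sketched as follows. This is an illustration under stated assumptions, not the attached patch or MR!727: `wait_readable` and `read_with_timeout` are hypothetical names, and the -2 return value for a timeout is a convention chosen for this example only.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Wait until `fd` is readable or `timeout_ms` elapses.
 * Returns 1 if readable, 0 on timeout, -1 on error (errno set by poll).
 * Simplification: the timeout restarts in full if a signal interrupts poll. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int rc;
    assert(fd >= 0);
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc == -1 && errno == EINTR);
    return (rc > 0) ? 1 : rc;
}

/* Sleep in poll() until data is available, then read it.
 * Returns bytes read, 0 on EOF, -1 on error, or -2 if nothing arrived
 * within `timeout_ms`. The process is idle while it waits. */
static ssize_t read_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    int ready = wait_readable(fd, timeout_ms);
    if (ready == 0)
        return -2;
    if (ready < 0)
        return -1;
    return read(fd, buf, len);
}
```

Because the wait happens inside poll(), an unresponsive peer costs one blocked syscall per timeout period rather than a stream of failing read() calls.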

openldap-its＠openldap.org

12:29 p.m.

--- Comment #13 from Ondřej Kuzník ondra@mistotebe.net --- On Tue, Oct 22, 2024 at 09:20:22AM +0000, openldap-its@openldap.org wrote:

> I believe this patch might fix the issue by polling the socket before attempting a read. This was previously only performed in async mode, but setting a timeout also causes the socket to be nonblocking.
>
> I renamed the async variable for clarity.

Hi Maxime, thanks for the extra information and a proposed patch. Can you test the patch in MR!727 I created yesterday? It overlaps what you've just submitted and, I believe, is a more correct approach.

Thanks,

openldap-its＠openldap.org

3:13 p.m.

--- Comment #14 from maxime.besson@worteks.com maxime.besson@worteks.com --- Hi Ondřej, MR!727 correctly polls the socket instead of looping on read() syscalls when the server is unresponsive during the TLS handshake, thanks again!

openldap-its＠openldap.org

12 Nov

7:11 p.m.

Quanah Gibson-Mount quanah@openldap.org changed:

What    |Removed           |Added
----------------------------------------------------------------------------
Assignee|bugs@openldap.org |ondra@mistotebe.net

openldap-its＠openldap.org

7:12 p.m.

Quanah Gibson-Mount quanah@openldap.org changed:

--- Comment #15 from Quanah Gibson-Mount quanah@openldap.org ---

*** This issue has been marked as a duplicate of issue 8047 ***

openldap-its＠openldap.org

7:12 p.m.

Quanah Gibson-Mount quanah@openldap.org changed:

What    |Removed   |Added
----------------------------------------------------------------------------
Status  |RESOLVED  |VERIFIED
