https://bugs.openldap.org/show_bug.cgi?id=10141
Issue ID: 10141
Summary: 100% CPU consumption with ldap_int_tls_connect
Product: OpenLDAP
Version: 2.6.3
Hardware: Other
OS: Linux
Status: UNCONFIRMED
Keywords: needs_review
Severity: normal
Priority: ---
Component: libraries
Assignee: bugs@openldap.org
Reporter: vivekanand754@gmail.com
Target Milestone: ---
While making a secure LDAP connection, I see the connection get stuck in a read loop when it is occasionally unable to reach Active Directory:

~ # strace -p 15049
strace: Process 15049 attached
read(3, 0x55ef720bda53, 5) = -1 EAGAIN (Resource temporarily unavailable)
read(3, 0x55ef720bda53, 5) = -1 EAGAIN (Resource temporarily unavailable)
..
..
..
..
After adding some logging, I can see that the "ldap_int_tls_start" function in "openldap-2.6.3/libraries/libldap/tls2.c" calls "ldap_int_tls_connect" in a while loop. This seems to behave as a blocking call: it retries the connection continuously until it succeeds (ti_session_connect returns 0), and so consumes 100% CPU during that time.
Is this a known issue?
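For readers following the report: repeated EAGAIN results like those in the strace output are what a retry loop over a non-blocking socket produces when nothing waits for the descriptor to become ready. The sketch below is plain POSIX C, not libldap code and not a proposed patch. The names `set_nonblocking`, `wait_readable`, and `demo` are hypothetical, introduced only for illustration. It contrasts an immediate EAGAIN from `read()` with sleeping in `poll()` until data arrives or a timeout expires.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: switch fd to non-blocking mode. Returns 0 or -1. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ? -1 : 0;
}

/* Hypothetical helper: sleep until fd is readable or timeout_ms elapses.
 * Returns 1 if readable, 0 on timeout, -1 on error. */
static int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    assert(timeout_ms >= 0);
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);   /* restart if a signal interrupted us */

    if (rc < 0)
        return -1;
    return rc > 0 ? 1 : 0;
}

/* Demonstration on a local socket pair. Returns 0 if every step behaved
 * as described in the comments, -1 otherwise. */
static int demo(void)
{
    int sv[2];
    char buf[5];
    int ok = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (set_nonblocking(sv[0]) < 0)
        goto out;

    /* No data yet: a non-blocking read fails at once with EAGAIN.
     * Calling it again in a tight loop is what burns CPU. */
    if (read(sv[0], buf, sizeof buf) != -1 ||
        (errno != EAGAIN && errno != EWOULDBLOCK))
        goto out;

    /* Waiting in poll() instead: the process sleeps, then times out. */
    if (wait_readable(sv[0], 20) != 0)
        goto out;

    /* Once the peer writes, poll() reports readiness and read() succeeds. */
    if (write(sv[1], "hello", 5) != 5)
        goto out;
    if (wait_readable(sv[0], 1000) != 1)
        goto out;
    if (read(sv[0], buf, sizeof buf) != 5)
        goto out;

    ok = 0;
out:
    close(sv[0]);
    close(sv[1]);
    return ok;
}
```

Calling `demo()` from a `main` function and checking for a return value of 0 exercises all three cases. Whether the loop in tls2.c actually takes a path that skips such a wait is something only the library sources and a debugger can confirm.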
--- Comment #1 from Quanah Gibson-Mount quanah@openldap.org ---
There is not enough detail here to be actionable. Please provide further details. Is this sync or async? What TLS library is in use? etc.
Quanah Gibson-Mount quanah@openldap.org changed:
What    |Removed     |Added
----------------------------------------------------------------------------
Keywords|needs_review|
--- Comment #2 from Vivek Anand vivekanand754@gmail.com ---
This issue occurs in sync mode, and the TLS library in use is OpenSSL (built with "--with-tls=openssl").
--- Comment #3 from Vivek Anand vivekanand754@gmail.com ---
Created attachment 999
  --> https://bugs.openldap.org/attachment.cgi?id=999&action=edit
wait for 5ms and then retry after ldap_int_tls_connect failure
If ldap_int_tls_connect fails in the sync flow, I added a 5 ms sleep before retrying. This resolves the issue: I no longer see high CPU consumption, because the loop frequency is reduced. The patch is attached (tls-reconnect-delay.patch).
Let me know if I can go with this minor change; I hope it will not impact anything else. If this change is not recommended, please let me know whether there is another way to resolve the issue.
--- Comment #4 from Quanah Gibson-Mount quanah@openldap.org --- (In reply to Vivek Anand from comment #3)
> Created attachment 999 [details]
> wait for 5ms and then retry after ldap_int_tls_connect failure
You need to provide concrete details and code examples of what you are doing. This patch cannot be accepted since it is just a workaround, not a solution to an actual problem.
--- Comment #5 from Quanah Gibson-Mount quanah@openldap.org --- (In reply to Quanah Gibson-Mount from comment #4)
> (In reply to Vivek Anand from comment #3)
> > Created attachment 999 [details]
> > wait for 5ms and then retry after ldap_int_tls_connect failure
>
> You need to provide concrete details and code examples of what you are doing. This patch cannot be accepted since it is just a workaround, not a solution to an actual problem.
This didn't go out via email due to a bug in Bugzilla exposed by today's maintenance. Please note the above.
--- Comment #6 from Vivek Anand vivekanand754@gmail.com ---
Created attachment 1002
  --> https://bugs.openldap.org/attachment.cgi?id=1002&action=edit
set async mode
Basically, I'm running a script that continuously hits a GET API, which authenticates via LDAP. The flow is: MyApp --> Linux-PAM --> nss_ldap --> openldap.
Sometimes, if the LDAP server is not reachable, the script hangs because it gets stuck in the while loop that continuously calls "ldap_int_tls_connect" (in the "ldap_int_tls_start" function of "openldap-2.6.3/libraries/libldap/tls2.c"). During this time MyApp consumes 100% CPU and stays there for about 16~17 min. After that the connection is terminated and CPU usage returns to normal.
To fix this problem, I tried the two approaches below:
1) Introduced a sleep of 50 ms to reduce the while loop frequency (sync mode of operation): this reduced CPU consumption, but the process remained stuck for 16~17 min and was released after that.
2) Set async mode of operation (using LDAP_BOOL_SET in openldap-2.6.3/libraries/libldap/init.c): got a similar result to approach 1.
Both approaches reduced the CPU consumption of MyApp from ~25% (ideal scenario) to ~1.3%, and with that I can no longer hit 100% CPU under API load.
I have the following questions:
1) How should this issue be handled in sync mode of operation? Is there a timeout parameter we can configure so that, if it is unable to connect to the LDAP server, it comes out of the while loop after the configured timeout?
2) Is there any way to set async mode via configuration?
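The behaviour asked for in question 1 (leave the retry loop once a configured timeout has passed) can be sketched generically as below. This is an illustration in plain POSIX C under stated assumptions, not libldap's implementation and not a proposed patch. The names `step_fn`, `now_ms`, `retry_with_deadline`, and `always_pending` are hypothetical. libldap does expose `LDAP_OPT_NETWORK_TIMEOUT` and `LDAP_OPT_TIMEOUT` through `ldap_set_option`, but whether either bounds this particular loop in 2.6.3 is not established anywhere in this thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <time.h>

/* Hypothetical step callback: returns 0 when finished, > 0 when it must be
 * retried, < 0 on a hard failure. */
typedef int (*step_fn)(void *ctx);

/* Monotonic clock in milliseconds, unaffected by wall-clock changes. */
static long long now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Retry step() until it finishes, fails, or timeout_ms has elapsed.
 * Between attempts the caller sleeps in poll() on fd instead of spinning.
 * Returns 0 on success, -1 on failure (errno == ETIMEDOUT on timeout). */
static int retry_with_deadline(step_fn step, void *ctx, int fd, int timeout_ms)
{
    long long deadline;
    int ret;

    assert(timeout_ms >= 0);
    deadline = now_ms() + timeout_ms;

    while ((ret = step(ctx)) > 0) {
        long long left = deadline - now_ms();
        struct pollfd pfd = { .fd = fd, .events = POLLIN };

        if (left <= 0) {
            errno = ETIMEDOUT;          /* overall budget exhausted */
            return -1;
        }
        /* Sleep until fd is readable or the remaining budget runs out. */
        if (poll(&pfd, 1, (int)left) < 0 && errno != EINTR)
            return -1;
    }
    return ret < 0 ? -1 : 0;
}

/* Hypothetical stand-in for a handshake step that never completes.
 * ctx points to an int that counts how often the step was attempted. */
static int always_pending(void *ctx)
{
    ++*(int *)ctx;
    return 1;
}
```

With `always_pending` as the step and a 100 ms budget, `retry_with_deadline` returns -1 with `errno` set to `ETIMEDOUT` after only a few attempts, rather than looping until the peer or the kernel gives up.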
--- Comment #7 from Vivek Anand vivekanand754@gmail.com --- Hi Team,
Is there any update on this? Do let me know if there is any other faster channel for communication.
-Thanks
--- Comment #8 from Quanah Gibson-Mount quanah@openldap.org --- (In reply to Vivek Anand from comment #7)
> Hi Team,
>
> Is there any update on this? Do let me know if there is any other faster channel for communication.
Hello,
Based on your description, you're opening a massive number of connections, which is causing OpenSSL to run out of entropy, and that is why adding the sleep works for you. This appears to be an abuse of the library. If the former is correct, then the question is why you are opening a massive number of connections instead of simply using a small number of connections to do multiple queries.
--- Comment #9 from Vivek Anand vivekanand754@gmail.com ---
Just to clarify, I'm not opening a massive number of connections. The API load is sequential, as below:
```
while [ 1 ]
do
    curl -v -k -u user:"password" --request GET "https://x.x.x.x/api/xyz"
    sleep 1
done
```
So only one application thread is spawned at a time; it is processed and then released. At a certain point, if the LDAP server is not reachable, the application thread active at that moment gets stuck and consumes 100% CPU.
Please let me know your recommendation regarding this issue.
Also, it would be great if you could help with my previous question: "Is there any way to set async mode via any configuration?"
--- Comment #10 from Vivek Anand vivekanand754@gmail.com --- Hi Team,
Any help with the above query?
-Thanks