https://bugs.openldap.org/show_bug.cgi?id=10099
Issue ID: 10099
Summary: OpenLDAP version 2.5 & 2.6 causes IP connectivity to break and breaks basic commands like reboot
Product: OpenLDAP
Version: 2.5.16
Hardware: x86_64
OS: Linux
Status: UNCONFIRMED
Keywords: needs_review
Severity: normal
Priority: ---
Component: libraries
Assignee: bugs@openldap.org
Reporter: amcwongahey@rbbn.com
Target Milestone: ---

Created attachment 980
  --> https://bugs.openldap.org/attachment.cgi?id=980&action=edit
The package Makefile
I am upgrading OpenLDAP from version 2.4.59 to 2.5.16 and am running into show-stopper issues.
In my environment I am running CLIENT mode only (libldap).
I have tried 2.5.16 with the following combinations:
openSSL version 1.1.1s and 3.0.8
Kernel versions: 5.4.92, 4.19.192 and 2.6.32
The problems described below happen ONLY when connecting to a domain controller using LDAPS; they do NOT happen with LDAP (non-secure).
When I use ANY combination that includes kernel version 4 or 5 along with OpenLDAP 2.5.16, I get random lockups to the point where IP connectivity into and out of the node breaks. The node is so completely hosed that even issuing a reboot command from the console hangs and does not restart the node.
The problem happens roughly 50% of the time with OpenLDAP on the version 5 kernel, and noticeably less frequently on the version 4 kernel.
As soon as I kill the process that invokes the connection with openLDAP the problem clears up.
I invoke the connection with the following function call:
nReturnCode = ldap_sasl_bind( m_pLD, m_ADBind.GetBindDN(), LDAP_SASL_SIMPLE, &stPassword, NULL, NULL, &nMsgID);
I use simple auth because the entire connection is secured with TLS anyway, and there is another functional reason which I cannot go into detail on.
OpenLDAP never returns from the ldap_sasl_bind function call. It hangs somewhere inside the library, but that alone cannot account for the complete lockup, where basic commands like reboot do not work and all IP connectivity breaks. It seems it has to be something in the combination of OpenLDAP and the Linux kernel that triggers this issue.
I am hoping that someone who is much more familiar with the libldap part of the library will pick up on this and be able to determine how to fix this.
As an FYI: I also tried the very first version of 2.5.1 (alpha release) and the latest 2.6 and the problem happens on those versions as well.
To be clear the problem does NOT happen if I run openLDAP 2.5.16 with Linux kernel version 2.6.32.
ADDITIONALLY, ALL openSSL & kernel combinations work with OpenLDAP version 2.4.59!
I am attaching the package Makefile to this report. Below is the ldap.conf contents:
TLS_REQCERT never
TLS_KEY /tmp/ssl/certs/server.pem
TLS_CERT /tmp/ssl/certs/server.pem
TLS_PROTOCOL_MIN 3.1
sasl_secprops maxssf=0
--- Comment #1 from AllenM amcwongahey@rbbn.com ---
Found that in tls2.c & tls_o.c the following compile flag was removed:
LDAP_USE_NON_BLOCKING_TLS
The code guarded by this flag was NOT compiled in on 2.4.59, but with the flag check removed that code is now always compiled in. This resulted in the lockup I described.
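To make the effect of such a change concrete, here is a minimal sketch of a preprocessor guard. It assumes the flag was used as an ordinary `#ifdef` around the non-blocking path; `tls_connect_mode` is a hypothetical stand-in and is not taken from tls2.c or tls_o.c.

```c
/* Hypothetical stand-in for a TLS connect path; NOT OpenLDAP source.
 * While the guard exists, the non-blocking branch is built only when
 * the compiler is given -DLDAP_USE_NON_BLOCKING_TLS. If the #ifdef
 * lines are deleted and only the first branch is kept, that branch is
 * built unconditionally, which is the situation the comment describes. */
const char *tls_connect_mode(void)
{
#ifdef LDAP_USE_NON_BLOCKING_TLS
    return "non-blocking";
#else
    return "blocking";
#endif
}
```

Built without the define, the function reports the blocking branch; built with `-DLDAP_USE_NON_BLOCKING_TLS`, it reports the non-blocking one.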
Now this may well be something in the way our application is interacting with openLDAP. Perhaps we need to do something different to support asynchronous operations.
I did add this to the code when binding with the domain controller:
nReturnCode = ldap_set_option(m_pLD, LDAP_OPT_CONNECT_ASYNC, LDAP_OPT_OFF);
This did not fix the problem, but when I added the compile flags back to match those present in 2.4.59, the problem was resolved.
I don't really have the time at the moment to dig further unfortunately.
Howard Chu hyc@openldap.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FEEDBACK
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #2 from Howard Chu hyc@openldap.org ---
What is the process that was hanging that you killed, that got the system unfrozen?
--- Comment #3 from AllenM amcwongahey@rbbn.com ---
It was the user process. But it was hanging inside the openLDAP library right after calling ldap_sasl_bind.
As I stated before it never came back from this function call and that is clearly inside openLDAP.
I added a lot of debug output to OpenLDAP to narrow this down, so please see my earlier comment, where I found a delta in the compile flags between OpenLDAP 2.4.59 and 2.5.xx. Adding the compile flags back in PREVENTS the lockup.
--- Comment #4 from Howard Chu hyc@openldap.org ---
Yes, understood. But libldap by itself can't cause an entire machine to hang, so that had to be a problem specific to the process that was running. The fact that the behavior depends on the kernel version implies some other problems as well, since libldap doesn't do anything special to the kernel.
As for "hanging somewhere inside the library" please provide a complete stack trace of the offending process when it hangs.
I don't see anything special in the Makefile you attached. Please provide a diff showing the actual flag changes you needed to make.
Right now you've provided no useful details. If you don't provide all the requested information, there's not much we can do to investigate.
--- Comment #5 from AllenM amcwongahey@rbbn.com ---
Created attachment 983
  --> https://bugs.openldap.org/attachment.cgi?id=983&action=edit
Diff file
Diff file which fixed the entire problem.
--- Comment #6 from Quanah Gibson-Mount quanah@openldap.org ---
(In reply to AllenM from comment #5)
> Created attachment 983 [details]
> Diff file
>
> Diff file which fixed the entire problem.
Please provide the information requested in comment #4. Thank you.
--- Comment #7 from AllenM amcwongahey@rbbn.com ---
The process which uses openLDAP does absolutely nothing special and does not even interact with the Kernel. When that openLDAP function is called it simply never returns back from the OpenLDAP API.
I am not able to get a kill -6 to generate a CORE file. Not able to get anything useful from strace either.
I provided the unified DIFF file which added back in the compile flags which were present prior to OpenLDAP 2.5 and that resolved the entire issue. No more hangs and it functionally works just fine now just like it always has before.
If I call an OpenLDAP API function and it never comes back to the calling process, it is clear it is hanging inside OpenLDAP.
I added quite a bit of debug output to OpenLDAP, and from that I could confirm for myself that it was getting stuck there.
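On the missing core file: one common reason `kill -6` produces no core is that the process's core-size limit (RLIMIT_CORE) is 0. That is only an assumption about this environment. A minimal sketch of raising the soft limit from inside the application, with `enable_core_dumps` as a hypothetical helper name:

```c
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit.
 * Returns 0 on success, -1 on failure. This cannot exceed the hard
 * limit, so if the hard limit is 0 no core will be written either way. */
int enable_core_dumps(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}
```

Calling this at startup, before the bind, is the intended use. Where the core ends up, if anywhere, still depends on the kernel's `core_pattern` setting and on the process being allowed to write there.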
Howard Chu hyc@openldap.org changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|needs_review                |