Hi!
Thanks for the replies!
Howard Chu wrote:
On the backend, it appears that a new connection kind of overruns the old one; see what happens with connection 6983:
Looks odd indeed, but doesn't seem related to the other errors. Would need much finer resolution timestamps to correlate what's going on, unless you know there are no other active connections on the proxy when this occurred on the backend.
Yeah. I don't know for certain, but those are the only connections that would match the dates and the results.
These aren't spurious - your TLS library has genuinely failed to start a session. Which TLS library are you using? What OS are you running on? The most common cause for periodic failures is running out of entropy for the PRNG.
RHEL 7, and slapd seems to be linked to the Mozilla nss libraries.
I called them "spurious" because there doesn't seem to be any correlation between the errors and any external events. But I have to admit I do not understand what kind of activity might cause entropy to be low; I somehow thought it would be a simple case of "more entropy used than the pool has", in which case too much activity would be the cause. But these errors seem to come sometimes when the server is more loaded and sometimes when it is less loaded, and I haven't found any way to make them more or less probable.
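One way to test the entropy theory would be to log the kernel's entropy estimate alongside the slapd logs and see whether it dips when the Start TLS failures occur. Below is a minimal sketch, assuming a Linux host that exposes /proc/sys/kernel/random/entropy_avail and assuming the TLS library actually draws on the kernel pool (nothing in this thread confirms that). The function names and the sampling interval are only illustrative.

```python
import time
from pathlib import Path

# Linux-specific: the kernel's current entropy estimate, in bits.
ENTROPY_PATH = "/proc/sys/kernel/random/entropy_avail"


def read_entropy_avail(path=ENTROPY_PATH):
    """Return the entropy estimate as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def sample_entropy(count, interval, path=ENTROPY_PATH):
    """Take `count` readings `interval` seconds apart.

    Returns a list of (unix_timestamp, entropy_bits) tuples so the
    readings can be lined up with timestamps in the slapd log.
    """
    samples = []
    for i in range(count):
        samples.append((time.time(), read_entropy_avail(path)))
        if i < count - 1:
            time.sleep(interval)
    return samples


if __name__ == "__main__":
    for stamp, bits in sample_entropy(count=5, interval=1.0):
        print(f"{stamp:.3f} entropy_avail={bits}")
```

If the readings stay high while the failures still happen, low entropy is probably not the explanation.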
Anything I can do about this? I mean, if the connection from the proxy to the backend fails because of Start TLS issues, couldn't the proxy just wait a while and try again once some entropy becomes available? Currently, the problem gets propagated back to the client of the OpenLDAP service, which then has to retry a failed connection.
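To illustrate the "wait a while and try again" behaviour I have in mind, here is a generic retry-with-backoff sketch. It is not a slapd or proxy feature, just the pattern a client currently has to implement itself. The `operation` callable is a hypothetical stand-in for whatever opens the connection and issues Start TLS, and the attempt count and delays are arbitrary.

```python
import time


def retry_with_backoff(operation, attempts=4, base_delay=0.1,
                       retry_on=(Exception,), sleep=time.sleep):
    """Call `operation()` until it succeeds or `attempts` is exhausted.

    The wait doubles after each failure (base_delay, 2*base_delay, ...).
    The last failure is re-raised so the caller still sees the error.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_start_tls():
        # Hypothetical stand-in: fails twice, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("Start TLS failed")
        return "connected"

    print(retry_with_backoff(flaky_start_tls, retry_on=(ConnectionError,)))
```

In practice `retry_on` should be narrowed to the specific error the LDAP client library raises for a failed Start TLS, so that unrelated errors are not retried.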
Quanah Gibson-Mount wrote:
But we seem to be getting spurious Start TLS failed messages also without any competing connections. Here's one using ldap+STARTTLS but no other ACCEPTs anywhere near:
These aren't spurious - your TLS library has genuinely failed to start a session. Which TLS library are you using? What OS are you running on? The most common cause for periodic failures is running out of entropy for the PRNG.
They noted RHEL7 and 2.4.40, which would mean MozNSS, as the most recent RHEL7 build of 2.4.44 switched back to OpenSSL. I would just add this to the many reasons not to use RHEL for OpenLDAP.
The fact that they keep switching the TLS libraries they're linking to? I can roll out my own RPMs and keep them linked to the very same library all the time, but do you think linking to OpenSSL could help resolve my issue? Running out of entropy with only a few starttls calls per second, or only a few ldaps connections per second, seems to be a bit weird to me.
Anyway, thanks again.
--Janne / Helsinki Uni