Hello,
I'm seeing some really weird behaviour when using ldaps:// on an openldap-2.3.43 server. It's a Gentoo Linux box with glibc-2.9_p20081201-r2 and openssl-0.9.8k. I have already recompiled the entire system with gcc-4.3.4 (twice to be sure), with no errors. First of all, ldapsearch -H ldaps://bussard.lih.rwth-aachen.de just hangs. The strange part: when I strace -f slapd, from the second retry on, it works.
So I continued debugging with openssl s_client, which exhibits exactly the same behaviour. It does reveal, however, that slapd falls silent in the middle of sending the certificates.
So if I do:
$ openssl s_client -connect bussard.lih.rwth-aachen.de:636 -state -status -CAfile /etc/openldap/ssl/rwth-dfn-tcom.crt
CONNECTED(00000003)
SSL_connect:before/connect initialization
SSL_connect:SSLv2/v3 write client hello A
OCSP response: no response sent
SSL_connect:SSLv3 read server hello A
depth=3 /C=DE/O=Deutsche Telekom AG/OU=T-TeleSec Trust Center/CN=Deutsche Telekom Root CA 2
verify return:1
depth=2 /C=DE/O=DFN-Verein/OU=DFN-PKI/CN=DFN-Verein PCA Global - G01
verify return:1
depth=1 /C=DE/O=RWTH Aachen/CN=RWTH Aachen CA/emailAddress=ca@rwth-aachen.de
verify return:1
depth=0 /C=DE/O=RWTH Aachen/OU=Lehrstuhl fuer Ingenieur- und Hydrogeologie/CN=ldap.lih.rwth-aachen.de
verify return:1
SSL_connect:SSLv3 read server certificate A
^C
Now after I've done "strace -f -p `pidof slapd`" on the server, I get the same as above once. Then when I try a second time:
$ openssl s_client -connect bussard.lih.rwth-aachen.de:636 -state -CAfile /etc/openldap/ssl/rwth-dfn-tcom.crt
CONNECTED(00000003)
SSL_connect:before/connect initialization
SSL_connect:SSLv2/v3 write client hello A
SSL_connect:SSLv3 read server hello A
depth=3 /C=DE/O=Deutsche Telekom AG/OU=T-TeleSec Trust Center/CN=Deutsche Telekom Root CA 2
verify return:1
depth=2 /C=DE/O=DFN-Verein/OU=DFN-PKI/CN=DFN-Verein PCA Global - G01
verify return:1
depth=1 /C=DE/O=RWTH Aachen/CN=RWTH Aachen CA/emailAddress=ca@rwth-aachen.de
verify return:1
depth=0 /C=DE/O=RWTH Aachen/OU=Lehrstuhl fuer Ingenieur- und Hydrogeologie/CN=ldap.lih.rwth-aachen.de
verify return:1
SSL_connect:SSLv3 read server certificate A
SSL_connect:SSLv3 read server certificate request A
SSL_connect:SSLv3 read server done A
SSL_connect:SSLv3 write client certificate A
SSL_connect:SSLv3 write client key exchange A
SSL_connect:SSLv3 write change cipher spec A
SSL_connect:SSLv3 write finished A
SSL_connect:SSLv3 flush data
SSL_connect:SSLv3 read finished A
---
Certificate chain
 0 s:/C=DE/O=RWTH Aachen/OU=Lehrstuhl fuer Ingenieur- und Hydrogeologie/CN=ldap.lih.rwth-aachen.de
   i:/C=DE/O=RWTH Aachen/CN=RWTH Aachen CA/emailAddress=ca@rwth-aachen.de
 1 s:/C=DE/O=RWTH Aachen/CN=RWTH Aachen CA/emailAddress=ca@rwth-aachen.de
   i:/C=DE/O=DFN-Verein/OU=DFN-PKI/CN=DFN-Verein PCA Global - G01
 2 s:/C=DE/O=DFN-Verein/OU=DFN-PKI/CN=DFN-Verein PCA Global - G01
   i:/C=DE/O=Deutsche Telekom AG/OU=T-TeleSec Trust Center/CN=Deutsche Telekom Root CA 2
 3 s:/C=DE/O=Deutsche Telekom AG/OU=T-TeleSec Trust Center/CN=Deutsche Telekom Root CA 2
   i:/C=DE/O=Deutsche Telekom AG/OU=T-TeleSec Trust Center/CN=Deutsche Telekom Root CA 2
---
Server certificate
-----BEGIN CERTIFICATE-----
... (lotsa stuff) ...
/C=ES/ST=Barcelona/L=Barcelona/O=IPS Internet publishing Services s.l./O=ips@mail.ips.es C.I.F. B-60929452/OU=IPS CA Timestamping Certification Authority/CN=IPS CA Timestamping Certification Authority/emailAddress=ips@mail.ips.es
---
SSL handshake has read 26305 bytes and written 480 bytes
---
New, TLSv1/SSLv3, Cipher is AES256-SHA
Server public key is 2048 bit
Compression: NONE
Expansion: NONE
SSL-Session:
    Protocol : TLSv1
    Cipher : AES256-SHA
    Session-ID: 99...CB400
    Session-ID-ctx:
    Master-Key: 152...A2D
    Key-Arg : None
    Start Time: 1256667603
    Timeout : 300 (sec)
    Verify return code: 0 (ok)
---
This time the SSL handshake works, just because I'm strace'ing slapd? This looks like some really weird race condition. It's driving me crazy. Should I talk to the openssl people about this? But when I make an openssl testbed with openssl s_server and s_client, everything works fine, so it shouldn't be an openssl issue.
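A hang like this is easier to compare across clients if each attempt is bounded by a timeout and classified. The following is only a minimal sketch using Python's standard ssl module; the function name probe_tls_handshake and its parameters are hypothetical and not part of the setup described in this thread.

```python
import socket
import ssl


def probe_tls_handshake(host, port=636, timeout=5.0, cafile=None):
    """Attempt one TLS handshake and classify the outcome.

    Returns "ok", "timeout", or "error: <detail>".
    """
    context = ssl.create_default_context(cafile=cafile)
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # wrap_socket() performs the handshake; the socket timeout
            # applies to it, so a stalled server shows up as "timeout".
            with context.wrap_socket(raw, server_hostname=host):
                return "ok"
    except socket.timeout:
        return "timeout"
    except (ssl.SSLError, OSError) as exc:
        return "error: %s" % exc


# Example call (assumed host and CA file taken from the commands above):
# probe_tls_handshake("bussard.lih.rwth-aachen.de",
#                     cafile="/etc/openldap/ssl/rwth-dfn-tcom.crt")
```

Calling it a few times in a row, with and without strace attached to slapd, would show whether the "works on the second try" pattern is reproducible from a single client.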
Oh and maybe you'd like to see the strace output. This is what it looks like when the SSL client hangs:
# strace -f -p `pidof slapd`
Process 3339 attached with 3 threads - interrupt to quit
[pid 3339] futex(0x8977560, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 3338] time(NULL) = 1256668096
[pid 3338] epoll_wait(6, <unfinished ...>
[pid 3328] futex(0xad15ebd8, FUTEX_WAIT, 3338, NULL <unfinished ...>
[pid 3338] <... epoll_wait resumed> {{EPOLLIN, {u32=143663400, u64=143663400}}}, 1024, 1798000) = 1
[pid 3338] accept(7, {sa_family=AF_INET, sin_port=htons(37192), sin_addr=inet_addr("137.226.164.160")}, [16]) = 14
[pid 3338] setsockopt(14, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
[pid 3338] setsockopt(14, SOL_TCP, TCP_NODELAY, [1], 4) = 0
[pid 3338] open("/etc/hosts.allow", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 3338] open("/etc/hosts.deny", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 3338] time(NULL) = 1256668099
[pid 3338] fcntl64(14, F_GETFL) = 0x2 (flags O_RDWR)
[pid 3338] fcntl64(14, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 3338] epoll_ctl(6, EPOLL_CTL_ADD, 14, {EPOLLIN, {u32=143700808, u64=143700808}}) = 0
[pid 3338] time(NULL) = 1256668099
[pid 3338] epoll_wait(6, {{EPOLLIN, {u32=143700808, u64=143700808}}}, 1024, 1795000) = 1
[pid 3338] time(NULL) = 1256668099
[pid 3338] time(NULL) = 1256668099
[pid 3338] read(14, "\200\214\1\3\1\0c\0\0\0 "..., 11) = 11
[pid 3338] read(14, "\0\0009\0\0008\0\0005\0\0\210\0\0\207\0\0\204\0\0\26\0\0\23\0\0\n\7\0\300\0\0003"..., 131) = 131
[pid 3338] time(NULL) = 1256668099
[pid 3338] time(NULL) = 1256668099
[pid 3338] time(NULL) = 1256668099
[pid 3338] write(14, "\26\3\1\0J\2\0\0F\3\1J\347;\303\310\247w\24<\206!\334\3345\304\327\321\344\36FG\37"..., 4096) = 4096
[pid 3338] write(14, ""0\r\6\t*\206H\206\367\r\1\1\1\5\0\3\202\1\17\0000\202\1\n\2\202\1\1\0\253\v\243"..., 4096) = 4096
[pid 3338] write(14, "ootCA1\0;091\v0\t\6\3U\4\6\23\2FI1\0170\r\6\3U\4\n\23"..., 12928) = 6288
[pid 3338] write(14, "go1,0*\6\3U\4\v\23#Wells Fargo Certific"..., 6640) = -1 EAGAIN (Resource temporarily unavailable)
[pid 3338] write(14, "go1,0*\6\3U\4\v\23#Wells Fargo Certific"..., 6640) = 6640
[pid 3338] write(14, "\26\3\1\24\1ck Halozatbiztonsagi Kft.1\0320"..., 5126) = 2048
[pid 3338] write(14, "\4\n\23\37Software in the Public Intere"..., 3078) = -1 EAGAIN (Resource temporarily unavailable)
[pid 3338] time(NULL) = 1256668099
[pid 3338] epoll_wait(6,
(Ctrl-C on the client...)
{{EPOLLIN, {u32=143700808, u64=143700808}}}, 1024, 1795000) = 1
[pid 3338] time(NULL) = 1256668102
[pid 3338] time(NULL) = 1256668102
[pid 3338] write(14, "\4\n\23\37Software in the Public Intere"..., 3078) = 3078
[pid 3338] read(14, ""..., 5) = 0
[pid 3338] epoll_ctl(6, EPOLL_CTL_MOD, 14, {0, {u32=143700808, u64=143700808}}) = 0
[pid 3338] write(5, "0"..., 1) = 1
[pid 3338] epoll_ctl(6, EPOLL_CTL_DEL, 14, {0, {u32=143700808, u64=143700808}}) = 0
[pid 3338] shutdown(14, 2 /* send and receive */) = -1 ENOTCONN (Transport endpoint is not connected)
[pid 3338] close(14) = 0
[pid 3338] time(NULL) = 1256668102
[pid 3338] epoll_wait(6, {{EPOLLIN, {u32=143700768, u64=143700768}}}, 1024, 1792000) = 1
[pid 3338] read(4, "0"..., 8192) = 1
[pid 3338] time(NULL) = 1256668102
[pid 3338] epoll_wait(6,
and that's it. Now when I try for the second time (now I get the server cert alright), it looks like this:
# strace -f -p `pidof slapd`
Process 3354 attached with 4 threads - interrupt to quit
[pid 3339] futex(0x8977560, FUTEX_WAIT_PRIVATE, 42, NULL <unfinished ...>
[pid 3354] futex(0x8977560, FUTEX_WAIT_PRIVATE, 42, NULL <unfinished ...>
[pid 3338] time(NULL) = 1256668222
[pid 3338] epoll_wait(6, <unfinished ...>
[pid 3328] futex(0xad15ebd8, FUTEX_WAIT, 3338, NULL <unfinished ...>
[pid 3338] <... epoll_wait resumed> {{EPOLLIN, {u32=143663400, u64=143663400}}}, 1024, 1672000) = 1
[pid 3338] accept(7, {sa_family=AF_INET, sin_port=htons(37195), sin_addr=inet_addr("137.226.164.160")}, [16]) = 15
[pid 3338] setsockopt(15, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
[pid 3338] setsockopt(15, SOL_TCP, TCP_NODELAY, [1], 4) = 0
[pid 3338] open("/etc/hosts.allow", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 3338] open("/etc/hosts.deny", O_RDONLY) = -1 ENOENT (No such file or directory)
[pid 3338] time(NULL) = 1256668224
[pid 3338] fcntl64(15, F_GETFL) = 0x2 (flags O_RDWR)
[pid 3338] fcntl64(15, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 3338] epoll_ctl(6, EPOLL_CTL_ADD, 15, {EPOLLIN, {u32=143700812, u64=143700812}}) = 0
[pid 3338] time(NULL) = 1256668224
[pid 3338] epoll_wait(6, {{EPOLLIN, {u32=143700812, u64=143700812}}}, 1024, 1670000) = 1
[pid 3338] time(NULL) = 1256668224
[pid 3338] time(NULL) = 1256668224
[pid 3338] read(15, "\200\214\1\3\1\0c\0\0\0 "..., 11) = 11
[pid 3338] read(15, "\0\0009\0\0008\0\0005\0\0\210\0\0\207\0\0\204\0\0\26\0\0\23\0\0\n\7\0\300\0\0003"..., 131) = 131
[pid 3338] time(NULL) = 1256668224
[pid 3338] time(NULL) = 1256668224
[pid 3338] time(NULL) = 1256668224
[pid 3338] write(15, "\26\3\1\0J\2\0\0F\3\1J\347<@5\352%\335\336\264Q\2263\346\303\335\t\2\34\241\372Q"..., 4096) = 4096
[pid 3338] write(15, ""0\r\6\t*\206H\206\367\r\1\1\1\5\0\3\202\1\17\0000\202\1\n\2\202\1\1\0\253\v\243"..., 4096) = 4096
[pid 3338] write(15, "ootCA1\0;091\v0\t\6\3U\4\6\23\2FI1\0170\r\6\3U\4\n\23"..., 12928) = 11584
[pid 3338] write(15, "\6\3U\4\6\23\2AU1\0230\21\6\3U\4\10\23\nQueensland1\0210"..., 4096) = 2896
[pid 3338] write(15, "ty1$0"\6\3U\4\n\23\33Digital Signature Tr"..., 1200) = 1200
[pid 3338] write(15, "\26\31personal-basic@thawte.com\0\3210\201\3161"..., 2374) = 2374
[pid 3338] read(15, 0x8a08398, 5) = -1 EAGAIN (Resource temporarily unavailable)
[pid 3338] time(NULL) = 1256668224
[pid 3338] epoll_wait(6, {{EPOLLIN, {u32=143700812, u64=143700812}}}, 1024, 1670000) = 1
[pid 3338] time(NULL) = 1256668224
[pid 3338] time(NULL) = 1256668224
[pid 3338] read(15, "\26\3\1\0\7"..., 5) = 5
[pid 3338] read(15, "\v\0\0\3\0\0\0"..., 7) = 7
[pid 3338] read(15, "\26\3\1\1\6"..., 5) = 5
[pid 3338] read(15, "\20\0\1\2\1\0~\246\237\364\202\0\217\345#|\241\273k\34\251\277\224X\346\274\361\300\373\1\24\226\334"..., 262) = 262
[pid 3338] read(15, "\24\3\1\0\1"..., 5) = 5
[pid 3338] read(15, "\1"..., 1) = 1
[pid 3338] read(15, "\26\3\1\0000"..., 5) = 5
[pid 3338] read(15, "\36\337\371\314\260\5\246\233\17\31^P\3027\227\333\257\374\221F\\20?1\316\207\201BJQ\337\264\224"..., 48) = 48
[pid 3338] write(15, "\24\3\1\0\1\1\26\3\1\0000\356\336\3673\3034w\344\3364e\264\10dP\302\205\3058\357\272c"..., 59) = 59
[pid 3338] time(NULL) = 1256668224
[pid 3338] epoll_wait(6,
Hope that someone can make sense of this. Just to be clear: ldapsearch behaves the same way as described above for openssl s_client.
Thank you very much for even reading so far.
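The two strace runs above differ mainly in how the write() calls on the client socket turn out: full writes, short writes, and EAGAIN failures. As a rough sketch (not something used in the thread), the tally can be computed mechanically. The helper name summarize_writes is hypothetical, it assumes one syscall per line, and the sample lines copy only the sizes and results from the hanging trace, with the buffers abbreviated to "...".

```python
import re

# Matches e.g.: write(14, "..."..., 12928) = 6288
#           or: write(14, "..."..., 6640) = -1 EAGAIN (Resource ...)
WRITE_RE = re.compile(r"write\((\d+), .*, (\d+)\)\s*=\s*(-?\d+)(?:\s+([A-Z]+))?")

# write() calls on fd 14 from the hanging trace, up to the point of Ctrl-C.
HANG_TRACE = [
    '[pid 3338] write(14, "..."..., 4096) = 4096',
    '[pid 3338] write(14, "..."..., 4096) = 4096',
    '[pid 3338] write(14, "..."..., 12928) = 6288',
    '[pid 3338] write(14, "..."..., 6640) = -1 EAGAIN (Resource temporarily unavailable)',
    '[pid 3338] write(14, "..."..., 6640) = 6640',
    '[pid 3338] write(14, "..."..., 5126) = 2048',
    '[pid 3338] write(14, "..."..., 3078) = -1 EAGAIN (Resource temporarily unavailable)',
]


def summarize_writes(strace_lines, fd):
    """Tally write() results for one file descriptor."""
    summary = {"full": 0, "short": 0, "eagain": 0, "bytes": 0}
    for line in strace_lines:
        match = WRITE_RE.search(line)
        if not match or int(match.group(1)) != fd:
            continue
        requested = int(match.group(2))
        result = int(match.group(3))
        if result < 0:
            # Failed call: only EAGAIN is counted here.
            if match.group(4) == "EAGAIN":
                summary["eagain"] += 1
            continue
        summary["bytes"] += result
        summary["full" if result == requested else "short"] += 1
    return summary


print(summarize_writes(HANG_TRACE, 14))
```

For the hanging run this reports 23168 bytes written before the server went back to epoll_wait with 3078 bytes still pending.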
--On Thursday, October 29, 2009 2:56 PM +0100 Victor Mataré matare@lih.rwth-aachen.de wrote:
Hope that someone can make sense of this. Just to be clear: ldapsearch behaves the same way as described above for openssl s_client.
Thank you very much for even reading so far.
If slapd is the one failing to send data, why don't you turn up the debugging level on the slapd side and see what it thinks is happening? I.e., start slapd by hand with something like -d 2 or -d -1 and see what it reports at the time at which the connection hangs.
--Quanah
--
Quanah Gibson-Mount Principal Software Engineer Zimbra, Inc -------------------- Zimbra :: the leader in open source messaging and collaboration
Quanah Gibson-Mount wrote:
--On Thursday, October 29, 2009 2:56 PM +0100 Victor Mataré matare@lih.rwth-aachen.de wrote:
Hope that someone can make sense of this. Just to be clear: ldapsearch behaves the same way as described above for openssl s_client.
Thank you very much for even reading so far.
If slapd is the one failing to send data, why don't you turn up the debugging level on the slapd side and see what it thinks is happening? I.e., start slapd by hand with something like -d 2 or -d -1 and see what it reports at the time at which the connection hangs.
--Quanah
Ok, when I start slapd with -d 9, I see this:
slap_listener(ldap://)
connection_get(15): got connid=1
connection_read(15): checking for input on id=1
ber_get_next
ber_get_next: tag 0x30 len 29 contents:
ber_get_next
do_extended
ber_scanf fmt ({m) ber:
send_ldap_extended: err=0 oid= len=0
send_ldap_response: msgid=1 tag=120 err=0
ber_flush: 14 bytes to sd 15
connection_get(15): got connid=1
connection_read(15): checking for input on id=1
TLS trace: SSL_accept:before/accept initialization
TLS trace: SSL_accept:SSLv3 read client hello A
TLS trace: SSL_accept:SSLv3 write server hello A
TLS trace: SSL_accept:SSLv3 write certificate A
TLS trace: SSL_accept:error in SSLv3 write certificate request B
TLS trace: SSL_accept:error in SSLv3 write certificate request B
(Ctrl-C on the client)
connection_get(15): got connid=1
connection_read(15): checking for input on id=1
TLS trace: SSL_accept:SSLv3 write certificate request B
TLS trace: SSL_accept:SSLv3 flush data
TLS trace: SSL_accept:failed in SSLv3 read client certificate A
TLS: can't accept.
connection_read(15): TLS accept failure error=-1 id=1, closing
connection_closing: readying conn=1 sd=15 for close
connection_close: conn=1 sd=15
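To see at a glance where the handshake stops advancing, the "TLS trace" lines can be pulled out of the slapd debug output. This is a small sketch only: the helper names accept_states and first_error_state are made up for illustration, and the sample lines are the ones logged before the Ctrl-C above. Note that on a non-blocking socket an "error in" line may simply mean the step could not complete yet.

```python
TRACE_PREFIX = "TLS trace: SSL_accept:"

# TLS trace lines copied from the debug output before the client was interrupted.
DEBUG_LINES = [
    "TLS trace: SSL_accept:before/accept initialization",
    "TLS trace: SSL_accept:SSLv3 read client hello A",
    "TLS trace: SSL_accept:SSLv3 write server hello A",
    "TLS trace: SSL_accept:SSLv3 write certificate A",
    "TLS trace: SSL_accept:error in SSLv3 write certificate request B",
    "TLS trace: SSL_accept:error in SSLv3 write certificate request B",
]


def accept_states(debug_lines):
    """Return the SSL_accept state strings in the order slapd logged them."""
    states = []
    for line in debug_lines:
        line = line.strip()
        if line.startswith(TRACE_PREFIX):
            states.append(line[len(TRACE_PREFIX):])
    return states


def first_error_state(debug_lines):
    """Return the first state reported as 'error in ...', or None."""
    for state in accept_states(debug_lines):
        if state.startswith("error in "):
            return state[len("error in "):]
    return None


print(first_error_state(DEBUG_LINES))
```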
However it looks like it might be a client issue after all, because I found out some clients can actually talk to the server through ldaps:// or STARTTLS, while others fail with "Can't contact ldap server". This is some weird breakage. Don't bother too much with this, I think I have to do some more experimentation. But thanks to all for the quick responses so far.
--On Friday, October 30, 2009 1:30 AM +0100 Victor Mataré matare@lih.rwth-aachen.de wrote:
However it looks like it might be a client issue after all, because I found out some clients can actually talk to the server through ldaps:// or STARTTLS, while others fail with "Can't contact ldap server". This is some weird breakage. Don't bother too much with this, I think I have to do some more experimentation. But thanks to all for the quick responses so far.
GnuTLS vs OpenSSL linked libraries?
--Quanah
Victor Mataré matare@lih.rwth-aachen.de writes:
Hello,
I'm seeing some really weird behaviour when using ldaps:// on an openldap-2.3.43 server. It's a Gentoo Linux box with glibc-2.9_p20081201-r2 and openssl-0.9.8k. I have already recompiled the entire system with gcc-4.3.4 (twice to be sure), with no errors. First of all, ldapsearch -H ldaps://bussard.lih.rwth-aachen.de
Hydrogeologie/CN=ldap.lih.rwth-aachen.de
Hydrogeologie/CN=ldap.lih.rwth-aachen.de
The FQDN of the certificate is ldap.lih.rwth-aachen.de, but your search URI is bussard.lih.rwth-aachen.de
-Dieter
Dieter Kluenter wrote:
Victor Mataré matare@lih.rwth-aachen.de writes:
Hello,
I'm seeing some really weird behaviour when using ldaps:// on an openldap-2.3.43 server. It's a Gentoo Linux box with glibc-2.9_p20081201-r2 and openssl-0.9.8k. I have already recompiled the entire system with gcc-4.3.4 (twice to be sure), with no errors. First of all, ldapsearch -H ldaps://bussard.lih.rwth-aachen.de
Hydrogeologie/CN=ldap.lih.rwth-aachen.de
Hydrogeologie/CN=ldap.lih.rwth-aachen.de
The FQDN of the certificate is ldap.lih.rwth-aachen.de, but your search URI is bussard.lih.rwth-aachen.de
-Dieter
Yep, that's alright. The certificate contains multiple alternative CNs, one of which is bussard.lih.rwth-aachen.de. They're just not shown here, but the cert is definitely valid for that hostname, so that's not the cause of the problem. And even if it was, slapd shouldn't just hang. But thanks for looking carefully.
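The point about alternative names can be checked against the certificate data itself. Below is a minimal sketch that works on a dict shaped like the return value of Python's ssl.SSLSocket.getpeercert(). The example dict is hypothetical (only the two hostnames come from this thread), and the comparison is an exact, case-insensitive match that ignores wildcards.

```python
def certificate_names(cert):
    """Collect DNS subjectAltName entries and commonName values from a cert dict."""
    names = [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                names.append(value)
    return names


def covers_hostname(cert, hostname):
    """Exact, case-insensitive match only; wildcard names are not handled."""
    wanted = hostname.lower()
    return any(name.lower() == wanted for name in certificate_names(cert))


# Hypothetical certificate data, assumed for illustration.
EXAMPLE_CERT = {
    "subject": ((("countryName", "DE"),),
                (("commonName", "ldap.lih.rwth-aachen.de"),)),
    "subjectAltName": (("DNS", "ldap.lih.rwth-aachen.de"),
                       ("DNS", "bussard.lih.rwth-aachen.de")),
}

print(covers_hostname(EXAMPLE_CERT, "bussard.lih.rwth-aachen.de"))
```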
Victor Mataré matare@lih.rwth-aachen.de writes:
Dieter Kluenter wrote:
Victor Mataré matare@lih.rwth-aachen.de writes:
[...]
The FQDN of the certificate is ldap.lih.rwth-aachen.de, but your search URI is bussard.lih.rwth-aachen.de
-Dieter
Yep, that's alright. The certificate contains multiple alternative CNs, one of which is bussard.lih.rwth-aachen.de. They're just not shown here, but the cert is definitely valid for that hostname, so that's not the cause of the problem. And even if it was, slapd shouldn't just hang. But thanks for looking carefully.
GnuTLS cannot handle the subjectAltName attribute, so if either the client or the server is linked with libgnutls, it will cause this kind of problem.
-Dieter
Dieter Kluenter wrote:
Victor Mataré matare@lih.rwth-aachen.de writes:
Dieter Kluenter wrote:
Victor Mataré matare@lih.rwth-aachen.de writes:
[...]
The FQDN of the certificate is ldap.lih.rwth-aachen.de, but your search URI is bussard.lih.rwth-aachen.de
-Dieter
Yep, that's alright. The certificate contains multiple alternative CNs, one of which is bussard.lih.rwth-aachen.de. They're just not shown here, but the cert is definitely valid for that hostname, so that's not the cause of the problem. And even if it was, slapd shouldn't just hang. But thanks for looking carefully.
GnuTLS cannot handle the subjectAltName attribute, so if either the client or the server is linked with libgnutls, it will cause this kind of problem.
False.
Howard Chu hyc@symas.com writes:
Dieter Kluenter wrote:
Victor Mataré matare@lih.rwth-aachen.de writes:
Dieter Kluenter wrote:
Victor Mataré matare@lih.rwth-aachen.de writes:
[...]
The FQDN of the certificate is ldap.lih.rwth-aachen.de, but your search URI is bussard.lih.rwth-aachen.de
-Dieter
Yep, that's alright. The certificate contains multiple alternative CNs, one of which is bussard.lih.rwth-aachen.de. They're just not shown here, but the cert is definitely valid for that hostname, so that's not the cause of the problem. And even if it was, slapd shouldn't just hang. But thanks for looking carefully.
GnuTLS cannot handle the subjectAltName attribute, so if either the client or the server is linked with libgnutls, it will cause this kind of problem.
False.
OK, https://savannah.gnu.org/support/index.php?106975 has been fixed.
-Dieter
Dieter Kluenter wrote:
Howard Chu hyc@symas.com writes:
Dieter Kluenter wrote:
GnuTLS cannot handle the subjectAltName attribute, so if either the client or the server is linked with libgnutls, it will cause this kind of problem.
False.
OK, https://savannah.gnu.org/support/index.php?106975 has been fixed.
Note that this bug only affected certificates that contained XMPP subjectAltNames. Since XMPP names are relatively new, most certs aren't affected by this bug.
Why don't you try ldapsearch -H ldaps://ldap.lih.rwth-aachen.de, as Dieter suggested? I'm not an OpenLDAP expert, but I've been using it for some years. Some months ago, working with GnuTLS and SSL, I couldn't connect because the CN in the server certificate was "ldap.server" and I was trying to connect with ldapsearch -H ldaps://server. Both names belonged to the same machine, but SSL reported an error saying that the certificate's CN was "ldap.server" while I was trying to contact "server".
2009/10/30 Howard Chu hyc@symas.com
Dieter Kluenter wrote:
Howard Chu hyc@symas.com writes:
Dieter Kluenter wrote:
GnuTLS cannot handle the subjectAltName attribute, so if either the client or the server is linked with libgnutls, it will cause this kind of problem.
False.
OK, https://savannah.gnu.org/support/index.php?106975 has been fixed.
Note that this bug only affected certificates that contained XMPP subjectAltNames. Since XMPP names are relatively new, most certs aren't affected by this bug.
-- -- Howard Chu CTO, Symas Corp. http://www.symas.com Director, Highland Sun http://highlandsun.com/hyc/ Chief Architect, OpenLDAP http://www.openldap.org/project/
On Thu, Oct 29, 2009 at 9:56 AM, Victor Mataré matare@lih.rwth-aachen.de wrote:
[...]
I am sure the openssh-lpk components only want to use TLS over ldap. I read a PDF in which they suggest that ldaps is 'deprecated'. I have found that each component (sudo, openssh-lpk, nsswitch) seems to parse /etc/ldap.conf in a different way. Some like ldaps, some don't; some like uri, some don't. I see it getting better, but it's not yet clean to the point where everything works every which way.
Everyone (sudo, openssh-lpk) just has their own scheme for parsing ldap.conf and their own set of things they accept.
My files look like this to deal with these issues.
---------------------------------------------------------
# @(#)$Id: ldap.conf,v 1.38 2006/05/15 08:13:31 luk
#
# will fail over in software but not load balance
host ldapslavelb.ops.ec.com
host ldapslave1.ops.ec.com
host ldapslave2.ops.ec.com
uri ldap://ldapslavelb.ops.ec.com
uri ldap://ldapslave1.ops.ec.com
uri ldap://ldapslave2.ops.ec.com
ssl start_tls
port 389

timeout 5
bind_timeout 5
bind_policy soft
----------------------------------------------------------
I put my load balancer first (host ldapslavelb.ops.ec.com) and the individual hosts second (host ldapslave1.ops.ec.com, and so on).
I include both host and uri, even though it's redundant.
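Since each consumer reads this file its own way, it can help to see exactly which keywords and values a simple reader would pick up. The sketch below is not how sudo, openssh-lpk or nsswitch actually parse the file. It is a hypothetical minimal parser (parse_ldap_conf) that treats every non-comment line as "keyword value" and keeps all values of repeated keywords in order.

```python
from collections import defaultdict

# The settings shown in the file above.
SAMPLE_CONF = """\
# will fail over in software but not load balance
host ldapslavelb.ops.ec.com
host ldapslave1.ops.ec.com
host ldapslave2.ops.ec.com
uri ldap://ldapslavelb.ops.ec.com
uri ldap://ldapslave1.ops.ec.com
uri ldap://ldapslave2.ops.ec.com
ssl start_tls
port 389

timeout 5
bind_timeout 5
bind_policy soft
"""


def parse_ldap_conf(text):
    """Parse 'keyword value' lines, keeping every value of repeated keywords."""
    settings = defaultdict(list)
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        settings[parts[0].lower()].append(value)
    return dict(settings)


settings = parse_ldap_conf(SAMPLE_CONF)
print(settings["host"])
print(settings["uri"])
```

Keeping repeated keywords as lists makes the redundancy between host and uri visible, which is the part different tools seem to disagree on.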
So does anyone know the truth about this? Is ldaps considered deprecated in some circles?
On Thu, 29 Oct 2009, Edward Capriolo wrote:
So does anyone know the truth about this? Is ldaps considered deprecated in some circles?
As a rule of thumb, implicit-SSL protocols are not IETF Standards Track, and their StartTLS brethren are. In the case of LDAP, http://www.openldap.org/faq/data/cache/605.html is on point.
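The practical difference between the two schemes is where TLS starts: an ldaps:// URI implies TLS from the first byte, conventionally on port 636, while an ldap:// URI connects in the clear, conventionally on port 389, and upgrades via StartTLS. A small sketch of that distinction follows; the function name describe_ldap_uri and the wording of its labels are hypothetical.

```python
from urllib.parse import urlparse

# Conventional default ports for the two URI schemes.
DEFAULT_PORTS = {"ldap": 389, "ldaps": 636}


def describe_ldap_uri(uri):
    """Return (host, port, tls_mode) for an ldap:// or ldaps:// URI."""
    parsed = urlparse(uri)
    scheme = parsed.scheme.lower()
    if scheme not in DEFAULT_PORTS:
        raise ValueError("not an LDAP URI: %r" % uri)
    port = parsed.port or DEFAULT_PORTS[scheme]
    if scheme == "ldaps":
        tls_mode = "implicit TLS"
    else:
        tls_mode = "plain connection, TLS only via StartTLS"
    return parsed.hostname, port, tls_mode


print(describe_ldap_uri("ldaps://bussard.lih.rwth-aachen.de"))
print(describe_ldap_uri("ldap://ldapslave1.ops.ec.com"))
```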
openldap-software@openldap.org