Hello,
I'm seeing some really weird behaviour when using ldaps:// on an openldap-2.3.43 server. It's a Gentoo Linux box with glibc-2.9_p20081201-r2 and openssl-0.9.8k. I have already recompiled the entire system with gcc-4.3.4 (twice to be sure), with no errors. First of all, ldapsearch -H ldaps://bussard.lih.rwth-aachen.de just hangs. The strange part: when I strace -f slapd, from the second retry on, it works.
So I went on by debugging with openssl s_client, which exhibits just the same behaviour. However it reveals that slapd falls silent in the middle of sending the certificates.
So if I do:
$ openssl s_client -connect bussard.lih.rwth-aachen.de:636 -state -status -CAfile /etc/openldap/ssl/rwth-dfn-tcom.crt CONNECTED(00000003) SSL_connect:before/connect initialization SSL_connect:SSLv2/v3 write client hello A OCSP response: no response sent SSL_connect:SSLv3 read server hello A depth=3 /C=DE/O=Deutsche Telekom AG/OU=T-TeleSec Trust Center/CN=Deutsche Telekom Root CA 2 verify return:1 depth=2 /C=DE/O=DFN-Verein/OU=DFN-PKI/CN=DFN-Verein PCA Global - G01 verify return:1 depth=1 /C=DE/O=RWTH Aachen/CN=RWTH Aachen CA/emailAddress=ca@rwth-aachen.de verify return:1 depth=0 /C=DE/O=RWTH Aachen/OU=Lehrstuhl fuer Ingenieur- und Hydrogeologie/CN=ldap.lih.rwth-aachen.de verify return:1 SSL_connect:SSLv3 read server certificate A ^C
Now after I've done "strace -f -p `pidof slapd`" on the server, I get the same as above once. Then when I try a second time:
$ openssl s_client -connect bussard.lih.rwth-aachen.de:636 -state -CAfile /etc/openldap/ssl/rwth-dfn-tcom.crt CONNECTED(00000003) SSL_connect:before/connect initialization SSL_connect:SSLv2/v3 write client hello A SSL_connect:SSLv3 read server hello A depth=3 /C=DE/O=Deutsche Telekom AG/OU=T-TeleSec Trust Center/CN=Deutsche Telekom Root CA 2 verify return:1 depth=2 /C=DE/O=DFN-Verein/OU=DFN-PKI/CN=DFN-Verein PCA Global - G01 verify return:1 depth=1 /C=DE/O=RWTH Aachen/CN=RWTH Aachen CA/emailAddress=ca@rwth-aachen.de verify return:1 depth=0 /C=DE/O=RWTH Aachen/OU=Lehrstuhl fuer Ingenieur- und Hydrogeologie/CN=ldap.lih.rwth-aachen.de verify return:1 SSL_connect:SSLv3 read server certificate A SSL_connect:SSLv3 read server certificate request A SSL_connect:SSLv3 read server done A SSL_connect:SSLv3 write client certificate A SSL_connect:SSLv3 write client key exchange A SSL_connect:SSLv3 write change cipher spec A SSL_connect:SSLv3 write finished A SSL_connect:SSLv3 flush data SSL_connect:SSLv3 read finished A --- Certificate chain 0 s:/C=DE/O=RWTH Aachen/OU=Lehrstuhl fuer Ingenieur- und Hydrogeologie/CN=ldap.lih.rwth-aachen.de i:/C=DE/O=RWTH Aachen/CN=RWTH Aachen CA/emailAddress=ca@rwth-aachen.de 1 s:/C=DE/O=RWTH Aachen/CN=RWTH Aachen CA/emailAddress=ca@rwth-aachen.de i:/C=DE/O=DFN-Verein/OU=DFN-PKI/CN=DFN-Verein PCA Global - G01 2 s:/C=DE/O=DFN-Verein/OU=DFN-PKI/CN=DFN-Verein PCA Global - G01 i:/C=DE/O=Deutsche Telekom AG/OU=T-TeleSec Trust Center/CN=Deutsche Telekom Root CA 2 3 s:/C=DE/O=Deutsche Telekom AG/OU=T-TeleSec Trust Center/CN=Deutsche Telekom Root CA 2 i:/C=DE/O=Deutsche Telekom AG/OU=T-TeleSec Trust Center/CN=Deutsche Telekom Root CA 2 --- Server certificate -----BEGIN CERTIFICATE-----
... (lotsa stuff) ...
/C=ES/ST=Barcelona/L=Barcelona/O=IPS Internet publishing Services s.l./O=ips@mail.ips.es C.I.F. B-60929452/OU=IPS CA Timestamping Certification Authority/CN=IPS CA Timestamping Certification Authority/emailAddress=ips@mail.ips.es --- SSL handshake has read 26305 bytes and written 480 bytes --- New, TLSv1/SSLv3, Cipher is AES256-SHA Server public key is 2048 bit Compression: NONE Expansion: NONE SSL-Session: Protocol : TLSv1 Cipher : AES256-SHA Session-ID: 99...CB400 Session-ID-ctx: Master-Key: 152...A2D Key-Arg : None Start Time: 1256667603 Timeout : 300 (sec) Verify return code: 0 (ok) ---
This time the SSL handshake works, just because I'm strace'ing slapd? This looks like some really weird race condition. It's driving me crazy. Should I talk to the openssl people about this? But when I make an openssl testbed with openssl s_server and s_client, everything works fine, so it shouldn't be an openssl issue.
Oh and maybe you'd like to see the strace output. This is what it looks like when the SSL client hangs:
# strace -f -p `pidof slapd` Process 3339 attached with 3 threads - interrupt to quit [pid 3339] futex(0x8977560, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...> [pid 3338] time(NULL) = 1256668096 [pid 3338] epoll_wait(6, <unfinished ...> [pid 3328] futex(0xad15ebd8, FUTEX_WAIT, 3338, NULL <unfinished ...> [pid 3338] <... epoll_wait resumed> {{EPOLLIN, {u32=143663400, u64=143663400}}}, 1024, 1798000) = 1 [pid 3338] accept(7, {sa_family=AF_INET, sin_port=htons(37192), sin_addr=inet_addr("137.226.164.160")}, [16]) = 14 [pid 3338] setsockopt(14, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0 [pid 3338] setsockopt(14, SOL_TCP, TCP_NODELAY, [1], 4) = 0 [pid 3338] open("/etc/hosts.allow", O_RDONLY) = -1 ENOENT (No such file or directory) [pid 3338] open("/etc/hosts.deny", O_RDONLY) = -1 ENOENT (No such file or directory) [pid 3338] time(NULL) = 1256668099 [pid 3338] fcntl64(14, F_GETFL) = 0x2 (flags O_RDWR) [pid 3338] fcntl64(14, F_SETFL, O_RDWR|O_NONBLOCK) = 0 [pid 3338] epoll_ctl(6, EPOLL_CTL_ADD, 14, {EPOLLIN, {u32=143700808, u64=143700808}}) = 0 [pid 3338] time(NULL) = 1256668099 [pid 3338] epoll_wait(6, {{EPOLLIN, {u32=143700808, u64=143700808}}}, 1024, 1795000) = 1 [pid 3338] time(NULL) = 1256668099 [pid 3338] time(NULL) = 1256668099 [pid 3338] read(14, "\200\214\1\3\1\0c\0\0\0 "..., 11) = 11 [pid 3338] read(14, "\0\0009\0\0008\0\0005\0\0\210\0\0\207\0\0\204\0\0\26\0\0\23\0\0\n\7\0\300\0\0003"..., 131) = 131 [pid 3338] time(NULL) = 1256668099 [pid 3338] time(NULL) = 1256668099 [pid 3338] time(NULL) = 1256668099 [pid 3338] write(14, "\26\3\1\0J\2\0\0F\3\1J\347;\303\310\247w\24<\206!\334\3345\304\327\321\344\36FG\37"..., 4096) = 4096 [pid 3338] write(14, ""0\r\6\t*\206H\206\367\r\1\1\1\5\0\3\202\1\17\0000\202\1\n\2\202\1\1\0\253\v\243"..., 4096) = 4096 [pid 3338] write(14, "ootCA1\0;091\v0\t\6\3U\4\6\23\2FI1\0170\r\6\3U\4\n\23"..., 12928) = 6288 [pid 3338] write(14, "go1,0*\6\3U\4\v\23#Wells Fargo Certific"..., 6640) = -1 EAGAIN (Resource temporarily unavailable) [pid 3338] write(14, "go1,0*\6\3U\4\v\23#Wells Fargo Certific"..., 6640) = 6640 [pid 3338] write(14, "\26\3\1\24\1ck Halozatbiztonsagi Kft.1\0320"..., 5126) = 2048 [pid 3338] write(14, "\4\n\23\37Software in the Public Intere"..., 3078) = -1 EAGAIN (Resource temporarily unavailable) [pid 3338] time(NULL) = 1256668099 [pid 3338] epoll_wait(6,
(Strg-C on the client...)
{{EPOLLIN, {u32=143700808, u64=143700808}}}, 1024, 1795000) = 1 [pid 3338] time(NULL) = 1256668102 [pid 3338] time(NULL) = 1256668102 [pid 3338] write(14, "\4\n\23\37Software in the Public Intere"..., 3078) = 3078 [pid 3338] read(14, ""..., 5) = 0 [pid 3338] epoll_ctl(6, EPOLL_CTL_MOD, 14, {0, {u32=143700808, u64=143700808}}) = 0 [pid 3338] write(5, "0"..., 1) = 1 [pid 3338] epoll_ctl(6, EPOLL_CTL_DEL, 14, {0, {u32=143700808, u64=143700808}}) = 0 [pid 3338] shutdown(14, 2 /* send and receive */) = -1 ENOTCONN (Transport endpoint is not connected) [pid 3338] close(14) = 0 [pid 3338] time(NULL) = 1256668102 [pid 3338] epoll_wait(6, {{EPOLLIN, {u32=143700768, u64=143700768}}}, 1024, 1792000) = 1 [pid 3338] read(4, "0"..., 8192) = 1 [pid 3338] time(NULL) = 1256668102 [pid 3338] epoll_wait(6,
and that's it. Now when I try for the second time (now I get the server cert alright), it looks like this:
# strace -f -p `pidof slapd` Process 3354 attached with 4 threads - interrupt to quit [pid 3339] futex(0x8977560, FUTEX_WAIT_PRIVATE, 42, NULL <unfinished ...> [pid 3354] futex(0x8977560, FUTEX_WAIT_PRIVATE, 42, NULL <unfinished ...> [pid 3338] time(NULL) = 1256668222 [pid 3338] epoll_wait(6, <unfinished ...> [pid 3328] futex(0xad15ebd8, FUTEX_WAIT, 3338, NULL <unfinished ...> [pid 3338] <... epoll_wait resumed> {{EPOLLIN, {u32=143663400, u64=143663400}}}, 1024, 1672000) = 1 [pid 3338] accept(7, {sa_family=AF_INET, sin_port=htons(37195), sin_addr=inet_addr("137.226.164.160")}, [16]) = 15 [pid 3338] setsockopt(15, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0 [pid 3338] setsockopt(15, SOL_TCP, TCP_NODELAY, [1], 4) = 0 [pid 3338] open("/etc/hosts.allow", O_RDONLY) = -1 ENOENT (No such file or directory) [pid 3338] open("/etc/hosts.deny", O_RDONLY) = -1 ENOENT (No such file or directory) [pid 3338] time(NULL) = 1256668224 [pid 3338] fcntl64(15, F_GETFL) = 0x2 (flags O_RDWR) [pid 3338] fcntl64(15, F_SETFL, O_RDWR|O_NONBLOCK) = 0 [pid 3338] epoll_ctl(6, EPOLL_CTL_ADD, 15, {EPOLLIN, {u32=143700812, u64=143700812}}) = 0 [pid 3338] time(NULL) = 1256668224 [pid 3338] epoll_wait(6, {{EPOLLIN, {u32=143700812, u64=143700812}}}, 1024, 1670000) = 1 [pid 3338] time(NULL) = 1256668224 [pid 3338] time(NULL) = 1256668224 [pid 3338] read(15, "\200\214\1\3\1\0c\0\0\0 "..., 11) = 11 [pid 3338] read(15, "\0\0009\0\0008\0\0005\0\0\210\0\0\207\0\0\204\0\0\26\0\0\23\0\0\n\7\0\300\0\0003"..., 131) = 131 [pid 3338] time(NULL) = 1256668224 [pid 3338] time(NULL) = 1256668224 [pid 3338] time(NULL) = 1256668224 [pid 3338] write(15, "\26\3\1\0J\2\0\0F\3\1J\347<@5\352%\335\336\264Q\2263\346\303\335\t\2\34\241\372Q"..., 4096) = 4096 [pid 3338] write(15, ""0\r\6\t*\206H\206\367\r\1\1\1\5\0\3\202\1\17\0000\202\1\n\2\202\1\1\0\253\v\243"..., 4096) = 4096 [pid 3338] write(15, "ootCA1\0;091\v0\t\6\3U\4\6\23\2FI1\0170\r\6\3U\4\n\23"..., 12928) = 11584 [pid 3338] write(15, "\6\3U\4\6\23\2AU1\0230\21\6\3U\4\10\23\nQueensland1\0210"..., 4096) = 2896 [pid 3338] write(15, "ty1$0"\6\3U\4\n\23\33Digital Signature Tr"..., 1200) = 1200 [pid 3338] write(15, "\26\31personal-basic@thawte.com\0\3210\201\3161"..., 2374) = 2374 [pid 3338] read(15, 0x8a08398, 5) = -1 EAGAIN (Resource temporarily unavailable) [pid 3338] time(NULL) = 1256668224 [pid 3338] epoll_wait(6, {{EPOLLIN, {u32=143700812, u64=143700812}}}, 1024, 1670000) = 1 [pid 3338] time(NULL) = 1256668224 [pid 3338] time(NULL) = 1256668224 [pid 3338] read(15, "\26\3\1\0\7"..., 5) = 5 [pid 3338] read(15, "\v\0\0\3\0\0\0"..., 7) = 7 [pid 3338] read(15, "\26\3\1\1\6"..., 5) = 5 [pid 3338] read(15, "\20\0\1\2\1\0~\246\237\364\202\0\217\345#|\241\273k\34\251\277\224X\346\274\361\300\373\1\24\226\334"..., 262) = 262 [pid 3338] read(15, "\24\3\1\0\1"..., 5) = 5 [pid 3338] read(15, "\1"..., 1) = 1 [pid 3338] read(15, "\26\3\1\0000"..., 5) = 5 [pid 3338] read(15, "\36\337\371\314\260\5\246\233\17\31^P\3027\227\333\257\374\221F\\20?1\316\207\201BJQ\337\264\224"..., 48) = 48 [pid 3338] write(15, "\24\3\1\0\1\1\26\3\1\0000\356\336\3673\3034w\344\3364e\264\10dP\302\205\3058\357\272c"..., 59) = 59 [pid 3338] time(NULL) = 1256668224 [pid 3338] epoll_wait(6,
Hope that someone can make sense of this. Just to be clear: ldapsearch behaves the same way as described above for openssl s_client.
Thank you very much for even reading so far.