Hello,
I'm seeing some really weird behaviour when using ldaps:// on an
openldap-2.3.43 server. It's a Gentoo Linux box with
glibc-2.9_p20081201-r2 and openssl-0.9.8k. I have already recompiled the
entire system with gcc-4.3.4 (twice to be sure), with no errors. First
of all, ldapsearch -H ldaps://bussard.lih.rwth-aachen.de just hangs.
The strange part: when I strace -f slapd, from the second retry on, it
works.
So I went on by debugging with openssl s_client, which exhibits just the
same behaviour. However it reveals that slapd falls silent in the middle
of sending the certificates.
So if I do:
$ openssl s_client -connect bussard.lih.rwth-aachen.de:636 -state
-status -CAfile /etc/openldap/ssl/rwth-dfn-tcom.crt
CONNECTED(00000003)
SSL_connect:before/connect initialization
SSL_connect:SSLv2/v3 write client hello A
OCSP response: no response sent
SSL_connect:SSLv3 read server hello A
depth=3 /C=DE/O=Deutsche Telekom AG/OU=T-TeleSec Trust
Center/CN=Deutsche Telekom Root CA 2
verify return:1
depth=2 /C=DE/O=DFN-Verein/OU=DFN-PKI/CN=DFN-Verein PCA Global - G01
verify return:1
depth=1 /C=DE/O=RWTH Aachen/CN=RWTH Aachen CA/emailAddress=ca(a)rwth-aachen.de
verify return:1
depth=0 /C=DE/O=RWTH Aachen/OU=Lehrstuhl fuer Ingenieur- und
Hydrogeologie/CN=ldap.lih.rwth-aachen.de
verify return:1
SSL_connect:SSLv3 read server certificate A
^C
Now after I've done "strace -f -p `pidof slapd`" on the server, I get
the same as above once. Then when I try a second time:
$ openssl s_client -connect bussard.lih.rwth-aachen.de:636 -state
-CAfile /etc/openldap/ssl/rwth-dfn-tcom.crt
CONNECTED(00000003)
SSL_connect:before/connect initialization
SSL_connect:SSLv2/v3 write client hello A
SSL_connect:SSLv3 read server hello A
depth=3 /C=DE/O=Deutsche Telekom AG/OU=T-TeleSec Trust
Center/CN=Deutsche Telekom Root CA 2
verify return:1
depth=2 /C=DE/O=DFN-Verein/OU=DFN-PKI/CN=DFN-Verein PCA Global - G01
verify return:1
depth=1 /C=DE/O=RWTH Aachen/CN=RWTH Aachen CA/emailAddress=ca(a)rwth-aachen.de
verify return:1
depth=0 /C=DE/O=RWTH Aachen/OU=Lehrstuhl fuer Ingenieur- und
Hydrogeologie/CN=ldap.lih.rwth-aachen.de
verify return:1
SSL_connect:SSLv3 read server certificate A
SSL_connect:SSLv3 read server certificate request A
SSL_connect:SSLv3 read server done A
SSL_connect:SSLv3 write client certificate A
SSL_connect:SSLv3 write client key exchange A
SSL_connect:SSLv3 write change cipher spec A
SSL_connect:SSLv3 write finished A
SSL_connect:SSLv3 flush data
SSL_connect:SSLv3 read finished A
---
Certificate chain
0 s:/C=DE/O=RWTH Aachen/OU=Lehrstuhl fuer Ingenieur- und
Hydrogeologie/CN=ldap.lih.rwth-aachen.de
i:/C=DE/O=RWTH Aachen/CN=RWTH Aachen CA/emailAddress=ca(a)rwth-aachen.de
1 s:/C=DE/O=RWTH Aachen/CN=RWTH Aachen CA/emailAddress=ca(a)rwth-aachen.de
i:/C=DE/O=DFN-Verein/OU=DFN-PKI/CN=DFN-Verein PCA Global - G01
2 s:/C=DE/O=DFN-Verein/OU=DFN-PKI/CN=DFN-Verein PCA Global - G01
i:/C=DE/O=Deutsche Telekom AG/OU=T-TeleSec Trust Center/CN=Deutsche
Telekom Root CA 2
3 s:/C=DE/O=Deutsche Telekom AG/OU=T-TeleSec Trust Center/CN=Deutsche
Telekom Root CA 2
i:/C=DE/O=Deutsche Telekom AG/OU=T-TeleSec Trust Center/CN=Deutsche
Telekom Root CA 2
---
Server certificate
-----BEGIN CERTIFICATE-----
... (lotsa stuff) ...
/C=ES/ST=Barcelona/L=Barcelona/O=IPS Internet publishing Services
s.l./O=ips(a)mail.ips.es C.I.F. B-60929452/OU=IPS CA Timestamping
Certification Authority/CN=IPS CA Timestamping Certification
Authority/emailAddress=ips(a)mail.ips.es
---
SSL handshake has read 26305 bytes and written 480 bytes
---
New, TLSv1/SSLv3, Cipher is AES256-SHA
Server public key is 2048 bit
Compression: NONE
Expansion: NONE
SSL-Session:
Protocol : TLSv1
Cipher : AES256-SHA
Session-ID: 99...CB400
Session-ID-ctx:
Master-Key: 152...A2D
Key-Arg : None
Start Time: 1256667603
Timeout : 300 (sec)
Verify return code: 0 (ok)
---
This time the SSL handshake works, just because I'm strace'ing slapd?
This looks like some really weird race condition. It's driving me crazy.
Should I talk to the openssl people about this? But when I make an
openssl testbed with openssl s_server and s_client, everything works
fine, so it shouldn't be an openssl issue.
Oh and maybe you'd like to see the strace output. This is what it looks
like when the SSL client hangs:
# strace -f -p `pidof slapd`
Process 3339 attached with 3 threads - interrupt to quit
[pid 3339] futex(0x8977560, FUTEX_WAIT_PRIVATE, 1, NULL <unfinished ...>
[pid 3338] time(NULL) = 1256668096
[pid 3338] epoll_wait(6, <unfinished ...>
[pid 3328] futex(0xad15ebd8, FUTEX_WAIT, 3338, NULL <unfinished ...>
[pid 3338] <... epoll_wait resumed> {{EPOLLIN, {u32=143663400,
u64=143663400}}}, 1024, 1798000) = 1
[pid 3338] accept(7, {sa_family=AF_INET, sin_port=htons(37192),
sin_addr=inet_addr("137.226.164.160")}, [16]) = 14
[pid 3338] setsockopt(14, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
[pid 3338] setsockopt(14, SOL_TCP, TCP_NODELAY, [1], 4) = 0
[pid 3338] open("/etc/hosts.allow", O_RDONLY) = -1 ENOENT (No such file
or directory)
[pid 3338] open("/etc/hosts.deny", O_RDONLY) = -1 ENOENT (No such file
or directory)
[pid 3338] time(NULL) = 1256668099
[pid 3338] fcntl64(14, F_GETFL) = 0x2 (flags O_RDWR)
[pid 3338] fcntl64(14, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 3338] epoll_ctl(6, EPOLL_CTL_ADD, 14, {EPOLLIN, {u32=143700808,
u64=143700808}}) = 0
[pid 3338] time(NULL) = 1256668099
[pid 3338] epoll_wait(6, {{EPOLLIN, {u32=143700808, u64=143700808}}},
1024, 1795000) = 1
[pid 3338] time(NULL) = 1256668099
[pid 3338] time(NULL) = 1256668099
[pid 3338] read(14, "\200\214\1\3\1\0c\0\0\0 "..., 11) = 11
[pid 3338] read(14,
"\0\0009\0\0008\0\0005\0\0\210\0\0\207\0\0\204\0\0\26\0\0\23\0\0\n\7\0\300\0\0003"...,
131) = 131
[pid 3338] time(NULL) = 1256668099
[pid 3338] time(NULL) = 1256668099
[pid 3338] time(NULL) = 1256668099
[pid 3338] write(14,
"\26\3\1\0J\2\0\0F\3\1J\347;\303\310\247w\24<\206!\334\3345\304\327\321\344\36FG\37"...,
4096) = 4096
[pid 3338] write(14,
"\"0\r\6\t*\206H\206\367\r\1\1\1\5\0\3\202\1\17\0000\202\1\n\2\202\1\1\0\253\v\243"...,
4096) = 4096
[pid 3338] write(14,
"ootCA1\0;091\v0\t\6\3U\4\6\23\2FI1\0170\r\6\3U\4\n\23"..., 12928) = 6288
[pid 3338] write(14, "go1,0*\6\3U\4\v\23#Wells Fargo Certific"...,
6640) = -1 EAGAIN (Resource temporarily unavailable)
[pid 3338] write(14, "go1,0*\6\3U\4\v\23#Wells Fargo Certific"...,
6640) = 6640
[pid 3338] write(14, "\26\3\1\24\1ck Halozatbiztonsagi Kft.1\0320"...,
5126) = 2048
[pid 3338] write(14, "\4\n\23\37Software in the Public Intere"...,
3078) = -1 EAGAIN (Resource temporarily unavailable)
[pid 3338] time(NULL) = 1256668099
[pid 3338] epoll_wait(6,
(Strg-C on the client...)
{{EPOLLIN, {u32=143700808, u64=143700808}}}, 1024, 1795000) = 1
[pid 3338] time(NULL) = 1256668102
[pid 3338] time(NULL) = 1256668102
[pid 3338] write(14, "\4\n\23\37Software in the Public Intere"...,
3078) = 3078
[pid 3338] read(14, ""..., 5) = 0
[pid 3338] epoll_ctl(6, EPOLL_CTL_MOD, 14, {0, {u32=143700808,
u64=143700808}}) = 0
[pid 3338] write(5, "0"..., 1) = 1
[pid 3338] epoll_ctl(6, EPOLL_CTL_DEL, 14, {0, {u32=143700808,
u64=143700808}}) = 0
[pid 3338] shutdown(14, 2 /* send and receive */) = -1 ENOTCONN
(Transport endpoint is not connected)
[pid 3338] close(14) = 0
[pid 3338] time(NULL) = 1256668102
[pid 3338] epoll_wait(6, {{EPOLLIN, {u32=143700768, u64=143700768}}},
1024, 1792000) = 1
[pid 3338] read(4, "0"..., 8192) = 1
[pid 3338] time(NULL) = 1256668102
[pid 3338] epoll_wait(6,
and that's it. Now when I try for the second time (now I get the server
cert alright), it looks like this:
# strace -f -p `pidof slapd`
Process 3354 attached with 4 threads - interrupt to quit
[pid 3339] futex(0x8977560, FUTEX_WAIT_PRIVATE, 42, NULL <unfinished ...>
[pid 3354] futex(0x8977560, FUTEX_WAIT_PRIVATE, 42, NULL <unfinished ...>
[pid 3338] time(NULL) = 1256668222
[pid 3338] epoll_wait(6, <unfinished ...>
[pid 3328] futex(0xad15ebd8, FUTEX_WAIT, 3338, NULL <unfinished ...>
[pid 3338] <... epoll_wait resumed> {{EPOLLIN, {u32=143663400,
u64=143663400}}}, 1024, 1672000) = 1
[pid 3338] accept(7, {sa_family=AF_INET, sin_port=htons(37195),
sin_addr=inet_addr("137.226.164.160")}, [16]) = 15
[pid 3338] setsockopt(15, SOL_SOCKET, SO_KEEPALIVE, [1], 4) = 0
[pid 3338] setsockopt(15, SOL_TCP, TCP_NODELAY, [1], 4) = 0
[pid 3338] open("/etc/hosts.allow", O_RDONLY) = -1 ENOENT (No such file
or directory)
[pid 3338] open("/etc/hosts.deny", O_RDONLY) = -1 ENOENT (No such file
or directory)
[pid 3338] time(NULL) = 1256668224
[pid 3338] fcntl64(15, F_GETFL) = 0x2 (flags O_RDWR)
[pid 3338] fcntl64(15, F_SETFL, O_RDWR|O_NONBLOCK) = 0
[pid 3338] epoll_ctl(6, EPOLL_CTL_ADD, 15, {EPOLLIN, {u32=143700812,
u64=143700812}}) = 0
[pid 3338] time(NULL) = 1256668224
[pid 3338] epoll_wait(6, {{EPOLLIN, {u32=143700812, u64=143700812}}},
1024, 1670000) = 1
[pid 3338] time(NULL) = 1256668224
[pid 3338] time(NULL) = 1256668224
[pid 3338] read(15, "\200\214\1\3\1\0c\0\0\0 "..., 11) = 11
[pid 3338] read(15,
"\0\0009\0\0008\0\0005\0\0\210\0\0\207\0\0\204\0\0\26\0\0\23\0\0\n\7\0\300\0\0003"...,
131) = 131
[pid 3338] time(NULL) = 1256668224
[pid 3338] time(NULL) = 1256668224
[pid 3338] time(NULL) = 1256668224
[pid 3338] write(15,
"\26\3\1\0J\2\0\0F\3\1J\347<@5\352%\335\336\264Q\2263\346\303\335\t\2\34\241\372Q"...,
4096) = 4096
[pid 3338] write(15,
"\"0\r\6\t*\206H\206\367\r\1\1\1\5\0\3\202\1\17\0000\202\1\n\2\202\1\1\0\253\v\243"...,
4096) = 4096
[pid 3338] write(15,
"ootCA1\0;091\v0\t\6\3U\4\6\23\2FI1\0170\r\6\3U\4\n\23"..., 12928) = 11584
[pid 3338] write(15,
"\6\3U\4\6\23\2AU1\0230\21\6\3U\4\10\23\nQueensland1\0210"..., 4096) = 2896
[pid 3338] write(15, "ty1$0\"\6\3U\4\n\23\33Digital Signature Tr"...,
1200) = 1200
[pid 3338] write(15,
"\26\31personal-basic(a)thawte.com\0\3210\201\3161"..., 2374) = 2374
[pid 3338] read(15, 0x8a08398, 5) = -1 EAGAIN (Resource
temporarily unavailable)
[pid 3338] time(NULL) = 1256668224
[pid 3338] epoll_wait(6, {{EPOLLIN, {u32=143700812, u64=143700812}}},
1024, 1670000) = 1
[pid 3338] time(NULL) = 1256668224
[pid 3338] time(NULL) = 1256668224
[pid 3338] read(15, "\26\3\1\0\7"..., 5) = 5
[pid 3338] read(15, "\v\0\0\3\0\0\0"..., 7) = 7
[pid 3338] read(15, "\26\3\1\1\6"..., 5) = 5
[pid 3338] read(15,
"\20\0\1\2\1\0~\246\237\364\202\0\217\345#|\241\273k\34\251\277\224X\346\274\361\300\373\1\24\226\334"...,
262) = 262
[pid 3338] read(15, "\24\3\1\0\1"..., 5) = 5
[pid 3338] read(15, "\1"..., 1) = 1
[pid 3338] read(15, "\26\3\1\0000"..., 5) = 5
[pid 3338] read(15,
"\36\337\371\314\260\5\246\233\17\31^P\3027\227\333\257\374\221F\\\20?1\316\207\201BJQ\337\264\224"...,
48) = 48
[pid 3338] write(15,
"\24\3\1\0\1\1\26\3\1\0000\356\336\3673\3034w\344\3364e\264\10dP\302\205\3058\357\272c"...,
59) = 59
[pid 3338] time(NULL) = 1256668224
[pid 3338] epoll_wait(6,
Hope that someone can make sense of this. Just to be clear: ldapsearch
behaves the same way as described above for openssl s_client.
Thank you very much for even reading so far.