Simon Josefsson wrote:
Howard Chuhyc@symas.com writes:
The recent trouble in ITS#5361 prompted me to look into the GnuTLS code a little deeper. It turns out that their corresponding set_subject_alt_name() API only takes a char * pointer as input, without a corresponding length. As such, this API will only work for string-form alternative names, and will typically break with IP addresses and other alternatives.
I've assigned this a ticket:
http://trac.gnutls.org/cgi-bin/trac.cgi/ticket/18
It seems most SAN's in wide use are string-like, so nobody had discovered this before. It seems clear gnutls should have a new API to allow setting arbitrary SAN values.
Not really trying to be antagonistic here, but this kind of proves my point about inexperience - SANs are pretty well documented in X.509v3. (Heck, if there's anything you can count on in ITU specs, it's that there are mountains of them.) Anyone who has spent any time with ASN.1 and X.500 knows that your safest, most uniform coding approach is to use explicit (length,data) tuples everywhere. Uniformity in the API design prevents this sort of problem ever occurring in the first place.
Your bug report text is interesting because it's only concerned with truncated binary values. There's a flip side to this problem that you didn't mention, which is the potential to SEGV from accessing unmapped memory due to use of a non-terminated string. And again, an experienced C programmer would understand this issue without needing to be bludgeoned over the head with it (as I am certainly beating you up about it now).
There's another issue here, due to usability. Anyone who has spent time with X.509 certificates has run into the problem of needing to specify multiple SANs to get their certificates working. Your current API only allows a single SAN value to be stored. Again, the API appears to have been designed by somebody who has never used these concepts in the real world, and doesn't understand how these items actually need to work. That's hardly reassuring in the general case, doubly worrying for a piece of security software.
Looking across more of their APIs, I see that the code makes liberal use of strlen and strcat, when it needs to be using counted-length data blobs everywhere. In short, the code is fundamentally broken; most of its external and internal APIs are incapable of passing binary data without mangling it. The code is completely unsafe for handling binary data, and yet the nature of TLS processing is almost entirely dependent on secure handling of binary data.
I strongly recommend that GnuTLS not be used. All of its APIs would need to be overhauled to correct its flaws and it's clear that the developers there are too naive and inexperienced to even understand that it's broken.
I looked at the X.509 API's (x509.h) and I couldn't find any other that didn't take buffer length arguments. I didn't look carefully though.
There is 1 (one) use of 'strcat' in the X.509 code, and it looks correct to me. There was 20 uses of 'strlen' in the X.509 code, and I went over the first matches but when they looked correct I didn't look further. (For reference, the X.509 code size is around 21000 lines of code.)
If you can give more details, that would be appreciated.
Aside from the inherently unsafe nature of using strlen() on strings of unknown origin, the overall code quality is extremely poor. It looks like it was written by a Pascal programmer, someone who's accustomed to strings with embedded lengths and for which strlen() is essentially free. I hesitate to mention this aspect of it since you'll probably consider it a premature optimization.
For example, minitasn1/coding.c: asn1_retCode _asn1_time_der (unsigned char *str, unsigned char *der, int *der_len) { int len_len; int max_len;
max_len = *der_len;
asn1_length_der (strlen (str), (max_len > 0) ? der : NULL, &len_len);
if ((len_len + (int) strlen (str)) <= max_len) memcpy (der + len_len, str, strlen (str)); *der_len = len_len + strlen (str);
if ((*der_len) > max_len) return ASN1_MEM_ERROR;
return ASN1_SUCCESS; }
There are 4 calls to strlen(str) here, which ... I can't even find the words to express how gross this is.
You note that there's really a small number of instances of strcat() in the code. That's true, but that's because you've provided your own _gnutls_str_cat() function instead, which is also heavily used. Assuming that strlen() isn't going to SEGV on you (which depends on dumb luck) this becomes just a question of efficiency.
E.g. in x509/dn.c you have in str_escape() str_length = MIN (strlen (str), buffer_size - 1); which is already bad because the MIN macro may evaluate that strlen() call twice.
This function is called from _gnutls_x509_parse_dn() which makes heavy use of _gnutls_string_append_str() which guess what, calls strlen() on its arguments again. All of your string processing code has O(n^2) efficiency. It's incredibly sloppy and there's no justifiable reason for it.
In x509/output.c your gnutls_x509_crt_print() function constructs a gnutls_string structure with a series of inefficient calls and then, the final capper: _gnutls_string_append_data (&str, "\0", 1); out->data = str.data; out->size = strlen (str.data); It walks all across the length of the string to append a single terminating NUL byte, and then it calls strlen() on the result *again*, even though str.length is already calculated for you in _gnutls_string_append_data().
If you actually had used gnutls_string structures everywhere, you *could* do all string construction operations in O(n) time (n ~ number of characters being added) by simply memcpy'ing to str.data+str.length. As it is, anything based on strcat is inherently O(n^2) and you do so many repeated traversals of the same data that I suspect the actual compute cost is even higher than n^2.
Howard Chu hyc@symas.com writes:
Simon Josefsson wrote:
Howard Chuhyc@symas.com writes:
The recent trouble in ITS#5361 prompted me to look into the GnuTLS code a little deeper. It turns out that their corresponding set_subject_alt_name() API only takes a char * pointer as input, without a corresponding length. As such, this API will only work for string-form alternative names, and will typically break with IP addresses and other alternatives.
I've assigned this a ticket:
http://trac.gnutls.org/cgi-bin/trac.cgi/ticket/18
It seems most SAN's in wide use are string-like, so nobody had discovered this before. It seems clear gnutls should have a new API to allow setting arbitrary SAN values.
Not really trying to be antagonistic here, but this kind of proves my point about inexperience - SANs are pretty well documented in X.509v3. (Heck, if there's anything you can count on in ITU specs, it's that there are mountains of them.) Anyone who has spent any time with ASN.1 and X.500 knows that your safest, most uniform coding approach is to use explicit (length,data) tuples everywhere. Uniformity in the API design prevents this sort of problem ever occurring in the first place.
Agreed. As far as I can see, the rest of the GnuTLS API have that uniformity. This particular function, gnutls_x509_crt_set_subject_alternative_name, is odd in other ways too (the naming standard, elsewhere 'subject_alt_name' is used).
Your bug report text is interesting because it's only concerned with truncated binary values. There's a flip side to this problem that you didn't mention, which is the potential to SEGV from accessing unmapped memory due to use of a non-terminated string. And again, an experienced C programmer would understand this issue without needing to be bludgeoned over the head with it (as I am certainly beating you up about it now).
Since the function lacks a string length parameter, I think it is obvious that the function is intended to take a zero terminated string. This is also how the command line tools (src/certtool.c etc) use it.
I don't think it is unreasonable for a SAN related API to work with zero-terminated strings. The typical SAN's like dNSName, rfc822Name, and uniformResourceIdentifier are human readable strings. Most applications will work with the strings in zero-terminated form.
Given that there are more SAN's that aren't string-like, I agree that we should have an API to support them. But I think the current API that takes a zero terminated string will continue to be useful.
So, yes, there is a SEGV if you pass the function non-zero-terminated data, but strlen has the same problem... So it is about documentation. I have improved the documentation.
There's another issue here, due to usability. Anyone who has spent time with X.509 certificates has run into the problem of needing to specify multiple SANs to get their certificates working. Your current API only allows a single SAN value to be stored. Again, the API appears to have been designed by somebody who has never used these concepts in the real world, and doesn't understand how these items actually need to work. That's hardly reassuring in the general case, doubly worrying for a piece of security software.
I've added this as a requirement on the new API to the ticket: http://trac.gnutls.org/cgi-bin/trac.cgi/ticket/18#comment:1
Looking across more of their APIs, I see that the code makes liberal use of strlen and strcat, when it needs to be using counted-length data blobs everywhere. In short, the code is fundamentally broken; most of its external and internal APIs are incapable of passing binary data without mangling it. The code is completely unsafe for handling binary data, and yet the nature of TLS processing is almost entirely dependent on secure handling of binary data.
I strongly recommend that GnuTLS not be used. All of its APIs would need to be overhauled to correct its flaws and it's clear that the developers there are too naive and inexperienced to even understand that it's broken.
I looked at the X.509 API's (x509.h) and I couldn't find any other that didn't take buffer length arguments. I didn't look carefully though.
There is 1 (one) use of 'strcat' in the X.509 code, and it looks correct to me. There was 20 uses of 'strlen' in the X.509 code, and I went over the first matches but when they looked correct I didn't look further. (For reference, the X.509 code size is around 21000 lines of code.)
If you can give more details, that would be appreciated.
Aside from the inherently unsafe nature of using strlen() on strings of unknown origin, the overall code quality is extremely poor. It looks like it was written by a Pascal programmer, someone who's accustomed to strings with embedded lengths and for which strlen() is essentially free. I hesitate to mention this aspect of it since you'll probably consider it a premature optimization.
For example, minitasn1/coding.c: asn1_retCode _asn1_time_der (unsigned char *str, unsigned char *der, int *der_len) { int len_len; int max_len;
max_len = *der_len;
asn1_length_der (strlen (str), (max_len > 0) ? der : NULL, &len_len);
if ((len_len + (int) strlen (str)) <= max_len) memcpy (der + len_len, str, strlen (str)); *der_len = len_len + strlen (str);
if ((*der_len) > max_len) return ASN1_MEM_ERROR;
return ASN1_SUCCESS; }
There are 4 calls to strlen(str) here, which ... I can't even find the words to express how gross this is.
That code is from libtasn1, and for libtasn1 I am inclined to agree with you even generally -- libtasn1 is not written in good C style. It is very difficult to maintain, and I try to avoid changing the code if I can. However, the API is small and reasonable, and the code is not that large, so it is possible to re-write it in a clean way. There is even a item in doc/TODO about this:
* Improve or rewrite libtasn1 to make it easier to maintain.
Right now we don't have resources to take on something like that.
Although I agree that libtasn1 code is ugly, I think my reasons are somewhat different than yours -- I think the code is unreadable and unmaintainable. I regard unreadable code as a much more serious problem than inefficient code. It is not at all clear that libtasn1 is a bottle-neck performance wise. GnuTLS is used on some rather limited devices and performance haven't been a significant concern. So if the code was very readable, but still inefficient, I wouldn't have any serious problem with it. (Best is of course readable and efficient code, and sometimes those two go together, but not always.) A good compiler could even optimize away those excessive strlen (str) calls.
You note that there's really a small number of instances of strcat() in the code. That's true, but that's because you've provided your own _gnutls_str_cat() function instead, which is also heavily used. Assuming that strlen() isn't going to SEGV on you (which depends on dumb luck) this becomes just a question of efficiency.
_gnutls_str_cat is used in 61 places in 80kloc (v2.3.2), and it does protect against overflows. Looking over the uses, many of them are to construct the ASN.1 field names, which are fixed strings stored in local fixed buffers, rather than working on variable-size inputs. The code looks ugly to me, but seems correct in the few places I looked carefully at.
E.g. in x509/dn.c you have in str_escape() str_length = MIN (strlen (str), buffer_size - 1); which is already bad because the MIN macro may evaluate that strlen() call twice.
This function is called from _gnutls_x509_parse_dn() which makes heavy use of _gnutls_string_append_str() which guess what, calls strlen() on its arguments again. All of your string processing code has O(n^2) efficiency. It's incredibly sloppy and there's no justifiable reason for it.
In x509/output.c your gnutls_x509_crt_print() function constructs a gnutls_string structure with a series of inefficient calls and then, the final capper: _gnutls_string_append_data (&str, "\0", 1); out->data = str.data; out->size = strlen (str.data); It walks all across the length of the string to append a single terminating NUL byte, and then it calls strlen() on the result *again*, even though str.length is already calculated for you in _gnutls_string_append_data().
If you actually had used gnutls_string structures everywhere, you *could* do all string construction operations in O(n) time (n ~ number of characters being added) by simply memcpy'ing to str.data+str.length. As it is, anything based on strcat is inherently O(n^2) and you do so many repeated traversals of the same data that I suspect the actual compute cost is even higher than n^2.
Yeah, there are large parts that could be optimized, and you have found good examples.
I think the X.509 code in GnuTLS is of less quality than the rest, and it would be interesting to see if we can switch to using something like libksba. The goal with GnuTLS is to be a SSL/TLS implementation, and if there had been good X.509 libraries around I suspect we wouldn't have incrementally added code to handle X.509 at all. I wasn't involved in the project at the time the X.509 code was written so I don't know what the thoughts were, though.
Still, the X.509 part of a TLS session is small. We take care to do the expensive processing before the actual TLS session start. (Such as computing the RDN_SEQ to send to clients.) My hope is that GnuTLS will do minimal X.509 handling in the future. We support PKCS#11-like handling of private keys, and applications can off-load the X.509 chain verification to other libraries as well.
In the end, this is about economics and trade-offs. While the code is technically sometimes both inefficient and inelegant, there are too few people who work on it to make re-writing code a good use of our time. If GnuTLS was a larger and funded project like OpenSSL, NSS, or OpenLDAP, things may be different.
/Simon
Simon Josefsson wrote:
In the end, this is about economics and trade-offs. While the code is technically sometimes both inefficient and inelegant, there are too few people who work on it to make re-writing code a good use of our time. If GnuTLS was a larger and funded project like OpenSSL, NSS, or OpenLDAP, things may be different.
I'll note that when I embarked on performance analysis of OpenLDAP back in 2001, http://www.openldap.org/lists/openldap-devel/200109/msg00065.html all of my work was funded by my bank account. There was pretty much no commercial interest in OpenLDAP until several years later, long after our major performance overhauls were done. After we had consistently demonstrated that the rewritten codebase performed better than anything else out there. The Project's growth didn't really accelerate until after that groundwork was done.
If you don't build a tight foundation you can't build anything good on top. If you don't build the foundation yourself, no one will ever care to fund you to add anything else. Just my opinion of course.
Simon,
I know *very* little about C programming but...
Simon Josefsson wrote:
I don't think it is unreasonable for a SAN related API to work with zero-terminated strings. The typical SAN's like dNSName, rfc822Name, and uniformResourceIdentifier are human readable strings. Most applications will work with the strings in zero-terminated form.
...having implemented a cert parser in Python I'd like to emphasize that the attitude of "Most applications will work" is for me a real show-stopper for deploying GnuTLS especially regarding possible security issues.
In my project experience I saw so many PKI-enabled software packages crashing while handling even perfectly valid certificates (not to speak of mal-formed certs issued by some commercial CAs).
Ciao, Michael.
Michael Ströder michael@stroeder.com writes:
Simon,
I know *very* little about C programming but...
Simon Josefsson wrote:
I don't think it is unreasonable for a SAN related API to work with zero-terminated strings. The typical SAN's like dNSName, rfc822Name, and uniformResourceIdentifier are human readable strings. Most applications will work with the strings in zero-terminated form.
...having implemented a cert parser in Python I'd like to emphasize that the attitude of "Most applications will work" is for me a real show-stopper for deploying GnuTLS especially regarding possible security issues.
In my project experience I saw so many PKI-enabled software packages crashing while handling even perfectly valid certificates (not to speak of mal-formed certs issued by some commercial CAs).
Sure, but this particular function takes a zero terminated string, and I believe that this is the simplest for most applications. Hostnames and email addresses are typical things to prompt the user for (using APIs like 'readline', which is what the certtool command line tool does), and such strings are typically stored in a zero terminated string in C applications.
For other SAN's that are binary data, and for applications that use length-value-pairs for the data, I agree that there should be another API as well. There is a ticket for this, and we'll add it as soon as possible. I appreciate that this thread has brought up problems that we can fix.
/Simon