Re: GnuTLS considered harmful

27 Feb 2008


      Howard Chu hyc@symas.com writes:
...
Simon Josefsson wrote:
...
Howard Chuhyc@symas.com  writes:
...
The recent trouble in ITS#5361 prompted me to look into the GnuTLS
code a little deeper. It turns out that their corresponding
set_subject_alt_name() API only takes a char * pointer as input,
without a corresponding length. As such, this API will only work for
string-form alternative names, and will typically break with IP
addresses and other alternatives.
I've assigned this a ticket:
http://trac.gnutls.org/cgi-bin/trac.cgi/ticket/18
It seems most SAN's in wide use are string-like, so nobody had
discovered this before.  It seems clear gnutls should have a new API to
allow setting arbitrary SAN values.
Not really trying to be antagonistic here, but this kind of proves my
point about inexperience - SANs are pretty well documented in
X.509v3. (Heck, if there's anything you can count on in ITU specs,
it's that there are mountains of them.) Anyone who has spent any time
with ASN.1 and X.500 knows that your safest, most uniform coding
approach is to use explicit (length,data) tuples
everywhere. Uniformity in the API design prevents this sort of problem
ever occurring in the first place.
Agreed.  As far as I can see, the rest of the GnuTLS API have that
uniformity.  This particular function,
gnutls_x509_crt_set_subject_alternative_name, is odd in other ways too
(the naming standard, elsewhere 'subject_alt_name' is used).
...
Your bug report text is interesting because it's only concerned with
truncated binary values. There's a flip side to this problem that you
didn't mention, which is the potential to SEGV from accessing unmapped
memory due to use of a non-terminated string. And again, an
experienced C programmer would understand this issue without needing
to be bludgeoned over the head with it (as I am certainly beating you
up about it now).
Since the function lacks a string length parameter, I think it is
obvious that the function is intended to take a zero terminated string.
This is also how the command line tools (src/certtool.c etc) use it.
I don't think it is unreasonable for a SAN related API to work with
zero-terminated strings.  The typical SAN's like dNSName, rfc822Name,
and uniformResourceIdentifier are human readable strings.  Most
applications will work with the strings in zero-terminated form.
Given that there are more SAN's that aren't string-like, I agree that we
should have an API to support them.  But I think the current API that
takes a zero terminated string will continue to be useful.
So, yes, there is a SEGV if you pass the function non-zero-terminated
data, but strlen has the same problem...  So it is about documentation.
I have improved the documentation.
...
There's another issue here, due to usability. Anyone who has spent
time with X.509 certificates has run into the problem of needing to
specify multiple SANs to get their certificates working. Your current
API only allows a single SAN value to be stored. Again, the API
appears to have been designed by somebody who has never used these
concepts in the real world, and doesn't understand how these items
actually need to work. That's hardly reassuring in the general case,
doubly worrying for a piece of security software.
I've added this as a requirement on the new API to the ticket:
http://trac.gnutls.org/cgi-bin/trac.cgi/ticket/18#comment:1
...
...
...
Looking across more of their APIs, I see that the code makes liberal
use of strlen and strcat, when it needs to be using counted-length
data blobs everywhere. In short, the code is fundamentally broken;
most of its external and internal APIs are incapable of passing binary
data without mangling it. The code is completely unsafe for handling
binary data, and yet the nature of TLS processing is almost entirely
dependent on secure handling of binary data.
I strongly recommend that GnuTLS not be used. All of its APIs would
need to be overhauled to correct its flaws and it's clear that the
developers there are too naive and inexperienced to even understand
that it's broken.
I looked at the X.509 API's (x509.h) and I couldn't find any other that
didn't take buffer length arguments.  I didn't look carefully though.
There is 1 (one) use of 'strcat' in the X.509 code, and it looks correct
to me.  There was 20 uses of 'strlen' in the X.509 code, and I went over
the first matches but when they looked correct I didn't look further.
(For reference, the X.509 code size is around 21000 lines of code.)
...
If you can give more details, that would be appreciated.
Aside from the inherently unsafe nature of using strlen() on strings
of unknown origin, the overall code quality is extremely poor. It
looks like it was written by a Pascal programmer, someone who's
accustomed to strings with embedded lengths and for which strlen() is
essentially free. I hesitate to mention this aspect of it since you'll
probably consider it a premature optimization.
For example, minitasn1/coding.c:
asn1_retCode
_asn1_time_der (unsigned char *str, unsigned char *der, int *der_len)
{
  int len_len;
  int max_len;
max_len = *der_len;
asn1_length_der (strlen (str), (max_len > 0) ? der : NULL, &len_len);
if ((len_len + (int) strlen (str)) <= max_len)
    memcpy (der + len_len, str, strlen (str));
  *der_len = len_len + strlen (str);
if ((*der_len) > max_len)
    return ASN1_MEM_ERROR;
return ASN1_SUCCESS;
}
There are 4 calls to strlen(str) here, which ... I can't even find the
words to express how gross this is.
That code is from libtasn1, and for libtasn1 I am inclined to agree with
you even generally -- libtasn1 is not written in good C style.  It is
very difficult to maintain, and I try to avoid changing the code if I
can.  However, the API is small and reasonable, and the code is not that
large, so it is possible to re-write it in a clean way.  There is even a
item in doc/TODO about this:
* Improve or rewrite libtasn1 to make it easier to maintain.
Right now we don't have resources to take on something like that.
Although I agree that libtasn1 code is ugly, I think my reasons are
somewhat different than yours -- I think the code is unreadable and
unmaintainable.  I regard unreadable code as a much more serious problem
than inefficient code.  It is not at all clear that libtasn1 is a
bottle-neck performance wise.  GnuTLS is used on some rather limited
devices and performance haven't been a significant concern.  So if the
code was very readable, but still inefficient, I wouldn't have any
serious problem with it.  (Best is of course readable and efficient
code, and sometimes those two go together, but not always.)  A good
compiler could even optimize away those excessive strlen (str) calls.
...
You note that there's really a small number of instances of strcat()
in the code. That's true, but that's because you've provided your own
_gnutls_str_cat() function instead, which is also heavily
used. Assuming that strlen() isn't going to SEGV on you (which depends
on dumb luck) this becomes just a question of efficiency.
_gnutls_str_cat is used in 61 places in 80kloc (v2.3.2), and it does
protect against overflows.  Looking over the uses, many of them are to
construct the ASN.1 field names, which are fixed strings stored in local
fixed buffers, rather than working on variable-size inputs.  The code
looks ugly to me, but seems correct in the few places I looked carefully
at.
...
E.g. in x509/dn.c you have in str_escape()
   str_length = MIN (strlen (str), buffer_size - 1);
which is already bad because the MIN macro may evaluate that strlen() call twice.
This function is called from _gnutls_x509_parse_dn() which makes heavy
use of _gnutls_string_append_str() which guess what, calls strlen() on
its arguments again. All of your string processing code has O(n^2)
efficiency. It's incredibly sloppy and there's no justifiable reason
for it.
In x509/output.c your gnutls_x509_crt_print() function constructs a
gnutls_string structure with a series of inefficient calls and then,
the final capper:
      _gnutls_string_append_data (&str, "\0", 1);
      out->data = str.data;
      out->size = strlen (str.data);
It walks all across the length of the string to append a single
terminating NUL byte, and then it calls strlen() on the result
*again*, even though str.length is already calculated for you in
_gnutls_string_append_data().
If you actually had used gnutls_string structures everywhere, you
*could* do all string construction operations in O(n) time (n ~ number
of characters being added) by simply memcpy'ing to
str.data+str.length. As it is, anything based on strcat is inherently
O(n^2) and you do so many repeated traversals of the same data that I
suspect the actual compute cost is even higher than n^2.
Yeah, there are large parts that could be optimized, and you have found
good examples.
I think the X.509 code in GnuTLS is of less quality than the rest, and
it would be interesting to see if we can switch to using something like
libksba.  The goal with GnuTLS is to be a SSL/TLS implementation, and if
there had been good X.509 libraries around I suspect we wouldn't have
incrementally added code to handle X.509 at all.  I wasn't involved in
the project at the time the X.509 code was written so I don't know what
the thoughts were, though.
Still, the X.509 part of a TLS session is small.  We take care to do the
expensive processing before the actual TLS session start.  (Such as
computing the RDN_SEQ to send to clients.)  My hope is that GnuTLS will
do minimal X.509 handling in the future.  We support PKCS#11-like
handling of private keys, and applications can off-load the X.509 chain
verification to other libraries as well.
In the end, this is about economics and trade-offs.  While the code is
technically sometimes both inefficient and inelegant, there are too few
people who work on it to make re-writing code a good use of our time.
If GnuTLS was a larger and funded project like OpenSSL, NSS, or
OpenLDAP, things may be different.
/Simon

2025

2024

2023

2022

2021

2020

2019

2018

2017

2016

2015

2014

2013

2012

2011

2010

2009

2008

2007

2006

Re: GnuTLS considered harmful