First, sorry for taking this thread private; that was unintentional (maybe I should consider using "reply to all" by default). Group added.
2011/7/29 Howard Chu hyc@symas.com:
Erwann ABALEA wrote:
[...]
In fact, I know of such a CA that was generated some months ago, has a very large audience, and whose certificates are to be stored in an LDAP directory: the Czech Republic passport CA certificate. If you want to know how it's used, and by whom, we can talk about it in private, or you can look at www.icao.int and search for the Doc 9303 documents, the PKD structure, etc. (In fact, I didn't know about this limitation, and I'll look into its impact on integrating the Czech certificates into the OpenLDAP structures we sell and deploy.) I agree that UTF8String would have been a much better choice, but X.509 doesn't prevent the use of T61String. At the same time, some products still don't support UTF8String in certificates; a Novell proxy product (I don't remember its exact name) is an example I encountered recently.
If that's the case, what solutions do you propose? We could accept T61String if it only uses characters that are present in 7-bit ASCII of course. But once you venture into 8-bit and extended/accented characters all bets are off.
I'll need to dig up this CA certificate again. I was asked to give my opinion on whether it should be considered valid or not. Despite the fact that T61String is clearly deprecated in RFC 5280/3280/2459, and that ICAO chose to base its certificate profile on RFC 3280 (a bad choice), asking a country to change the root CA certificate that signs all its passports, because it doesn't follow rules I personally don't agree with, is difficult and counterproductive. I wasn't the only one to think so, and the certificate was accepted. I'm 99% sure 7 bits weren't enough; I'll settle the remaining 1% as soon as I find the certificate.
Do you have any document or pointer to help me understand the task of converting to/from T.61, and the incompatible character sets you talked about? I googled it, but I'm not sure about what I found (it reminds me of the old character sets we used in France many years ago for the Minitel, with G1/G2 character groups, etc., not that far from VT consoles).
The ICAO group or the PKD Board could ask the Czechs to produce X.509 certificates with UTF8String encoding for the issuer name's fields. That's perfectly valid under X.520 rules (as you know, differently encoded strings are equal as long as their content is semantically identical), but since those issuer names would no longer be byte-for-byte identical to the root's T61String subject name, it requires implementors to produce fully compliant name-matching code. Such code would have to be written correctly by dozens of teams around the world and used by autonomous devices reading passports, etc. A big bet. Given what I see in different countries' certificates, we're far from that.
I'll try to take a look tomorrow.
Erwann ABALEA wrote:
Do you have any document or pointer to understand the task of converting to/from T.61, and incompatible character sets you talked about? I Googled for this, but I'm not sure of what I found (what I found reminds me of old character sets we used many years ago in France for the Minitel, with G1/G2 character groups, etc, not that far from VT consoles).
You can refer to this old draft; I wrote Appendices A and B to document the mapping as we understood it at that time. These appendices were dropped from the final version because it was considered futile to attempt to document the T.61 character encoding rules.
http://tools.ietf.org/html/draft-ietf-ldapbis-strprep-00#appendix-A
You can also read libldap/t61.c; the code has been present in every OpenLDAP release since 2002 but is not compiled or used.
Howard Chu wrote:
This Guide has a pretty good discussion of the issues.
http://www.cs.auckland.ac.nz/~pgut001/pubs/x509guide.txt
The section on "Character Sets" is particularly relevant. The section on "Comparing DNs" is somewhat relevant, though in fact OpenLDAP has already solved this problem (for all the string types besides T61String) by doing all matching in UTF-8.
2011/7/29 Howard Chu hyc@symas.com:
You can reference this old draft; I wrote Appendix A and B to document the mapping as we understood it at that time. These Appendices were dropped from the final version because it was considered futile to attempt to document the T.61 character encoding rules.
http://tools.ietf.org/html/draft-ietf-ldapbis-strprep-00#appendix-A
You can also read libldap/t61.c; the code has been present in every OpenLDAP release since 2002 but is not compiled or used.
This Guide has a pretty good discussion of the issues.
http://www.cs.auckland.ac.nz/~pgut001/pubs/x509guide.txt
The section on "Character Sets" is particularly relevant. The section on "Comparing DNs" is somewhat relevant, though in fact OpenLDAP has already solved this problem (for all the string types besides T61String) by doing all matching in UTF-8.
Thank you for the pointers. I appreciate Peter's writings and had already read this text some time ago, but I wasn't focused on T.61 then. OpenSSL 1.0.0 internally stores names in a UTF-8, "semi-normalized" form (redundant spaces removed, everything converted to lowercase, but no NFC/NFD normalization).
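A minimal Python sketch of why the missing NFC step matters, assuming only the behaviour described above: `semi_normalize` is a hypothetical approximation (trim, collapse spaces, lowercase), not OpenSSL's actual code, and `full_normalize` simply adds NFC composition on top of it.

```python
import unicodedata


def semi_normalize(value: str) -> str:
    # Hypothetical approximation of the "semi-normalized" form described
    # above: strip and collapse whitespace, lowercase, no NFC/NFD step.
    return " ".join(value.split()).lower()


def full_normalize(value: str) -> str:
    # Same steps, then compose to Unicode Normalization Form C.
    return unicodedata.normalize("NFC", semi_normalize(value))


precomposed = "Praha  Caf\u00e9"   # "é" as a single code point, U+00E9
decomposed = " praha cafe\u0301 "  # "e" followed by U+0301 COMBINING ACUTE ACCENT

print(semi_normalize(precomposed) == semi_normalize(decomposed))  # False
print(full_normalize(precomposed) == full_normalize(decomposed))  # True
```

Two encodings of the same name only compare equal once both are brought to the same normalization form.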
I'm now reading libldap/t61.c. I've just read the IETF draft and its numerous tables... What a mess. X.680 references the T.61 recommendation, which was withdrawn some years ago, and I'm not clever enough to get Google to find a copy. It can no longer be bought from the ITU, yet later standards still reference it. Nice.
Meanwhile, I still haven't found the Czech CSCA certificate, but I can settle the remaining 1% uncertainty: the CN field is encoded as a T61String holding the value "CSCA_CZ", which fits well within the 7-bit limit.
If everything is internally converted to UTF8 and t61.c seems to provide a lossless T.61 to UTF8 conversion, why isn't it used?
Erwann ABALEA wrote:
I'm reading now libldap/t61.c. I just read the IETF draft, and the numerous tables... What a mess. X.680 has a reference to T.61 recommendation, which was deleted some years ago, and I'm not clever enough to make Google find a copy of the standard. It can't be bought anymore from ITU, but it's still referenced by later standards. Nice.
The 1988 edition is still downloadable. http://www.itu.int/rec/T-REC-T.61-198811-S/en
It also references T.51: http://www.itu.int/rec/T-REC-T.51/en
Unfortunately the 1993 edition of T.61 is gone.
Meanwhile, I still haven't found the Czech CSCA certificate, but I know what to do with the remaining 1% uncertainty. The CN field is encoded as T61String, to hold the "CSCA_CZ" value. That fits well within the 7bits limit.
Then you should just be using PrintableString. You're required to use the least-inclusive string type, after all.
If everything is internally converted to UTF8 and t61.c seems to provide a lossless T.61 to UTF8 conversion, why isn't it used?
Because it's incomplete. It only handles the original 333-character repertoire of T.61; it doesn't handle shift-in/shift-out to other character sets. I believe the last version of T.61 included support for Japanese (JIS), Chinese, and Greek. So quite a lot more logic and tables would need to be added, and that looks like a lot of work for something nobody should actually be using.
2011/7/29 Howard Chu hyc@symas.com:
Erwann ABALEA wrote:
[...]
I'm reading now libldap/t61.c. I just read the IETF draft, and the numerous tables... What a mess. X.680 has a reference to T.61 recommendation, which was deleted some years ago, and I'm not clever enough to make Google find a copy of the standard. It can't be bought anymore from ITU, but it's still referenced by later standards. Nice.
The 1988 edition is still downloadable. http://www.itu.int/rec/T-REC-T.61-198811-S/en
It was one click further... Sorry, I saw "Superseded" for the 1988 edition and "Withdrawn" for the 1993 one, and didn't realize I could click the 1988 link to download that edition.
It also references T.51: http://www.itu.int/rec/T-REC-T.51/en
Unfortunately the 1993 edition of T.61 is gone.
What's more, it's listed as "Never published."
Meanwhile, I still haven't found the Czech CSCA certificate, but I know what to do with the remaining 1% uncertainty. The CN field is encoded as T61String, to hold the "CSCA_CZ" value. That fits well within the 7bits limit.
Then you should just be using PrintableString. You're required to use the least-inclusive string type, after all.
Howard, you're very good. But you too can make mistakes :)
1. PrintableString is inadequate (because of the underscore character).
2. It's not *my* certificate; it's the Czech Republic's certificate used to issue passports that will be verified by every other country. I don't work for them, I'm French :) (OK, I don't know where you live either.)
3. The statement "you're required to use the least-inclusive string type" is wrong: there's no such requirement anywhere. Everybody is free to use UTF8String even when PrintableString could be used (it's even RECOMMENDED to do so).
The good option would have been to use UTF8String, for sure. But they have already produced passports, and changing a root certificate of this importance is not an easy task.
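The underscore point is easy to check mechanically. A small Python sketch using the PrintableString character set from X.680; note that a value being 7-bit clean does not by itself make it PrintableString-safe.

```python
# Character repertoire of ASN.1 PrintableString (X.680).
PRINTABLE_STRING_CHARS = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    " '()+,-./:=?"
)


def fits_printable_string(value: str) -> bool:
    """True if every character of value is allowed in a PrintableString."""
    return all(ch in PRINTABLE_STRING_CHARS for ch in value)


def is_7bit(value: str) -> bool:
    """True if every character of value is in the 7-bit (ASCII) range."""
    return all(ord(ch) < 0x80 for ch in value)


cn = "CSCA_CZ"
print(is_7bit(cn))                # True
print(fits_printable_string(cn))  # False: "_" is not a PrintableString character
```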
Instead, RFC 4518 states that non-Unicode strings, including TeletexString, are to be translated to Unicode. It warns the reader that using such an encoding is NOT RECOMMENDED and that no standard conversion rules exist, but the conversion itself is *not* optional.
If everything is internally converted to UTF8 and t61.c seems to provide a lossless T.61 to UTF8 conversion, why isn't it used?
Because it's incomplete. It only handles the original 333 character repertoire of T.61, it doesn't handle shift-in/shift-out to other character sets. I believe in the last version of T.61 there was support for Japanese (JIS), Chinese, and Greek. So quite a lot more logic and tables needs to be added, and it looks like a lot of work for something nobody should actually be using.
If support for JIS, Chinese, and Greek characters was to be included in the 1993 edition, and that edition was never published, couldn't we ignore them? X.680 (1997 edition) also references the 1988 edition of T.61, and if no newer edition exists, that one still applies, right?
Is incomplete (but documented) support for T61String really worse than no support at all, even if the literature says that perfect support cannot exist?
Erwann ABALEA wrote:
2011/7/29 Howard Chu hyc@symas.com:
Then you should just be using PrintableString. You're required to use the least-inclusive string type, after all.
Howard, you're very good. But you too can make mistakes :)
- PrintableString is inadequate (because of the underscore character).
Ah, right.
If everything is internally converted to UTF8 and t61.c seems to provide a lossless T.61 to UTF8 conversion, why isn't it used?
Because it's incomplete. It only handles the original 333 character repertoire of T.61, it doesn't handle shift-in/shift-out to other character sets. I believe in the last version of T.61 there was support for Japanese (JIS), Chinese, and Greek. So quite a lot more logic and tables needs to be added, and it looks like a lot of work for something nobody should actually be using.
If the support for JIS, Chinese, and Greek characters were to be included in the 1993 edition, and this edition has never been published, couldn't it be possible to ignore them? X.680 (1997 edition) also references the 1988 edition of T.61, and if no newer edition is present, then it still must be used, right?
Actually, Japanese and Chinese are already specified in the 1988 edition. Greek is mentioned there, but without a specification of which escape code to use. It's probably defined elsewhere, in a later version of ISO 2022 or some such.
Is an incomplete (but documented) support for T61String really worse than no support for it at all? Even if literature tells that no perfect support can exist?
In the security arena, yes. E.g. if we accept a T61String that uses escape codes, we will not normalize it correctly to UTF-8. From there we would be giving "definitive" yes/no results to matches, based on invalid comparisons.
Hallvard B Furuseth wrote:
Could we accept some safe subset of T.61 and reject the rest? As long as we don't need to translate back...
Perhaps. The original post in this thread was complaining about a plain attribute value as well as a certificate DN. Obviously LDAPv3 requires strings to be provided in UTF-8; one has to wonder if the client was performing an LDAPv2 Bind. If we tie string normalization behavior to the session protocol version, then we would also need to be able to translate back from UTF-8 to T.61.
Clearly we are not going to add any support for LDAPv2 at this late date.
At this point I think all the facts and resources have been laid out. Patches welcome, if anyone wants to pursue it further.
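For anyone who does want to pursue the "safe subset" idea, here is a minimal Python sketch of the policy discussed above: decode only what can be mapped unambiguously and reject everything else (escape sequences in particular) instead of guessing. It is not the libldap/t61.c mapping. Assumptions, all hypothetical: the accepted 7-bit characters (the PrintableString set plus "_") are treated as identical to ASCII, and only five T.61 non-spacing diacritics (0xC1 grave, 0xC2 acute, 0xC3 circumflex, 0xC8 diaeresis, 0xCF caron) are mapped. In T.61 the diacritic byte precedes the base letter, whereas Unicode puts the combining mark after it.

```python
import unicodedata

# Hypothetical conservative subset: 7-bit characters assumed to mean the same
# thing in T.61 and ASCII (PrintableString repertoire plus "_").
SAFE_7BIT = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789 '()+,-./:=?_"
)

# A few T.61 non-spacing diacritics mapped to Unicode combining marks.
T61_DIACRITICS = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC8: "\u0308",  # diaeresis
    0xCF: "\u030C",  # caron
}


class UnsupportedT61(ValueError):
    """Raised for any byte sequence outside the handled subset."""


def _is_ascii_letter(b: int) -> bool:
    return 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A


def t61_subset_to_unicode(data: bytes) -> str:
    """Decode a T61String value restricted to the subset above, else raise."""
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if chr(b) in SAFE_7BIT:
            out.append(chr(b))
            i += 1
        elif b in T61_DIACRITICS:
            # Diacritic first in T.61, combining mark after the base in Unicode.
            # (This sketch does not check that the pair is a valid T.61 character.)
            if i + 1 >= len(data) or not _is_ascii_letter(data[i + 1]):
                raise UnsupportedT61(f"diacritic without base letter at offset {i}")
            out.append(chr(data[i + 1]) + T61_DIACRITICS[b])
            i += 2
        else:
            # ESC (0x1B) and shift sequences, other 8-bit bytes, and 7-bit
            # positions outside the subset are rejected, never guessed at.
            raise UnsupportedT61(f"unsupported byte 0x{b:02X} at offset {i}")
    # Compose base + combining mark into precomposed characters where possible.
    return unicodedata.normalize("NFC", "".join(out))


print(t61_subset_to_unicode(b"CSCA_CZ"))                  # CSCA_CZ
print(t61_subset_to_unicode(b"\xcfCesk\xc2a republika"))  # Česká republika
```

Decoding the same bytes naively as Latin-1 would instead give "ÏCeskÂa republika", exactly the kind of silently wrong normalization that would make match results untrustworthy. Rejecting unknown input keeps the "definitive" yes/no answers honest, at the cost of refusing some legitimate values; there is also no path back from UTF-8 to T.61 here.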
2011/7/29 Howard Chu hyc@symas.com:
Erwann ABALEA wrote:
If the support for JIS, Chinese, and Greek characters were to be included in the 1993 edition, and this edition has never been published, couldn't it be possible to ignore them? X.680 (1997 edition) also references the 1988 edition of T.61, and if no newer edition is present, then it still must be used, right?
Actually Japanese and Chinese are already specified in the 1988 edition. Greek is mentioned there but is lacking a specification of which escape code to use. Probably it's defined elsewhere, like a later version of ISO 2022 or somesuch.
I agree it's very badly "normalized", to say the least. Another plate of spaghetti to untangle...
Is an incomplete (but documented) support for T61String really worse than no support for it at all? Even if literature tells that no perfect support can exist?
In the security arena, yes. E.g. if we accept a T61String that uses escape codes, we will not normalize it correctly to UTF-8. From there we would be giving "definitive" yes/no results to matches, based on invalid comparisons.
The security argument is good. For my own use, I don't use certificateMatch filters. But I'll need to store X.509 certificates, some containing T61String elements in their issuer DN, and retrieve them using more classic search filters such as (&(objectClass=inetOrgPerson)(cn=...)(sn=...)), fetching the userCertificate;binary attribute. I found some messages from 2006 saying that certificateMatch was implemented using OpenSSL. Did you choose to code it differently to support other crypto libraries, such as GnuTLS?
Being compliant with all the standards is difficult, I know, especially when they're incomplete and/or divergent. I have no other proposal than letting the admin choose between "I want to be able to use certificateMatch" and "I want to be able to store any stupid certificate in my tree".
Erwann ABALEA wrote:
2011/7/29 Howard Chu hyc@symas.com:
The security argument is good. For my personal use, certificateMatch filter is not used. But I'll need to store X.509 certificates, some containing T61String elements in issuerDN, and retrieve them using more classic search filters &((objectClass=inetOrgPerson)(cn=...)(sn=...)) and get the userCertificate;binary attribute. I found some messages from 2006 telling that certificateMatch were done using OpenSSL. Did you chose to code it differently to support other crypto libraries, such as GnuTLS?
Yes. Once we made the decision to support multiple TLS libraries we obviously needed to refactor, particularly since libraries like GnuTLS were completely broken in their processing of certificate names.
openldap-technical@openldap.org