Erwann ABALEA wrote:
2011/7/29 Howard Chu <hyc@symas.com>:
Howard Chu wrote:
Erwann ABALEA wrote:
Do you have any document or pointer to help understand the task of converting to and from T.61, and the incompatible character sets you mentioned? I Googled for this, but I'm not sure what I found (it reminds me of the old character sets we used many years ago in France for the Minitel, with G1/G2 character groups, etc., not that far from VT consoles).
You can refer to this old draft; I wrote Appendices A and B to document the mapping as we understood it at that time. These appendices were dropped from the final version because it was considered futile to attempt to document the T.61 character encoding rules.
http://tools.ietf.org/html/draft-ietf-ldapbis-strprep-00#appendix-A
You can also read libldap/t61.c; the code has been present in every OpenLDAP release since 2002 but is not compiled or used.
This guide has a pretty good discussion of the issues:
http://www.cs.auckland.ac.nz/~pgut001/pubs/x509guide.txt
The section on "Character Sets" is particularly relevant. The section on "Comparing DNs" is somewhat relevant, though in fact OpenLDAP has already solved this problem (for all the string types besides T61String) by doing all matching in UTF-8.
Thank you for the pointers. I appreciate Peter's writings and had already read this text some time ago, but wasn't focused on T.61 then. OpenSSL 1.0.0 internally stores names in UTF-8, in a "semi-normalized" form (superfluous spaces removed, everything converted to lowercase, but no NFC/NFD normalization).
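To illustrate what "semi-normalized" means for matching, here is a minimal sketch. `semi_normalize` is a hypothetical helper, not OpenSSL's code. It trims the value, collapses runs of whitespace and lowercases, but applies no Unicode normalization. Python's `str.lower()` is Unicode-aware, so the exact case-folding OpenSSL applies may differ.

```python
import re


def semi_normalize(value: str) -> str:
    """Hypothetical approximation of a "semi-normalized" DN attribute value.

    Trims leading/trailing whitespace, collapses internal whitespace runs to a
    single space and lowercases. No NFC/NFD step is applied, so a precomposed
    letter and its decomposed equivalent still compare as different.
    """
    collapsed = re.sub(r"\s+", " ", value.strip())
    return collapsed.lower()
```

For example, `semi_normalize("  CSCA   CZ ")` returns `"csca cz"`. However, `"Ren\u00e9"` (precomposed é) and `"Rene\u0301"` (e followed by a combining acute accent) still produce different results. That is the gap left by skipping NFC/NFD.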
I'm now reading libldap/t61.c. I just read the IETF draft and its numerous tables... What a mess. X.680 references the T.61 recommendation, which was deleted some years ago, and I'm not clever enough to make Google find a copy of the standard. It can no longer be bought from the ITU, but it's still referenced by later standards. Nice.
The 1988 edition is still downloadable. http://www.itu.int/rec/T-REC-T.61-198811-S/en
It also references T.51: http://www.itu.int/rec/T-REC-T.51/en
Unfortunately the 1993 edition of T.61 is gone.
Meanwhile, I still haven't found the Czech CSCA certificate, but I know what to do about the remaining 1% uncertainty. The CN field is encoded as a T61String holding the value "CSCA_CZ". That fits well within the 7-bit limit.
Then you should just be using PrintableString. You're required to use the least-inclusive string type, after all.
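Whether a value fits the least-inclusive type can be checked mechanically against the PrintableString repertoire defined in X.680: letters, digits, space and `' ( ) + , - . / : = ?`. The sketch below uses hypothetical helper names. Note that the underscore is not in that repertoire, so "CSCA_CZ" is 7-bit but would not fit in a PrintableString. Among the DirectoryString alternatives allowed for a CN (TeletexString, PrintableString, UniversalString, UTF8String, BMPString), it would need one of the others.

```python
# ASN.1 PrintableString repertoire (X.680): A-Z, a-z, 0-9, space and ' ( ) + , - . / : = ?
PRINTABLE = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    " '()+,-./:=?"
)


def fits_7bit(value: bytes) -> bool:
    """Hypothetical helper: True if every byte is below 0x80."""
    return all(b < 0x80 for b in value)


def is_printable_string(value: str) -> bool:
    """Hypothetical helper: True if every character is in the PrintableString set."""
    return all(ch in PRINTABLE for ch in value)
```

Here `fits_7bit(b"CSCA_CZ")` is true while `is_printable_string("CSCA_CZ")` is false. The 7-bit limit alone does not decide which string type applies.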
If everything is converted to UTF-8 internally, and t61.c seems to provide a lossless T.61-to-UTF-8 conversion, why isn't it used?
Because it's incomplete. It only handles the original 333-character repertoire of T.61; it doesn't handle shift-in/shift-out to other character sets. I believe the last version of T.61 added support for Japanese (JIS), Chinese, and Greek. So quite a lot more logic and tables would need to be added, and it looks like a lot of work for something nobody should actually be using.
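To make the scope of that limitation concrete, here is a minimal sketch of the "easy" part. It is not the libldap/t61.c implementation. In T.61, an accented letter is written as a non-spacing diacritic byte (roughly 0xC1-0xCF) placed *before* the base letter, whereas Unicode combining marks come *after* it. A converter therefore has to reorder the two and then compose them.

The sketch makes several simplifying assumptions:
- It treats every byte below 0x80 as ASCII. This is a simplification, because the T.61 primary set does not define every ASCII position identically.
- It handles only a subset of the diacritic table. The byte values follow the commonly cited ISO 6937-style layout and should be checked against the Recommendation.
- It rejects SO, SI and ESC. That is exactly where the missing shift logic for the other character sets would have to go.

All names are hypothetical.

```python
import unicodedata

# Subset of T.61 non-spacing diacritics (prefix bytes) -> Unicode combining marks.
# Byte values assumed from the ISO 6937-style layout; verify against T.61 itself.
T61_DIACRITICS = {
    0xC1: "\u0300",  # grave
    0xC2: "\u0301",  # acute
    0xC3: "\u0302",  # circumflex
    0xC4: "\u0303",  # tilde
    0xC5: "\u0304",  # macron
    0xC6: "\u0306",  # breve
    0xC7: "\u0307",  # dot above
    0xC8: "\u0308",  # diaeresis
    0xCA: "\u030A",  # ring above
    0xCB: "\u0327",  # cedilla
    0xCD: "\u030B",  # double acute
    0xCE: "\u0328",  # ogonek
    0xCF: "\u030C",  # caron
}

SHIFT_CODES = {0x0E, 0x0F, 0x1B}  # SO, SI, ESC: switching to other sets is not supported


def t61_subset_to_str(data: bytes) -> str:
    """Decode a restricted T.61 subset: ASCII-range bytes plus prefix diacritics."""
    out = []
    pending = None  # combining mark waiting for its base letter
    for b in data:
        if b in SHIFT_CODES:
            raise ValueError(f"shift/escape code 0x{b:02X} not supported")
        if b in T61_DIACRITICS:
            if pending is not None:
                raise ValueError("two diacritics in a row")
            pending = T61_DIACRITICS[b]
            continue
        if b >= 0x80:
            raise ValueError(f"byte 0x{b:02X} not handled by this sketch")
        out.append(chr(b))
        if pending is not None:
            out.append(pending)  # Unicode puts the combining mark after the base
            pending = None
    if pending is not None:
        raise ValueError("diacritic at end of string")
    # Compose base + combining mark into precomposed characters where they exist.
    return unicodedata.normalize("NFC", "".join(out))
```

With these assumptions, `t61_subset_to_str(b"\xcfesk\xc2a")` yields `"česká"`, while any input containing an escape or shift code raises `ValueError`. Everything beyond this (the rest of the 0xA0-0xFF range, spacing accents, and the JIS/Chinese/Greek sets reached via shifts) is the additional work described above.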