Hi all,
I hope that this will lead to some discussion on better utf-8 support.
It seems that ldapsearch doesn't like DNs with UTF-8 characters in them. See:
$ ldapsearch -xLLL "(uid=dwrc)" dn
dn:: Y249RMW1ciBDeW1ydSxvdT11c2VyQWNjb3VudHMsZGM9Y3NpcnQsZGM9amFuZXQ=
$ locale
LANG=en_GB.UTF-8
LC_CTYPE="en_GB.UTF-8"
LC_NUMERIC="en_GB.UTF-8"
LC_TIME="en_GB.UTF-8"
LC_COLLATE="en_GB.UTF-8"
LC_MONETARY="en_GB.UTF-8"
LC_MESSAGES="en_GB.UTF-8"
LC_PAPER="en_GB.UTF-8"
LC_NAME="en_GB.UTF-8"
LC_ADDRESS="en_GB.UTF-8"
LC_TELEPHONE="en_GB.UTF-8"
LC_MEASUREMENT="en_GB.UTF-8"
LC_IDENTIFICATION="en_GB.UTF-8"
LC_ALL=
Adding -u doesn't make it much better:
ufn: D\C5\B5r Cymru, userAccounts, csirt.janet
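For readers who want to see what the escaped ufn line stands for, here is a minimal Python sketch. It assumes that each `\XX` pair in the output above is the hex value of one byte of the UTF-8 encoded name; the helper name `unescape_hex` is made up for this illustration and is not part of any LDAP tool.

```python
import re


def unescape_hex(text: str) -> str:
    """Turn backslash-hex pairs such as \\C5\\B5 back into UTF-8 text."""
    raw = re.sub(
        rb"\\([0-9A-Fa-f]{2})",                  # a backslash followed by two hex digits
        lambda m: bytes([int(m.group(1), 16)]),  # replace it with the single byte it names
        text.encode("utf-8"),
    )
    return raw.decode("utf-8")


print(unescape_hex(r"D\C5\B5r Cymru, userAccounts, csirt.janet"))
```

The two bytes C5 B5 are the UTF-8 encoding of 'ŵ', so the name is intact in the directory; only its presentation is escaped.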
Meanwhile, 'getent passwd' via libnss-ldap is more than happy with UTF-8 usernames:
$ getent passwd
dwrc:*:1111:100:Dŵr Cymru:/tmp:/bin/bash
(For those who have trouble with UTF-8 in email: the 'w' in Dwr has a hat '^' on it.)
I am told on the IRC channel that anything 'strange' is converted to base64. Some thoughts on this:
* I don't consider UTF-8 to be strange.
* If it really has to be this way then ldap* -D args (and others) should accept the base64 form.
* This behaviour should at least have an off switch, so the user can decide on the output form.
I was asked to post this issue to the list for discussion. My own position is that I have yet to hear any good reason why UTF-8 support should not be a goal.
Regards,
Thorben
TJ wrote:
Hi all,
I hope that this will lead to some discussion on better utf-8 support.
Well, UTF-8 is the only charset that's defined for LDAP, so I don't think that the support for UTF-8 could be better.
It seems that ldapsearch doesn't like DNs with UTF-8 characters in them. See:
[...]
That's defined behaviour according to RFC 2849 (http://www.faqs.org/rfcs/rfc2849.html); maybe you want to comment on that.
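As an illustration of the rule being referred to, the following Python sketch checks a value against the SAFE-STRING grammar in RFC 2849. It is a reading of the RFC text, not a copy of what ldapsearch does internally (the real tool may be stricter), and the function name `needs_base64` is invented for this example.

```python
def needs_base64(value: bytes) -> bool:
    """Return True if RFC 2849 does not allow value to be written as plain text."""
    if not value:
        return False  # an empty value is a valid SAFE-STRING
    # SAFE-INIT-CHAR: the first byte may not be NUL, LF, CR, SPACE, ':' or '<'.
    if value[0] in b"\x00\n\r :<":
        return True
    # SAFE-CHAR: every byte must be at most 127 and not NUL, LF or CR.
    if any(b > 127 or b in b"\x00\n\r" for b in value):
        return True
    # The RFC also says values ending in a space SHOULD be base64-encoded.
    return value.endswith(b" ")


dn = "cn=Dŵr Cymru,ou=userAccounts,dc=csirt,dc=janet"
print(needs_base64(dn.encode("utf-8")))
print(needs_base64(b"uid=dwrc,ou=userAccounts,dc=csirt,dc=janet"))
```

The 'ŵ' becomes two bytes above 127 in UTF-8, which is why this DN is written in the `dn::` (base64) form while a pure ASCII DN is not.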
- If it really has to be this way then ldap* -D args (and others) should
accept the base64 form.
OK, this might be nifty for someone who hasn't got a UTF-8 terminal but has to search for UTF-8 characters.
- This behaviour should at least have an off switch, so the user can decide
on the output form.
Since ldapsearch outputs "LDIF", it should stay true to the LDIF spec.
You can use Perl or any other programming language to build a tool that outputs something other than LDIF.
bye Christian
TJ wrote:
Hi all,
I hope that this will lead to some discussion on better utf-8 support.
It seems that ldapsearch doesn't like DNs with UTF-8 characters in them. See:
$ ldapsearch -xLLL "(uid=dwrc)" dn
dn:: Y249RMW1ciBDeW1ydSxvdT11c2VyQWNjb3VudHMsZGM9Y3NpcnQsZGM9amFuZXQ=
UTF-8 has been fully supported by OpenLDAP all along. It is the client tool ldapsearch that, for portability reasons, prints values containing characters that are usually non-printable encoded in base64, as per LDIF (RFC 2849). So ldapsearch is fully RFC 2849 compliant, since its purpose is to return valid LDIF data.
If you take your time and base64-decode that string you'll find your UTF-8 value. If you don't like that client, use another, or provide a patch that, via a user switch, detects in a portable manner whether the terminal it's printing to supports UTF-8.
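The decoding step mentioned above can be sketched in a few lines of Python, using only the standard library and the `dn::` value from the original post:

```python
import base64

# The value printed after "dn::" in the ldapsearch output quoted above.
encoded = "Y249RMW1ciBDeW1ydSxvdT11c2VyQWNjb3VudHMsZGM9Y3NpcnQsZGM9amFuZXQ="

# base64 yields the raw bytes; LDAP strings are UTF-8, so decode them as such.
dn = base64.b64decode(encoded).decode("utf-8")
print(dn)  # cn=Dŵr Cymru,ou=userAccounts,dc=csirt,dc=janet
```

Whether the 'ŵ' then shows up correctly depends on the terminal and locale, which is the portability concern raised in this reply.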
p.
Ing. Pierangelo Masarati
OpenLDAP Core Team

SysNet s.r.l.
via Dossi, 8 - 27100 Pavia - ITALIA
http://www.sys-net.it
Office: +39 02 23998309
Mobile: +39 333 4963172
Email: pierangelo.masarati@sys-net.it