Hello,
This is probably more an LDIF question than an OpenLDAP one, but maybe somebody knows the answer: is there a way to put multibyte characters into an attribute value and let the server know that they are not to be taken literally, but are UTF-8 character encodings?
I've tried to dig into RFC 3629 and RFC 4517, but they were beyond my capabilities.
It does of course work for the DN, and it also works if I provide the attribute values base64-encoded, but is there a way to put them directly into an LDIF file and let the server know that these are character encodings? RFC 2849 only talks about not line-breaking inside multibyte characters.
In this silly but simple example, both cn: and description: are stored literally, while the DN works as intended:
dn: cn=A \F0\9F\99\82 Test,dc=example,dc=com
cn: A \F0\9F\99\82 Test
objectClass: person
sn: Test
description: %xF0%x9F%x99%x82 Test
This is about understanding, not about actually wanting to put a smiley into a DN. I am aware that this is a potential recipe for disaster.
I am also aware that OpenLDAP kindly adds a proper cn value anyway, but that does not help here, and it would still leave the description open.
Also, as mentioned before, echo -en "A \xF0\x9F\x99\x82 Test" | base64 is a viable workaround, but a cumbersome one.
So maybe there is an easier way?
Thanks
Ede
Ede Wolf wrote:
> Is there a way to put multibyte characters into an attribute value and let the server know, these are not to be treated literally, but are utf8 character encodings?
Strings in LDAPv3 are all UTF-8, by definition. This is in RFC4511 section 4.1.2.
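Because RFC 4511 defines LDAPString as UTF-8 encoded octets, a client can at least check that a raw value is well-formed UTF-8 before sending it. A minimal sketch, assuming values are handled as raw bytes; is_ldap_string is a hypothetical helper, not part of any LDAP library:

```python
def is_ldap_string(raw: bytes) -> bool:
    """Return True if raw is well-formed UTF-8 (hypothetical helper).

    Python's strict UTF-8 codec rejects overlong forms, encoded
    surrogates and truncated sequences, none of which are valid UTF-8.
    """
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(is_ldap_string(b"A \xf0\x9f\x99\x82 Test"))  # the smiley from this thread
    print(is_ldap_string(b"A \xf0\x9f Test"))          # truncated 4-byte sequence
```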
> It does of course work for the dn, it also works, if I provide base64 code to the attributes, but is there a way, to directly put them into a ldif an let the server know, these are character encodings?
base64 encoding for LDIF values is mostly optional. As long as the string you're entering doesn't have embedded NUL or CR/LF characters, you don't need to use base64.
On 20.02.23 at 18:06, Howard Chu wrote:
> Strings in LDAPv3 are all UTF-8, by definition. This is in RFC4511 section 4.1.2.
> ...
> base64 encoding for LDIF values is mostly optional. As long as the string you're entering doesn't have embedded NUL or CR/LF characters, you don't need to use base64.
Thanks very much. But how can I provide a non-ASCII character that is not on my keyboard, and for which I only have the code point or the hex values, the way I can within the DN?
Ede Wolf wrote:
> Thanks very much. Please, but how can I provide a non ascii character, that is not on my keyboard, for which I only have the code point or the hex values. Like I can do within the dn
The same way you would enter Unicode in any other application. This is not an LDAP- or LDIF-specific question.
1) use a terminal and locale that support UTF-8.
2) use whatever tools your OS provides for entering Unicode characters. Probably something named "Unicode character map" or similar.
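If all you have is a code point such as U+1F642, a few lines of Python can produce the character itself (which can then be written into a UTF-8 LDIF file) and the UTF-8 octets behind it, including the backslash-hex form that works inside a DN string. A sketch only; forms_for_code_point is a made-up name for illustration:

```python
def forms_for_code_point(cp: int) -> dict:
    """Show one code point in the forms discussed in this thread (hypothetical helper)."""
    ch = chr(cp)               # the character itself, e.g. to write into a UTF-8 LDIF file
    utf8 = ch.encode("utf-8")  # the octets that actually travel over LDAP
    return {
        "char": ch,
        "utf8_hex": utf8.hex(" ").upper(),                  # e.g. 'F0 9F 99 82' for U+1F642
        "dn_escape": "".join(f"\\{b:02X}" for b in utf8),   # RFC 4514 hexpair form for DN strings
    }


if __name__ == "__main__":
    for key, val in forms_for_code_point(0x1F642).items():
        print(key, val)
```

The dn_escape form is only meaningful where a DN is expected; as the rest of the thread explains, it is not interpreted in ordinary attribute values.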
> The same way you would enter Unicode in any other application. This is not an LDAP- or LDIF-specific question.
> - use a terminal and locale that support UTF-8.
> - use whatever tools your OS provides for entering Unicode characters. Probably something named "Unicode character map" or similar.
Thanks again. But my question concerns attribute values.
In an LDIF file, for the DN I can enter:
dn: cn=A \F0\9F\99\82 Test,dc=example,dc=com
That literally gives me the UTF-8 smiley as part of my DN, provided my font supports it, of course. So I can use the hex-encoded representation to enter any UTF-8 character.
I can even search for that icon, using that hex encoding in the search base or as part of the search filter.
However, for an attribute value I cannot do this, and my question is: is there a way at all? This has nothing to do with my console.
For a directoryString attribute value, is there any way of entering code points straight into my LDIF, be it as U+0000 or in hex notation, and having the server interpret them, as it does for the DN?
Not copy and paste on the command line, but, again, as encodings that the LDAP server knows it has to interpret, as it does for the DN.
Something like:
cn: A \F0\9F\99\82 Test
Just with a syntax that works, if that is possible at all.
Thanks
Ede
On Tue, Feb 21, 2023 at 04:10:53PM +0100, Ede Wolf wrote:
> I can even search for that icon, using that hex encoding as search base or part of the search filter.
> However, for a value, I cannot do this, and my question is, is there a way at all? This has nothing to do with my console.
> ...
> Something like: cn: A \F0\9F\99\82 Test
> Just with a syntax that works. If that it possible at all.
Hi Ede, in a search filter you are dealing with RFC 4515, which describes how to escape values in that string; similarly, URIs have their own escaping rules you can leverage for this. With LDIF, you are dealing with RFC 2849, which has no concept of escaping values: you can enter a value as-is, base64-encoded, or pass it from a separate file/URI, but that's it.
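For the search-filter case, RFC 4515 requires NUL, "(", ")", "*" and "\" inside an assertion value to be written as a backslash followed by two hex digits, and allows any other octet to be escaped the same way, which is why the \F0\9F\99\82 form works in filters. A minimal sketch of that rule; escape_filter_value is a hypothetical name, and escaping all non-ASCII octets is an optional choice made here, not a requirement:

```python
def escape_filter_value(value: str, escape_non_ascii: bool = False) -> str:
    """Escape an assertion value for an LDAP search filter as RFC 4515 describes (sketch)."""
    must_escape = set("\x00()*\\")   # NUL, '(', ')', '*', '\' must always be escaped
    out = []
    for ch in value:
        if ch in must_escape or (escape_non_ascii and ord(ch) > 0x7F):
            # write every UTF-8 octet of this character as \XX
            out.append("".join(f"\\{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print("(cn=" + escape_filter_value("A \U0001F642 Test", escape_non_ascii=True) + ")")
    print("(description=" + escape_filter_value("50% (approx.)*") + ")")
```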
The client doesn't do any processing of the LDIF data (unlike URLs or filters, as discussed above) and will pass it on as it sees it, so it will send the text 'cn=A \F0\9F\99\82 Test,dc=example,dc=com' as the DN. It is up to the server (and the attribute's syntax) to decide what happens with whatever you've input. In the case of DNs, sending 'cn=A \F0\9F\99\82 Test,dc=example,dc=com' is then considered (server-side) equivalent to specifying 'cn=a 🙂 test , dc=EXAMPLE, dc=cOm', as per RFC 4514.
Regards,
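Since LDIF itself has no escape syntax, one practical route for the original question is to do the unescaping on the client side before writing the LDIF. The Python sketch below takes a value written with DN-style \XX hex escapes, turns it into the real UTF-8 octets, and emits an "attr:: base64" line as RFC 2849 defines it. The \XX input convention and both function names are assumptions for illustration; neither LDIF nor the OpenLDAP tools interpret such escapes in attribute values.

```python
import base64
import re

# Assumed input convention (borrowed from DN strings): a backslash followed by two hex digits
HEX_ESCAPE = re.compile(r"\\([0-9A-Fa-f]{2})")


def unescape_hex(text: str) -> bytes:
    """Turn text such as 'A \\F0\\9F\\99\\82 Test' into the raw octets it describes."""
    out = bytearray()
    pos = 0
    for m in HEX_ESCAPE.finditer(text):
        out += text[pos:m.start()].encode("utf-8")   # literal text between escapes
        out.append(int(m.group(1), 16))              # the escaped octet itself
        pos = m.end()
    out += text[pos:].encode("utf-8")
    value = bytes(out)
    value.decode("utf-8")   # raises UnicodeDecodeError if the octets are not valid UTF-8
    return value


def ldif_base64_line(attr: str, value: bytes) -> str:
    """Emit an RFC 2849 'attr:: <base64>' line (no line folding)."""
    return f"{attr}:: {base64.b64encode(value).decode('ascii')}"


if __name__ == "__main__":
    raw = unescape_hex(r"A \F0\9F\99\82 Test")
    print(ldif_base64_line("cn", raw))
    print(ldif_base64_line("description", raw))
```

The printed lines can replace the cn: and description: lines of the earlier example; ldapmodify decodes the base64 before sending, so the server receives the actual UTF-8 value.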
Hello Ondřej,
Thanks very much! That is exactly the answer I have been looking for, even more so in fact, as I have additionally learned that the DN is a different part from the rest of the LDIF. It makes sense, but I had never really thought about it before.
Even though, of course, it is not what I had been hoping to hear.
Thanks again to you and to everyone who took the time to help!
Ede
On Tue, Feb 21, 2023 at 05:32:01PM +0100, Ede Wolf wrote:
> Thanks very much! That is exactly the anwer I have been looking for - even more so in fact, as additionally I have been educated, that the dn is a different part from the rest of the ldif. Makes sense, but I've never really thought of this before.
Correction: the dn is the same as anything else in the LDIF, it is the server that then does extra processing on the text it receives because its syntax is a DN, just like a 'member' attribute, 'seeAlso' and others like them will allow you to do the same - they are of the distinguished name syntax. Or like with the 'cn' attribute you can write 'TEST' or 'tEst' and they will be considered equivalent, because its syntax says so, not because the client did anything.
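To illustrate that such equivalence is decided by the attribute's matching rule rather than by the client, here is a deliberately simplified Python approximation of case-insensitive matching for a directoryString value such as cn. It uses NFKC normalization, case folding and collapsing of insignificant whitespace; real servers implement the full string preparation of RFC 4518, which this sketch does not, and both function names are hypothetical.

```python
import unicodedata


def approx_case_ignore_key(value: str) -> str:
    """Very rough stand-in for a caseIgnoreMatch comparison key (NOT full RFC 4518)."""
    # NFKC + casefold approximate RFC 4518's normalization and case-folding steps
    folded = unicodedata.normalize("NFKC", value).casefold()
    # insignificant whitespace: strip leading/trailing, squash internal runs to one space
    return " ".join(folded.split())


def approx_case_ignore_match(a: str, b: str) -> bool:
    """Compare two directoryString values the way a case-insensitive rule roughly would."""
    return approx_case_ignore_key(a) == approx_case_ignore_key(b)


if __name__ == "__main__":
    print(approx_case_ignore_match("TEST", "tEst"))                              # True
    print(approx_case_ignore_match("A \U0001F642  Test ", "a \U0001F642 test"))  # True
    print(approx_case_ignore_match("TEST", "T3ST"))                              # False
```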
Regards,
Ondřej Kuzník wrote:
> Correction: the dn is the same as anything else in the LDIF, it is the server that then does extra processing on the text it receives because its syntax is a DN, ...
For example:
viola:~/OD/hobj/tests> cat emoji.ldif
dn: cn=😀 face,ou=people,dc=example,dc=com
objectclass: person
cn: 😀 face
sn: face

viola:~/OD/hobj/tests> ../clients/tools/ldapmodify -x -a -H ldap://:9011 -D cn=manager,dc=example,dc=com -w secret -f emoji.ldif
adding new entry "cn=😀 face,ou=people,dc=example,dc=com"

viola:~/OD/hobj/tests> ../clients/tools/ldapse
ldapsearch*  ldapsearch.sleep*
viola:~/OD/hobj/tests> ../clients/tools/ldapsearch -x -H ldap://:9011 -b ou=people,dc=example,dc=com '(cn=😀 face)'
# extended LDIF
#
# LDAPv3
# base <ou=people,dc=example,dc=com> with scope subtree
# filter: (cn=😀 face)
# requesting: ALL
#

# \F0\9F\98\80 face, People, example.com
dn:: Y2498J+YgCBmYWNlLG91PVBlb3BsZSxkYz1leGFtcGxlLGRjPWNvbQ==
objectClass: person
cn:: 8J+YgCBmYWNl
sn: face

# search result
search: 2
result: 0 Success

# numResponses: 2
# numEntries: 1
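Decoding the "::" values from that output shows they are exactly the UTF-8 text that was added. Printing them base64-encoded is consistent with RFC 2849, since the values contain octets above 0x7F. A quick check using only the Python standard library:

```python
import base64

# Values copied from the ldapsearch output above
encoded = {
    "dn": "Y2498J+YgCBmYWNlLG91PVBlb3BsZSxkYz1leGFtcGxlLGRjPWNvbQ==",
    "cn": "8J+YgCBmYWNl",
}

for attr, b64 in encoded.items():
    raw = base64.b64decode(b64)
    # show the octets (compare with the \F0\9F\98\80 comment line) and the decoded text
    print(attr, raw.hex(" ").upper(), "->", raw.decode("utf-8"))
```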
Ede Wolf listac@nebelschwaden.de wrote on 21.02.2023 at 16:10 in message
5fed02ec-1e12-5264-305f-a3f69a335480@nebelschwaden.de:
> Having a ldif file, for the dn I can enter: dn: cn=A \F0\9F\99\82 Test,dc=example,dc=com
It seems the backslash notation is not actually defined for LDIF.
RFC 2849 (LDAP Data Interchange Format) says:
SAFE-STRING = [SAFE-INIT-CHAR *SAFE-CHAR]
SAFE-CHAR = %x01-09 / %x0B-0C / %x0E-7F
SAFE-INIT-CHAR = %x01-09 / %x0B-0C / %x0E-1F / %x21-39 / %x3B / %x3D-7F
dn-spec = "dn:" (FILL distinguishedName / ":" FILL base64-distinguishedName)
distinguishedName = SAFE-STRING
rdn = SAFE-STRING
base64-distinguishedName = BASE64-UTF8-STRING
base64-rdn = BASE64-UTF8-STRING
UTF8-STRING = *UTF8-CHAR
BASE64-CHAR = %x2B / %x2F / %x30-39 / %x3D / %x41-5A / %x61-7A
BASE64-UTF8-STRING = BASE64-STRING
BASE64-STRING = [*(BASE64-CHAR)]
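Read strictly, this grammar means a plain "attr: value" line may only carry octets 0x01-0x7F other than LF and CR, and may not start with a space, ":" or "<"; anything else, including any UTF-8 multibyte character, belongs in the "attr:: base64" form. (OpenLDAP's ldapmodify is more lenient and accepts raw UTF-8, as the emoji.ldif example in this thread shows.) A Python sketch of the strict check; the function names are made up for illustration:

```python
import base64

# RFC 2849: SAFE-CHAR = %x01-09 / %x0B-0C / %x0E-7F (no NUL, LF, CR, nothing above 0x7F)
SAFE_CHAR = set(range(0x01, 0x0A)) | {0x0B, 0x0C} | set(range(0x0E, 0x80))
# SAFE-INIT-CHAR additionally excludes SPACE, ':' and '<'
SAFE_INIT_CHAR = SAFE_CHAR - {0x20, 0x3A, 0x3C}


def is_safe_string(value: bytes) -> bool:
    """True if value matches RFC 2849 SAFE-STRING (the empty string does)."""
    if not value:
        return True
    return value[0] in SAFE_INIT_CHAR and all(b in SAFE_CHAR for b in value[1:])


def ldif_attr_line(attr: str, value: str) -> str:
    """Emit 'attr: value' when RFC 2849 allows it, else 'attr:: base64' (unfolded)."""
    raw = value.encode("utf-8")
    # RFC 2849 also says values ending with SPACE SHOULD be base64-encoded
    if is_safe_string(raw) and not raw.endswith(b" "):
        return f"{attr}: {value}"
    return f"{attr}:: {base64.b64encode(raw).decode('ascii')}"


if __name__ == "__main__":
    print(ldif_attr_line("sn", "Test"))
    print(ldif_attr_line("cn", "A \U0001F642 Test"))
    print(ldif_attr_line("description", ":starts with a colon"))
```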
> It seems the backslash notation is not actually defined for LDIF.
That is indeed a valuable hint. Out of curiosity, I will test whether other LDAP server implementations also accept at least the \FF notation for the DN, but that is off-topic here.
> RFC 2849 (LDAP Data Interchange Format) says:
> SAFE-STRING = [SAFE-INIT-CHAR *SAFE-CHAR]
> SAFE-CHAR = %x01-09 / %x0B-0C / %x0E-7F
> SAFE-INIT-CHAR = %x01-09 / %x0B-0C / %x0E-1F / %x21-39 / %x3B / %x3D-7F
I have come across this notation in a couple of RFCs, including one about Unicode, but the %xFF notation (with or without the x) never worked for me, not even within the DN.
dn: cn=A %xF0%x9F%x99%x82 Test,dc=example,dc=com
Does not work as intended, unless I've made another mistake.
But as I said before, I am a bit overwhelmed by these RFCs, or rather by translating them into practical action.
So far I have only had luck with the \FF notation, and only for the DN, which is correct, as I now know.
But, as mentioned above, I will also test against SDS and 389DS to find out whether they accept the backslash notation as well.
Just to see if this is a kind of de facto standard, even if it is not written into an RFC.
Ede Wolf listac@nebelschwaden.de wrote on 22.02.2023 at 11:03 in message
6ad6c4b7-3f7d-3e2b-b1bd-936bd6060af3@nebelschwaden.de:
> I have come across this way of writing in a couple of rfc, also one about unicode, but the %xFF notation (with or without the x) never worked for me. Not even within the dn.
You are mixing notations: in the grammar, %x09 stands for a literal TAB character in the actual data; "%x09" is merely how the language describing the syntax writes it. That notation is ABNF: Crocker, D., and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", RFC 2234, November 1997.
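To make the distinction concrete: in ABNF (RFC 2234, later revised as RFC 5234) a terminal such as %x0E-7F denotes a range of octet values; it is never text to be typed into the data. The sketch below turns such alternatives into Python ranges and shows that typing "%xF0" into an LDIF file produces four ASCII characters rather than the octet 0xF0. parse_abnf_hex is a hypothetical helper that handles only the %xNN and %xNN-MM forms used in RFC 2849.

```python
from __future__ import annotations


def parse_abnf_hex(alternatives: str) -> list[range]:
    """Parse ABNF num-vals like '%x01-09 / %x0B-0C / %x0E-7F' into octet ranges.

    Hypothetical helper: handles only the %xNN and %xNN-MM forms used in RFC 2849.
    """
    ranges = []
    for term in alternatives.split("/"):
        term = term.strip()
        if not term.startswith("%x"):
            raise ValueError(f"unsupported ABNF term: {term!r}")
        lo, _, hi = term[2:].partition("-")
        ranges.append(range(int(lo, 16), int(hi or lo, 16) + 1))
    return ranges


def matches(octet: int, ranges: list[range]) -> bool:
    """True if the octet value falls into any of the parsed ranges."""
    return any(octet in r for r in ranges)


if __name__ == "__main__":
    safe_char = parse_abnf_hex("%x01-09 / %x0B-0C / %x0E-7F")   # RFC 2849 SAFE-CHAR
    print(matches(0x09, safe_char))    # TAB: allowed
    print(matches(0x0A, safe_char))    # LF: not allowed
    print(matches(0xF0, safe_char))    # first octet of the UTF-8 smiley: not allowed
    # Typing "%xF0" into an LDIF file yields these four ASCII octets, not 0xF0:
    print(list("%xF0".encode("ascii")))
```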
Regards, Ulrich
openldap-technical@openldap.org