Howard Chu writes:
Hallvard B Furuseth wrote:
Might as well hex-escape (...) (And '/', 8-bit and control chars, and a special hack for windows according to your latest message on the ITS.)
Honestly, I wouldn't make any special effort for control characters. All the filesystems we touch accept them.
Well - I think it'd be nice to be able to visit back-ldif filenames without trouble. Still, it's just back-ldif, so as long as it works, it works :-)
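To make the hex-escaping under discussion concrete, here is a minimal sketch in C. It is not the back-ldif code: the escape character, the function names and the `escape_8bit` switch are all hypothetical, chosen only for illustration. It escapes '/', control characters and the escape character itself as the escape character followed by two hex digits, and leaves the 8-bit question to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical escape character; the thread does not settle on one. */
#define ESC_CHAR '\\'

/* Decide whether byte c must be hex-escaped.
 * Assumes an ASCII-compatible execution character set. */
static int needs_escape(unsigned char c, int escape_8bit)
{
    if (c < 0x20 || c == 0x7f)          /* control characters */
        return 1;
    if (c == '/' || c == ESC_CHAR)      /* path separator, escape char */
        return 1;
    if (c >= 0x80)                      /* 8-bit: caller's policy */
        return escape_8bit;
    return 0;
}

/* Copy `in` to `out`, replacing unsafe bytes with ESC_CHAR + 2 hex digits.
 * Returns the escaped length, or (size_t)-1 if `out` is too small. */
static size_t escape_filename(const char *in, char *out, size_t outlen,
                              int escape_8bit)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t n = 0;

    assert(in != NULL && out != NULL);
    for (; *in; in++) {
        unsigned char c = (unsigned char)*in;
        if (needs_escape(c, escape_8bit)) {
            if (n + 3 >= outlen)        /* 3 bytes plus terminating NUL */
                return (size_t)-1;
            out[n++] = ESC_CHAR;
            out[n++] = hex[c >> 4];
            out[n++] = hex[c & 0x0f];
        } else {
            if (n + 1 >= outlen)        /* 1 byte plus terminating NUL */
                return (size_t)-1;
            out[n++] = (char)c;
        }
    }
    if (n >= outlen)
        return (size_t)-1;
    out[n] = '\0';
    return n;
}
```

With `escape_8bit` set to 0, a UTF-8 name passes through byte for byte, while "cn=a/b" comes out as "cn=a\2Fb". The Windows special case mentioned on the ITS is not covered here.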
8-bit I'm not so sure about, some filesystems expect UTF-8, while others are 8-bit clean to begin with. If we can expect that no one is going to use an octetString as a naming attribute, I think we'll be fine since everything else will be in UTF-8 anyway.
I've got three or so conflicting opinions on that myself. On a UTF-8 filesystem, that'll certainly be nice for people using non-Latin characters. Assuming they decide to use back-ldif in the first place, of course. I suppose it'll happen with private schema include files in cn=config, at least. And maybe module names.
So, unless someone else has strong opinions, I guess they'll stay untouched. And then there's EBCDIC, where uppercase A-Z chars are apparently 8-bit. Though an ('A' == 65) test could catch that.
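The ('A' == 65) test could look something like the sketch below. The function name is hypothetical. In EBCDIC 'A' is 0xC1, which has the high bit set, so a rule that treats bytes >= 0x80 as "8-bit" would misclassify plain letters there; the check lets the code notice that before relying on ASCII ranges.

```c
/* Returns nonzero if the execution character set places 'A' where
 * ASCII does. On an EBCDIC system 'A' is 0xC1, so this returns 0.
 * Hypothetical helper, not taken from the OpenLDAP source. */
static int charset_looks_ascii(void)
{
    return 'A' == 65;
}
```

The comparison is a constant expression, so a compiler can fold the call away on either kind of system.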
<rant> I didn't believe "everything will soon be UTF-8" a decade ago and I still don't.
8-bit filenames and data worked just fine for a while until various applications discovered "internationalization". That can ruin an otherwise 8-bit clean system - the OS handles it fine, but not the apps. The Windows troubles in the URL you posted seem just the same.
For example, Emacs in its default mode on my system seems unable to visit 8-bit filenames from its own Dired buffer. Presumably something in it disagrees with itself about the encoding of filenames. I'm sure there is something I could configure - until I run into the next equally clever program. So personally I just stick to 7-bit filenames. </rant>