h.b.furuseth@usit.uio.no wrote:
Full_Name: Hallvard B Furuseth
Version: HEAD, RE23, RE24
OS: Linux
URL: http://folk.uio.no/hbf/OpenLDAP/back-ldif.c
Submission from: (NULL) (129.240.6.233)
Submitted by: hallvard
Here are a bunch of back-ldif bugs and some questions. I have code for most of it, but need advice/discussion on functionality changes.
I've been sitting on it too long and would keep sitting if I waited until I'd cleaned up everything, so I'm posting now instead. I enclose a URL for a *draft* back-ldif/ldif.c.
Functionality changes.
Some LDIF files/directories with special chars need to be renamed:
RE24 or RE25 change? If anyone uses back-ldif as a regular database, they may need to slapcat/slapadd to upgrade past these changes.
back-ldif escapes the directory separator as \<hex value>. But Windows uses "\" as directory separator, so that doesn't help. It needs another escape character.
To avoid double hex-escaping of DNs that contain "\", we can pick another escape char, e.g. "^", escape that too, and translate "\" to it. Thus DN "cn=x^y\2Cz" gets filename "cn=x^5Ey^2Cz.ldif".
That sounds fine. Is there any reason not to use %, which is already used for URL encoding?
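A minimal sketch of the translation described above, with the escape character as a parameter so that both "^" and "%" can be tried. The function name rdn_to_filename() and its signature are hypothetical and not taken from back-ldif.c; it only handles the "\" translation and the escape char itself, nothing else.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not the actual back-ldif code.
 * Maps an RDN string to an entry file name:
 *   - the LDAP escape char '\' is translated to `esc`
 *   - a literal `esc` is written as `esc` plus two hex digits
 *   - ".ldif" is appended
 * Returns 0 on success, -1 if `out` is too small. */
int rdn_to_filename(const char *rdn, char esc, char *out, size_t outlen)
{
    static const char suffix[] = ".ldif";
    size_t n = 0;

    assert(rdn != NULL && out != NULL);
    for (; *rdn != '\0'; rdn++) {
        unsigned char c = (unsigned char)*rdn;

        /* Worst case per input byte: 3 output bytes, plus suffix and NUL. */
        if (n + 3 + sizeof(suffix) > outlen)
            return -1;
        if (c == '\\')
            out[n++] = esc;
        else if (c == (unsigned char)esc)
            n += (size_t)sprintf(out + n, "%c%02X", esc, (unsigned)c);
        else
            out[n++] = (char)c;
    }
    if (n + sizeof(suffix) > outlen)
        return -1;
    memcpy(out + n, suffix, sizeof(suffix)); /* copies the trailing NUL too */
    return 0;
}
```

With esc = '^' the RDN "cn=x^y\2Cz" maps to "cn=x^5Ey^2Cz.ldif", as in the example above. With esc = '%' the same RDN maps to "cn=x^y%2Cz.ldif".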
More characters should likely be hex-escaped:
- ":" (as in "C:") and "/" on Windows? I've seen programs use the latter as directory separator even on Windows. Others?
- 8-bit chars in case the OS gets clever about charset handling.
- Control chars.
I don't use Windows myself though. Nor Mac...
Again, the rules for URL encoding probably already cover any cases we should worry about.
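One possible way to express the list above as a single predicate, as a sketch only. The exact character set is an assumption drawn from the candidates mentioned here (control chars, 8-bit chars, ":", "/", "\", and the escape char itself); it is neither the URL-encoding rule set nor what back-ldif currently does. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical policy: should byte `c` be hex-escaped in a file name? */
int needs_hex_escape(unsigned char c, char esc)
{
    if (c < 0x20 || c == 0x7F)      /* control chars */
        return 1;
    if (c >= 0x80)                  /* 8-bit chars */
        return 1;
    return c == '/' || c == '\\' || c == ':' || c == (unsigned char)esc;
}

/* Length of `s` after escaping, not counting the trailing NUL.
 * Each escaped byte takes 3 bytes: the escape char plus two hex digits. */
size_t escaped_length(const char *s, char esc)
{
    size_t n = 0;

    assert(s != NULL);
    for (; *s != '\0'; s++)
        n += needs_hex_escape((unsigned char)*s, esc) ? 3 : 1;
    return n;
}
```

Keeping the policy in one function means the same table can be used on every OS, which is what makes a directory tree portable between hosts.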
When back-ldif uses OS-specific escaping, we can't move a directory tree between Windows and Unix hosts if some RDN in the tree contains characters with OS-specific escaping.
Should we special-case both Unix' and Windows' directory separators on both OSes? Then instead there will be more directory trees which can't be moved between OpenLDAP versions, before and after the change. Assuming anyone uses back-ldif in the first place:-)
Might as well break it all at once.
The database suffix must be hex-escaped like the rest of the filenames.
RDNs ending with ".ldif" should be escaped. Currently "cn=foo.ldif" is the name of both RDN "cn=foo"'s entry file and RDN "cn=foo.ldif"'s non-leaf directory.
Perhaps we should escape '.'
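To illustrate the collision, a small sketch under the naming scheme described above: the entry for an RDN lives in "<rdn>.ldif" and its children live in a directory named "<rdn>". The helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: file name holding the entry for `rdn`.
 * Returns 0 on success, -1 if `out` is too small. */
int entry_file_name(const char *rdn, char *out, size_t outlen)
{
    int n;

    assert(rdn != NULL && out != NULL);
    n = snprintf(out, outlen, "%s.ldif", rdn);
    return (n >= 0 && (size_t)n < outlen) ? 0 : -1;
}

/* Returns 1 if the RDN itself ends with ".ldif", so that its directory
 * name could clash with another RDN's entry file name. */
int rdn_ends_with_ldif(const char *rdn)
{
    static const char suffix[] = ".ldif";
    size_t len, slen = sizeof(suffix) - 1;

    assert(rdn != NULL);
    len = strlen(rdn);
    return len >= slen && strcmp(rdn + len - slen, suffix) == 0;
}
```

entry_file_name("cn=foo", ...) yields "cn=foo.ldif", which is exactly the directory name for RDN "cn=foo.ldif". A check like rdn_ends_with_ldif() could trigger escaping of the '.', or '.' could simply always be escaped.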
Sorted-value RDNs:
In dn2path() when IX_FSL != IX_DNL (filename '{' != RDN '{'): IX_FS[LR] must be hex-escaped. They are treated as equivalent to '{' - '}', thus different RDNs can map to the same filename.
The "{1}" in path "/foo/cn=x{,cn={1}y" is not recognized. The IX_DN[LR] (i.e. '{', '}') to IX_FS[LR] translation is a bit strange: It translates the first '{' it sees, then the next '}', then the next '{', etc. Any reason not to just translate all '{' and '}' chars? If so, we should at least reset at each filename and '+'.
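A sketch of the "translate all of them" variant, combined with hex-escaping of the filesystem-side characters so that distinct RDNs keep distinct file names. FS_LEFT and FS_RIGHT stand in for IX_FSL and IX_FSR; the values '[' and ']' are placeholders chosen for illustration, not the values used in back-ldif.c, and translate_braces() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define FS_LEFT  '['   /* placeholder for IX_FSL, value assumed */
#define FS_RIGHT ']'   /* placeholder for IX_FSR, value assumed */

/* Translates every '{' and '}' to the filesystem-side characters.
 * Literal FS_LEFT, FS_RIGHT and `esc` in the RDN are hex-escaped so they
 * cannot be confused with translated braces or with an escape sequence.
 * Returns 0 on success, -1 if `out` is too small. */
int translate_braces(const char *rdn, char esc, char *out, size_t outlen)
{
    size_t n = 0;

    assert(rdn != NULL && out != NULL);
    for (; *rdn != '\0'; rdn++) {
        unsigned char c = (unsigned char)*rdn;

        if (n + 4 > outlen)          /* up to 3 output bytes plus NUL */
            return -1;
        if (c == '{')
            out[n++] = FS_LEFT;
        else if (c == '}')
            out[n++] = FS_RIGHT;
        else if (c == FS_LEFT || c == FS_RIGHT || c == (unsigned char)esc)
            n += (size_t)sprintf(out + n, "%c%02X", esc, (unsigned)c);
        else
            out[n++] = (char)c;
    }
    if (n + 1 > outlen)
        return -1;
    out[n] = '\0';
    return 0;
}
```

Because no state is carried from one brace to the next, an unmatched '{' earlier in the name, as in "cn=x{,cn={1}y", does not affect how the later "{1}" is translated.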
Sorting (in ldif_r_enum_tree()):
Should "a={1}x" sort before or after "a=x"? Currently it sorts like "a={". "a=" (normally before) or "a=<CHAR_MAX>" (normally after) would be better.
RDN "attr=foo{bar}baz" is treated as a sorted value. We can easily check for '={' <successful strtol parse> '}' instead. "attr={0<octal>}val" and "attr={0x<hex>}val" are recognized as sorted values; should they be?
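The stricter check could look roughly like this. It is a sketch under one possible answer to the question above: only a plain decimal index is accepted, so "{0x..}" is rejected and a leading 0 is read as decimal rather than octal. parse_ordered_rdn() is a hypothetical name, and the sketch only looks at the first '=' in the RDN, so it ignores multi-valued RDNs with '+'.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if `rdn` has the form attr={<decimal digits>}value and stores
 * the number in *index (if index is non-NULL); returns 0 otherwise. */
int parse_ordered_rdn(const char *rdn, long *index)
{
    const char *eq, *p;
    char *end;
    long val;

    assert(rdn != NULL);
    eq = strchr(rdn, '=');
    if (eq == NULL || eq[1] != '{')
        return 0;
    p = eq + 2;
    /* strtol() would skip whitespace and accept a sign; require a digit. */
    if (!isdigit((unsigned char)*p))
        return 0;
    errno = 0;
    val = strtol(p, &end, 10);      /* base 10: no octal or hex prefixes */
    if (errno == ERANGE || *end != '}')
        return 0;
    if (index != NULL)
        *index = val;
    return 1;
}
```

With this check "attr=foo{bar}baz" and "attr={0x1F}val" are not treated as sorted values, while "cn={1}y" is, with index 1.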
Some of this is from ITS#4627 (back-ldif issues).