representing file pathnames - openldap-technical

9 Aug 2012


Hi-
(Sorry for the length, it's a pretty wonky problem).
I'm working with the IETF NFSv4 working group on a schema for storing file system referral information in LDAP, as part of the FedFS standard (RFC 5716).  I'm looking for some opinions about certain details of the schema.
A "referral" is a file server response that conveys a table of file server hostname and export pathname pairs that are replicas of the local file set on that file server.  When a referral is encountered, a file system client chooses a row from this table and mounts that export automatically as it continues to traverse the file system name space.  Referrals are a standard part of both the NFSv4 and SMB protocols.
FedFS provides a standard way to store these rows in an LDAP database.  Each row is contained in a single LDAP record, called a File Set Location record.  A group of these records that live under the same parent record is retrieved by a file server to generate the table in a referral response.
Today an FSL record for an NFS referral contains among other things a UTF-8 string server hostname, an integer port number, and a binary blob containing an XDR-marshalled representation of the export pathname.  Note that both the pathname components and hostname are represented in UTF-8 in the NFS protocol, which is why they are stored as UTF-8 in LDAP.
XDR was chosen because the file server doesn't have to alter the pathname data it reads from the LDAP server; it can just turn it around and send it immediately on to NFS clients.  The pathname's components are UTF-8 strings.  The pathname is expressed as an ordered variable-length list of these strings.
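For illustration, here is a minimal Python sketch of that marshalling. It assumes the NFSv4 pathname4 layout: an XDR variable-length array of variable-length opaque UTF-8 components, where each length is a big-endian 32-bit count and each component is zero-padded to a 4-byte boundary. The function names are hypothetical. They are not part of FedFS or any library.

```python
import struct


def _pad(n: int) -> int:
    """Zero bytes XDR appends so an n-byte opaque ends on a 4-byte boundary."""
    return (4 - n % 4) % 4


def xdr_encode_pathname(components: list[str]) -> bytes:
    """Encode components as an XDR counted array of counted UTF-8 opaques."""
    out = bytearray(struct.pack(">I", len(components)))  # array element count
    for comp in components:
        data = comp.encode("utf-8")
        out += struct.pack(">I", len(data))  # byte length of this component
        out += data
        out += b"\x00" * _pad(len(data))     # pad to a 4-byte boundary
    return bytes(out)


def xdr_decode_pathname(blob: bytes) -> list[str]:
    """Inverse of xdr_encode_pathname: recover the ordered component list."""
    (count,) = struct.unpack_from(">I", blob, 0)
    offset = 4
    components = []
    for _ in range(count):
        (length,) = struct.unpack_from(">I", blob, offset)
        offset += 4
        components.append(blob[offset:offset + length].decode("utf-8"))
        offset += length + _pad(length)  # skip the data and its padding
    return components
```

The encoded bytes never contain a separator character. A component such as "a/b" is just five opaque bytes, so a file server can hand the blob to clients unchanged. The cost is that an ordinary LDAP tool displays the blob as base64.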
The pathname separator is not stored in the XDR blob, since physical file systems use different characters for this purpose (HFS+ on Mac OS uses ":", POSIX uses "/", and Windows uses "\").  NFS typically performs single-component lookups on the wire, so NFS clients are never concerned with how a file server might separate its pathname components.
The downside of using a binary XDR blob is that it's not observable or editable via typical LDAP tools.  Plus, ewww.
It's been suggested that we use a file URL to represent export pathnames.  A file URL is expressed in US-ASCII with escaping, and the pathname separator is stored in the string.  A file URL also has the ability to store a hostname.
"file://hostname/path/to/some/file"
I'm not sure this is the best fit for our purpose.  We're especially concerned about some of the complexities of converting escaped US-ASCII to UTF-8, and the use of a fixed pathname separator character.  Can we represent the full range of the UTF-8 code set with a US-ASCII file URL?
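One common convention for the escaping question is the RFC 3986 approach. Each component is encoded as UTF-8, and every byte outside the unreserved set is then percent-escaped. The sketch below assumes that convention and uses Python's standard urllib.parse. The helper names and the example hostname are hypothetical. It illustrates that arbitrary UTF-8 components can round-trip through a pure US-ASCII string, and that a "/" inside a component can be escaped as %2F. It does not address non-ASCII hostnames.

```python
from urllib.parse import quote, unquote


def component_to_url(component: str) -> str:
    # quote() encodes the text as UTF-8 and percent-escapes each byte outside
    # the unreserved set; safe="" means "/" becomes %2F, not a separator.
    return quote(component, safe="")


def file_url(hostname: str, components: list[str]) -> str:
    """Build a file URL; hostname is assumed to be plain ASCII already."""
    return "file://" + hostname + "/" + "/".join(component_to_url(c) for c in components)


def url_path_to_components(path: str) -> list[str]:
    # Split on the literal separator first, then unescape each piece, so an
    # encoded %2F is restored as part of a component name.
    return [unquote(c) for c in path.lstrip("/").split("/")]
```

For example, file_url("server.example.net", ["export", "café", "a/b"]) produces "file://server.example.net/export/caf%C3%A9/a%2Fb". The separator is still fixed at "/", and the stored string is no longer directly human-readable for non-ASCII names.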
We could also use an NFS URL, which would allow us to express the server hostname, a port number, and the pathname in a single string.  But both the hostname and pathname are encoded in US-ASCII, not UTF-8, and the NFS URL format employs a fixed pathname separator character.
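As a point of comparison, the next sketch shows how the three fields could be pulled out of a single NFS URL of the RFC 2224 form nfs://host[:port]/path using Python's urllib.parse. The function name is hypothetical. Falling back to the well-known NFS port 2049 when the URL omits a port is an assumption of this sketch.

```python
from urllib.parse import unquote, urlsplit


def parse_nfs_url(url: str) -> tuple[str, int, list[str]]:
    """Split an nfs:// URL into (hostname, port, pathname components)."""
    parts = urlsplit(url)
    if parts.scheme != "nfs" or not parts.hostname:
        raise ValueError("not an nfs URL with a hostname: " + url)
    # Assumption: use the well-known NFS port 2049 when none is given.
    port = parts.port if parts.port is not None else 2049
    # The separator is fixed at "/"; empty pieces (from "//") are dropped.
    components = [unquote(c) for c in parts.path.split("/") if c]
    return parts.hostname, port, components
```

Note that urlsplit lowercases the hostname it returns. Any component containing a literal "/" must arrive percent-escaped, because "/" is the only separator this format understands.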
An alternative we have considered would store the pathname in a single-valued UTF-8 string attribute, including pathname separators, but also store the pathname separator character in a separate attribute.  A simple escaping mechanism would be used to represent a separator character embedded in a component.
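The escaping mechanism is not specified here. The following is therefore only one possible, hypothetical scheme. Within a component, the escape character is doubled, and a literal separator is prefixed with the escape character. The separator would be read from its own attribute. The escape character defaults to backslash by assumption. It is a parameter because Windows uses backslash as its separator, so the two characters must be distinct.

```python
def join_path(components: list[str], sep: str, esc: str = "\\") -> str:
    """Join components with sep, escaping embedded sep/esc characters."""
    if len(sep) != 1 or len(esc) != 1 or sep == esc:
        raise ValueError("sep and esc must be distinct single characters")

    def escape(c: str) -> str:
        # Double the escape character first, then escape the separator,
        # so the two replacements cannot interfere with each other.
        return c.replace(esc, esc + esc).replace(sep, esc + sep)

    return sep + sep.join(escape(c) for c in components)


def split_path(path: str, sep: str, esc: str = "\\") -> list[str]:
    """Inverse of join_path; empty components are not preserved."""
    components, current = [], []
    chars = iter(path)
    for ch in chars:
        if ch == esc:
            current.append(next(chars, ""))  # next character is literal
        elif ch == sep:
            if current:
                components.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        components.append("".join(current))
    return components
```

With sep=":" (the HFS+ case), join_path(["Volumes", "a:b"], ":") yields ":Volumes:a\:b", and split_path reverses it. The stored value stays readable and editable with ldapmodify. The trade-off is that every consumer must implement the same escaping rules.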
We'd like to have a schema that represents referral data in a way that is considered natural for LDAP, can store the full richness that an NFS referral is capable of, and is easy to access and update with typical LDAP client tools like ldapmodify.
Are there other ideas we haven't considered?  What is a practical way to store an ordered variable-length list of strings in an LDAP attribute?  Is there a similar CIFS URL format that might be used to store SMB share information?
Thanks very much for your consideration.
-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com