Altering the behavior of pcache to improve cache hit/miss ratio - openldap-devel

9 Mar 2010


Hey folks,
I've been playing with pcache for about a week (in combination with back-ldap), testing various things and trying to
create a configuration that helps preserve system stability if access to the LDAP server disappears.  Many things seem
to work well, but I noticed that a good number of Unix commands, like "id", request a user's full entry (i.e., they
send no list of requested attributes).  In the current implementation, such a request will always miss the cache,
because (as the man page states) you cannot specify an empty list of attributes.  This means that even if the local
cache holds all the information needed to answer the request, you will never be able to leverage that information.
Given that, I wanted to ask whether the design of the overlay could be changed a bit, such that if an empty
attribute list is specified, pcache returns every attribute *in the cache* for that entry, instead of blindly
assuming it doesn't have the information to accomplish the task.  I understand the shortcomings this could introduce,
most notably for someone who does not want to cache every attribute of a particular entry but still wants to omit the
attribute list and get the entry's full set of attributes.  Still, I think the benefits would make it worthwhile, for
a few reasons.  If you're using pcache and are interested in truly optimizing performance, or in keeping an
LDAP-dependent system from having major issues if the network disappears, it makes sense to cache every attribute for
entries used in common system tasks (e.g., by utilities like "id").  I believe this could also significantly improve
the hit/miss ratio, especially when you look at how many of these common system utilities make queries without
specifying an attribute list.
I would think that this could be accomplished by adding a case for the empty attribute list, such that pcache would
take the union of the attribute sets of every cached copy of the entry matching that particular LDAP filter, and
return the result.  If the same attribute is present in more than one cached copy matching the specified LDAP filter,
the most recent would be favored and returned, thereby preventing pcache from having to choose between several
query-dependent cached copies of that entry, if more than one is present.
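
To make the merge rule a bit more concrete, here is a rough Python sketch of it.  This is only an illustration of the
idea, not pcache code: CachedResult, cached_at and merge_cached_attrs are names made up for this example, and I am
assuming each cached copy of an entry carries the time at which it was cached.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class CachedResult:
    """Hypothetical stand-in for one cached copy of an entry."""
    cached_at: float              # assumed: when this copy was cached (seconds)
    attrs: Dict[str, List[str]]   # attribute name -> list of values


def merge_cached_attrs(results: Iterable[CachedResult]) -> Dict[str, List[str]]:
    """Union the attributes of all cached copies; the newest copy wins on conflict."""
    merged: Dict[str, List[str]] = {}
    # Walk the copies oldest first, so values from newer copies overwrite older ones.
    for result in sorted(results, key=lambda r: r.cached_at):
        for name, values in result.attrs.items():
            # LDAP attribute names are case-insensitive, so normalise the key.
            merged[name.lower()] = list(values)
    return merged


# Two cached copies of the same entry, produced by different queries.
older = CachedResult(100.0, {"uid": ["ryan"], "uidNumber": ["1000"], "loginShell": ["/bin/sh"]})
newer = CachedResult(200.0, {"uid": ["ryan"], "loginShell": ["/bin/bash"], "gidNumber": ["100"]})

merged = merge_cached_attrs([newer, older])
print(merged)
```

In this example loginShell comes from the newer copy, while uidNumber and gidNumber are each taken from the only copy
that has them.  A request with an empty attribute list would then be answered with the merged set.
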
I'm interested to hear what you guys think of this.  Even if my implementation ideas aren't 100% spot-on, I would think
that wiser minds could adapt the general idea into something more artful, provided they agree with the idea in
principle.  Thanks!
-Ryan