commit sizes, coding conventions, and back-ldif - openldap-devel

22 Mar 2008


      My patch to back-ldif/ldif.c for ITS#5408 is larger than the ldif.c file
itself, which has left me wondering what to do with monster pathces:
Should they be split up in stepwise commits so it's easier to see the
logic of each change?  When reading patches, I vastly prefer them split
up that way, but I don't know how much that is worth in the CVS tree.
E.g. first a commit which preserves the program logic including bugs and
just moves code around, then one with a bugfix which now can be small
and easy to read, then one for the next bug, and so on.
Besides being more work, one problem with that approach is that one
can make an error in an poorly tested intermediate version - since that
version was never intended to be used anyway.  Also I don't know what
impact it has on CVS merging.
Also, what's the thereshold for when it makes sense to reformat and
rearrange code just for the sake of readability?  back-ldif ihas no
consistent coding style.  Normally it's "don't reformat, it makes CVS
merges harder".  In this case it was easy with some of it - I touch more
than half of the lines anyway, so I could just as well reformat those
and a few more to something closer to the OpenLDAP conventions (to the
degree I understand them).  But maybe in this case I should just as well
reformat it all, and maybe also it's a good time time to split back-ldif
up in 9 files of ~100-300 lines instead of one of ~1500 lines.
(Regarding CVS merges, the HEAD and RE24 versions are equal, but RE23
differs.)
Another extreme example was a recent change to a big chunk of code,
where the only change was to wrap the code in a new if-test and indent
it.  It would have been nice with a a comment in the commit, or a 1-line
commit + a cosmetic commit.
Regarding formatting, when I do it I'd prefer to do it right - but I've
never quite figured out what the preferred conventions are.  We've had a
style threads before but I don't remember if any of them got very far.
So I thought I'd list some rules and confusions of my own that I've
noticed.
* Generous whitespace:
  - Around parens in control statements - "if ( x )" etc.
  - Inside non-empty parens in function calls, sometimes sizeof,
    sometimes inside [].
  - Usually around binary operators.
However often but not always whitespace between two parens are omitted -
e.g. "foo( bar( baz ))".  Is there some preferred guideline about that?
How about whitespace around parens that are for grouping?
E.g. "if ( (ch = getc(f)) != EOF )".
* Similarly generous use of {} for if statements etc, usually used even
  when enclosing just a single statement.
* Indent 1 tab for continuation lines with the same paren level, e.g.
    x = foo(
    	bar, baz );
    y = one
    	? two
    	: three;
BTW, emacs will format this the OpenLDAP way:
    wups(
    	foo, bar );
but if there is anything after the '(' it aligns bar with it:
    wups( foo,
          bar );
OpenLDAP code often does put arguments after the '(' but to my eyes it
looks easier to read without that, in addition to making emacs happy.
Complex multi-line statements require more creativity, but I can't
tell what, if any, ideas lie behind their indentation (beyond trying
to avoid such statements).  Don't know if e.g. this is OpenLDAPpy:
    if ( (one( two,
    		three ) &&
    	(four ||
    	 (five &&
    	  six))) )
* Tab-width = 4 columns.  Makes a difference for when to break lines
  (keeping line length < 80), how to align comments after code on a
  line, and to align declarations and multi-line statements.
It doesn't hurt to try to keep code nice-looking with tab width 8 too.
  (E.g. comments above statements instead of aligned to the right, and
  often do not align variables in declarations.)
* Indent with tabs, not spaces.
* Usually align comments/decls to the right of code with tabs too, but
  not always.  (If we used space to align after code, it would stay
  aligned regardless of tab-width.)
* Function names at first column in function definitions.
-- 
Hallvard