My patch to back-ldif/ldif.c for ITS#5408 is larger than the ldif.c file itself, which has left me wondering what to do with monster pathces:
Should they be split up in stepwise commits so it's easier to see the logic of each change? When reading patches, I vastly prefer them split up that way, but I don't know how much that is worth in the CVS tree. E.g. first a commit which preserves the program logic including bugs and just moves code around, then one with a bugfix which now can be small and easy to read, then one for the next bug, and so on.
Besides being more work, one problem with that approach is that one can make an error in an poorly tested intermediate version - since that version was never intended to be used anyway. Also I don't know what impact it has on CVS merging.
Also, what's the thereshold for when it makes sense to reformat and rearrange code just for the sake of readability? back-ldif ihas no consistent coding style. Normally it's "don't reformat, it makes CVS merges harder". In this case it was easy with some of it - I touch more than half of the lines anyway, so I could just as well reformat those and a few more to something closer to the OpenLDAP conventions (to the degree I understand them). But maybe in this case I should just as well reformat it all, and maybe also it's a good time time to split back-ldif up in 9 files of ~100-300 lines instead of one of ~1500 lines. (Regarding CVS merges, the HEAD and RE24 versions are equal, but RE23 differs.)
Another extreme example was a recent change to a big chunk of code, where the only change was to wrap the code in a new if-test and indent it. It would have been nice with a a comment in the commit, or a 1-line commit + a cosmetic commit.
Regarding formatting, when I do it I'd prefer to do it right - but I've never quite figured out what the preferred conventions are. We've had a style threads before but I don't remember if any of them got very far. So I thought I'd list some rules and confusions of my own that I've noticed.
* Generous whitespace: - Around parens in control statements - "if ( x )" etc. - Inside non-empty parens in function calls, sometimes sizeof, sometimes inside []. - Usually around binary operators.
However often but not always whitespace between two parens are omitted - e.g. "foo( bar( baz ))". Is there some preferred guideline about that? How about whitespace around parens that are for grouping? E.g. "if ( (ch = getc(f)) != EOF )".
* Similarly generous use of {} for if statements etc, usually used even when enclosing just a single statement.
* Indent 1 tab for continuation lines with the same paren level, e.g. x = foo( bar, baz ); y = one ? two : three;
BTW, emacs will format this the OpenLDAP way: wups( foo, bar ); but if there is anything after the '(' it aligns bar with it: wups( foo, bar ); OpenLDAP code often does put arguments after the '(' but to my eyes it looks easier to read without that, in addition to making emacs happy.
Complex multi-line statements require more creativity, but I can't tell what, if any, ideas lie behind their indentation (beyond trying to avoid such statements). Don't know if e.g. this is OpenLDAPpy: if ( (one( two, three ) && (four || (five && six))) )
* Tab-width = 4 columns. Makes a difference for when to break lines (keeping line length < 80), how to align comments after code on a line, and to align declarations and multi-line statements.
It doesn't hurt to try to keep code nice-looking with tab width 8 too. (E.g. comments above statements instead of aligned to the right, and often do not align variables in declarations.)
* Indent with tabs, not spaces.
* Usually align comments/decls to the right of code with tabs too, but not always. (If we used space to align after code, it would stay aligned regardless of tab-width.)
* Function names at first column in function definitions.