My patch to back-ldif/ldif.c for ITS#5408 is larger than the ldif.c file itself, which has left me wondering what to do with monster pathces:
Should they be split up in stepwise commits so it's easier to see the logic of each change? When reading patches, I vastly prefer them split up that way, but I don't know how much that is worth in the CVS tree. E.g. first a commit which preserves the program logic including bugs and just moves code around, then one with a bugfix which now can be small and easy to read, then one for the next bug, and so on.
Besides being more work, one problem with that approach is that one can make an error in an poorly tested intermediate version - since that version was never intended to be used anyway. Also I don't know what impact it has on CVS merging.
Also, what's the thereshold for when it makes sense to reformat and rearrange code just for the sake of readability? back-ldif ihas no consistent coding style. Normally it's "don't reformat, it makes CVS merges harder". In this case it was easy with some of it - I touch more than half of the lines anyway, so I could just as well reformat those and a few more to something closer to the OpenLDAP conventions (to the degree I understand them). But maybe in this case I should just as well reformat it all, and maybe also it's a good time time to split back-ldif up in 9 files of ~100-300 lines instead of one of ~1500 lines. (Regarding CVS merges, the HEAD and RE24 versions are equal, but RE23 differs.)
Another extreme example was a recent change to a big chunk of code, where the only change was to wrap the code in a new if-test and indent it. It would have been nice with a a comment in the commit, or a 1-line commit + a cosmetic commit.
Regarding formatting, when I do it I'd prefer to do it right - but I've never quite figured out what the preferred conventions are. We've had a style threads before but I don't remember if any of them got very far. So I thought I'd list some rules and confusions of my own that I've noticed.
* Generous whitespace: - Around parens in control statements - "if ( x )" etc. - Inside non-empty parens in function calls, sometimes sizeof, sometimes inside []. - Usually around binary operators.
However often but not always whitespace between two parens are omitted - e.g. "foo( bar( baz ))". Is there some preferred guideline about that? How about whitespace around parens that are for grouping? E.g. "if ( (ch = getc(f)) != EOF )".
* Similarly generous use of {} for if statements etc, usually used even when enclosing just a single statement.
* Indent 1 tab for continuation lines with the same paren level, e.g. x = foo( bar, baz ); y = one ? two : three;
BTW, emacs will format this the OpenLDAP way: wups( foo, bar ); but if there is anything after the '(' it aligns bar with it: wups( foo, bar ); OpenLDAP code often does put arguments after the '(' but to my eyes it looks easier to read without that, in addition to making emacs happy.
Complex multi-line statements require more creativity, but I can't tell what, if any, ideas lie behind their indentation (beyond trying to avoid such statements). Don't know if e.g. this is OpenLDAPpy: if ( (one( two, three ) && (four || (five && six))) )
* Tab-width = 4 columns. Makes a difference for when to break lines (keeping line length < 80), how to align comments after code on a line, and to align declarations and multi-line statements.
It doesn't hurt to try to keep code nice-looking with tab width 8 too. (E.g. comments above statements instead of aligned to the right, and often do not align variables in declarations.)
* Indent with tabs, not spaces.
* Usually align comments/decls to the right of code with tabs too, but not always. (If we used space to align after code, it would stay aligned regardless of tab-width.)
* Function names at first column in function definitions.
Hallvard B Furuseth wrote:
My patch to back-ldif/ldif.c for ITS#5408 is larger than the ldif.c file itself, which has left me wondering what to do with monster pathces:
Should they be split up in stepwise commits so it's easier to see the logic of each change? When reading patches, I vastly prefer them split up that way, but I don't know how much that is worth in the CVS tree. E.g. first a commit which preserves the program logic including bugs and just moves code around, then one with a bugfix which now can be small and easy to read, then one for the next bug, and so on.
Besides being more work, one problem with that approach is that one can make an error in an poorly tested intermediate version - since that version was never intended to be used anyway. Also I don't know what impact it has on CVS merging.
Stepwise - that makes sense to me when you are actually committing while the work is in-progress. Since you've chosen to not to commit any intermediate patches as you developed them, I don't think it really makes sense to artificially recreate those intermediate steps after the fact. If you want the sequence of logic to be visible, just describe it in the comments.
If there were well defined functional boundaries for the patches, then that would still be worth preserving in separate commits, particularly if it were possible to roll back one patch and still keep everything else. It's always preferable to have smaller patches. And it's always preferable to have small, well-defined bug reports in the ITS, so we can have one patch = one ITS. In this case, ITS#5408 is quite a bundle.
Also, what's the thereshold for when it makes sense to reformat and rearrange code just for the sake of readability? back-ldif ihas no consistent coding style. Normally it's "don't reformat, it makes CVS merges harder". In this case it was easy with some of it - I touch more than half of the lines anyway, so I could just as well reformat those and a few more to something closer to the OpenLDAP conventions (to the degree I understand them). But maybe in this case I should just as well reformat it all, and maybe also it's a good time time to split back-ldif up in 9 files of ~100-300 lines instead of one of ~1500 lines. (Regarding CVS merges, the HEAD and RE24 versions are equal, but RE23 differs.)
If the new code is going to be so different anyway, it may be best to only do two commits - first with a pure reformat of the existing code (and no syntactic changes), and then with the bugfix.
At the moment I see no compelling reason to split back-ldif into multiple files.
Another extreme example was a recent change to a big chunk of code, where the only change was to wrap the code in a new if-test and indent it. It would have been nice with a a comment in the commit, or a 1-line commit + a cosmetic commit.
Regarding formatting, when I do it I'd prefer to do it right - but I've never quite figured out what the preferred conventions are. We've had a style threads before but I don't remember if any of them got very far. So I thought I'd list some rules and confusions of my own that I've noticed.
Just offering my personal views, nothing official here.
- Generous whitespace:
- Around parens in control statements - "if ( x )" etc.
Yeah, I noticed that some of the older code treated "if" as a function call, so the parens were adjacent to the "if": "if( x )" which I'm not fond of...
- Inside non-empty parens in function calls, sometimes sizeof, sometimes inside [].
I generally don't like spaces inside []. There are some sections of the DN parsing code (using more than 1 dimensional arrays) where it helps readability though.
- Usually around binary operators.
However often but not always whitespace between two parens are omitted - e.g. "foo( bar( baz ))". Is there some preferred guideline about that?
How about whitespace around parens that are for grouping? E.g. "if ( (ch = getc(f)) != EOF )".
I prefer no whitespace between consecutive parens. Naked parens always look like a syntax error to me, they're an immediate distraction that I then have to dismiss before focusing on the point of interest.
- Similarly generous use of {} for if statements etc, usually used even when enclosing just a single statement.
I prefer no brackets for single statements, but I recognize that this leads to inconsistencies. (Particularly if the single statement is itself an if, and there is an else clause involved. In that case you're forced to use brackets anyway to disambiguate...)
- Indent 1 tab for continuation lines with the same paren level, e.g. x = foo( bar, baz ); y = one ? two : three;
BTW, emacs will format this the OpenLDAP way: wups( foo, bar ); but if there is anything after the '(' it aligns bar with it: wups( foo, bar ); OpenLDAP code often does put arguments after the '(' but to my eyes it looks easier to read without that, in addition to making emacs happy.
I prefer as many arguments as will fit on the first line. Excessive line breaks make it harder to grep for meaningful patterns.
Complex multi-line statements require more creativity, but I can't tell what, if any, ideas lie behind their indentation (beyond trying to avoid such statements). Don't know if e.g. this is OpenLDAPpy: if ( (one( two, three )&& (four || (five&& six))) )
- Tab-width = 4 columns. Makes a difference for when to break lines (keeping line length< 80), how to align comments after code on a line, and to align declarations and multi-line statements.
Yes, that's pretty much carved in stone.
It doesn't hurt to try to keep code nice-looking with tab width 8 too. (E.g. comments above statements instead of aligned to the right, and often do not align variables in declarations.)
Not a consideration. 8-column tabs would spill a lot of lines over 80 columns but there's no reason to worry about that.
- Indent with tabs, not spaces.
Yes.
- Usually align comments/decls to the right of code with tabs too, but not always. (If we used space to align after code, it would stay aligned regardless of tab-width.)
I prefer tabs.
- Function names at first column in function definitions.
Yes.
Howard Chu wrote:
[Rearranging a little]
And it's always preferable to have small, well-defined bug reports in the ITS, so we can have one patch = one ITS. In this case, ITS#5408 is quite a bundle.
OK, I'll keep that in mind next time. In this case I wanted to avoid several OpenLDAP versions with incompatible changes and no particular plan for the transition, but I could have noted that in the ITSes.
I'm getting confused below:
(...) Since you've chosen to not to commit any intermediate patches as you developed them, I don't think it really makes sense to artificially recreate those intermediate steps after the fact. (...)
Okay, but...
If there were well defined functional boundaries for the patches, then that would still be worth preserving in separate commits, particularly if it were possible to roll back one patch and still keep everything else. It's always preferable to have smaller patches.
Er, yes, that's what I thinking of. One commit per logical issue (listed in the ITS). Each committed version should work - or work better than the previous version anyway, without new bugs. Can do. Or did you mean should do next time, since you say:
If the new code is going to be so different anyway, it may be best to only do two commits - first with a pure reformat of the existing code (and no syntactic changes), and then with the bugfix.
Syntactic changes? The syntax is C:-) There are various "no-op" changes - reformatting, moving code between functions, changing function parameters, return types and a struct, renaming variables. Factoring common code out of functions, though that's not a no-op since each function differed slightly (and incorrectly:-)
Now that I think of it I could undo some of that and the patch will be smaller, since I originally did it when I modified related code and then later fixed an issue differently. Or I could go all the way and reformat everything, rename variables with abandon, and split up a 200-line function just for readability.
At the moment I see no compelling reason to split back-ldif into multiple files.
Fine.
Regarding formatting, (...)
Just offering my personal views, nothing official here. (...)
I think you are now officially the most active contributor, so it makes sense to try to follow your style though:-)
However often but not always whitespace between two parens are omitted - e.g. "foo( bar( baz ))". Is there some preferred guideline about that?
How about whitespace around parens that are for grouping? E.g. "if ( (ch = getc(f)) != EOF )".
I prefer no whitespace between consecutive parens. Naked parens always look like a syntax error to me, they're an immediate distraction that I then have to dismiss before focusing on the point of interest.
I have the same feeling about most of the OpenLDAP spaces inside parens, so for me it's a question of which weird and ugly style to pick.
So, "if (( ch = getc( f )) != EOF )"? Or did you only mean, not "if ( ( ch = getc( f ) ) != EOF )"? (I notice I made a mistake with my example, "getc(f)" should in any case have been "getc( f )".)
- Similarly generous use of {} for if statements etc, usually used even when enclosing just a single statement.
I prefer no brackets for single statements, but I recognize that this leads to inconsistencies.
Yup, me too. I tend to find briefer code more readable. For the same reason usually I prefer a = x ? y : z; over if ( x ) { a = y; } else { a = z; } I really don't get the reason for writing the latter unless the expression gets complex - in addition to more code to read, it even makes it a bit harder to notice that all (or both) branches modify 'a'.
(Particularly if the single statement is itself an if, and there is an else clause involved. In that case you're forced to use brackets anyway to disambiguate...)
Yup.
- Indent 1 tab for continuation lines with the same paren level, e.g. x = foo( bar, baz );
(...) OpenLDAP code often does put arguments after the '(' but to my eyes it looks easier to read without that, in addition to making emacs happy.
I prefer as many arguments as will fit on the first line. Excessive line breaks make it harder to grep for meaningful patterns.
Good point. Oh well. I do think it makes sense to group related arguments together if that doesn't lead to more lines though, e.g. foo( a, buf, sizeof(buf) ); rather than foo( a, buf, sizeof(buf) );
- Tab-width = 4 columns. Makes a difference for when to break lines (keeping line length< 80), how to align comments after code on a line, and to align declarations and multi-line statements.
Yes, that's pretty much carved in stone.
Which is why I'd like to reformat the entire source tree:-( Good excuse for changing to a style which Unix Indent and Emacs can handle though.
Though Emacs handles tab-width 4 fine. I have (roughly) this in .emacs:
(defun my-c-mode-hook () (modify-syntax-entry ?_ "w") ; Treat '_' like alphanumerics in C (setq case-fold-search nil) ; C is case-sensitive (c-set-style "stroustrup") ; Roughly the correct indentation style (when (string-match "ldap[^/]*/" (or buffer-file-name default-directory "")) (setq tab-width 4))) ; Tab stops = 4 columns (add-hook 'c-mode-common-hook 'my-c-mode-hook)
It doesn't hurt to try to keep code nice-looking with tab width 8 too. (E.g. comments above statements instead of aligned to the right, and often do not align variables in declarations.)
Not a consideration. 8-column tabs would spill a lot of lines over 80 columns but there's no reason to worry about that.
OK.
- Usually align comments/decls to the right of code with tabs too, but not always. (If we used space to align after code, it would stay aligned regardless of tab-width.)
I prefer tabs.
OK. Well, in case of the variable names in declarations I prefer a single space, then it doesn't matter if someone who uses tab-width 8 wrote the declaration.
Hallvard B Furuseth wrote:
Howard Chu wrote: I'm getting confused below:
Now that I think of it I could undo some of that and the patch will be smaller, since I originally did it when I modified related code and then later fixed an issue differently. Or I could go all the way and reformat everything, rename variables with abandon, and split up a 200-line function just for readability.
If you can break it up into smaller patches without too much hassle then please do so. My suggestion about just doing two commits was just in case it would be too error-prone to break it up further.
Regarding formatting, (...)
Just offering my personal views, nothing official here. (...)
I think you are now officially the most active contributor, so it makes sense to try to follow your style though:-)
OK... It's worth finding out where other folks have strong preferences though.
Also, we shouldn't be considering any major reformatting while RE23 is still active. (I had been hoping we would only have to do one more RE23 release before dropping support for it and declaring RE24 stable.)
However often but not always whitespace between two parens are omitted - e.g. "foo( bar( baz ))". Is there some preferred guideline about that?
How about whitespace around parens that are for grouping? E.g. "if ( (ch = getc(f)) != EOF )".
I prefer no whitespace between consecutive parens. Naked parens always look like a syntax error to me, they're an immediate distraction that I then have to dismiss before focusing on the point of interest.
I have the same feeling about most of the OpenLDAP spaces inside parens, so for me it's a question of which weird and ugly style to pick.
Feel free to suggest something less ugly. ;)
So, "if (( ch = getc( f )) != EOF )"? Or did you only mean, not "if ( ( ch = getc( f ) ) != EOF )"? (I notice I made a mistake with my example, "getc(f)" should in any case have been "getc( f )".)
- Similarly generous use of {} for if statements etc, usually used even when enclosing just a single statement.
I prefer no brackets for single statements, but I recognize that this leads to inconsistencies.
Yup, me too. I tend to find briefer code more readable. For the same reason usually I prefer a = x ? y : z; over if ( x ) { a = y; } else { a = z; } I really don't get the reason for writing the latter unless the expression gets complex - in addition to more code to read, it even makes it a bit harder to notice that all (or both) branches modify 'a'.
Agreed.
- Indent 1 tab for continuation lines with the same paren level, e.g. x = foo( bar, baz );
(...) OpenLDAP code often does put arguments after the '(' but to my eyes it looks easier to read without that, in addition to making emacs happy.
I prefer as many arguments as will fit on the first line. Excessive line breaks make it harder to grep for meaningful patterns.
Good point. Oh well. I do think it makes sense to group related arguments together if that doesn't lead to more lines though, e.g. foo( a, buf, sizeof(buf) ); rather than foo( a, buf, sizeof(buf) );
OK.
- Tab-width = 4 columns. Makes a difference for when to break lines (keeping line length< 80), how to align comments after code on a line, and to align declarations and multi-line statements.
Yes, that's pretty much carved in stone.
Which is why I'd like to reformat the entire source tree:-( Good excuse for changing to a style which Unix Indent and Emacs can handle though.
I guess that would provide us a permanent solution. Come up with a set of parameters for Indent and just post that on the Developer Guidelines. Sounds like a good idea.
- Usually align comments/decls to the right of code with tabs too, but not always. (If we used space to align after code, it would stay aligned regardless of tab-width.)
I prefer tabs.
OK. Well, in case of the variable names in declarations I prefer a single space, then it doesn't matter if someone who uses tab-width 8 wrote the declaration.
OK.