
Best practices for writing code comments

193 points| nsoonhui | 4 years ago |stackoverflow.blog | reply

188 comments

[+] zerocount|4 years ago|reply
More often than not, I see code without any comments. There's this idea of writing self documenting code that really changed the commenting world.

And that whole thing was evangelized by Uncle Bob and the Agile wrecking crew. Before long it was bad to use comments, switch statements, or new up an object. This, in turn, led to the TDD movement, Agile only movement, enterprise patterns for all projects movement, and I'm sure there are others I'm forgetting.

Please comment your code. Tell me what you're trying to accomplish with this block of code. The function name doesn't always suffice. And I don't want to stare at it for 15 minutes, or re-format your 160 column LINQ statement, or Google your regex so I can read what it does on StackOverflow.

Even commenting pseudo code would be fine.

[+] haihaibye|4 years ago|reply
Yeah it's often really worth giving some examples of what you're trying to match (and not match) next to regex code.
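A minimal sketch of what that can look like in Python. The date pattern and the name `is_iso_date` are made up for illustration; the point is the list of matching and non-matching inputs sitting right next to the regex.

```python
import re

# Hypothetical pattern for ISO-style dates, used only to illustrate the comment style.
#
# Should match:      2021-12-23, 1999-01-01
# Should NOT match:  2021-13-01 (month 13), 21-12-23 (two-digit year),
#                    2021-12-32 (day 32), 2021/12/23 (wrong separator)
# Known gap: does not check days per month, so 2021-02-31 still matches.
ISO_DATE = re.compile(
    r"""
    (?P<year>\d{4})                    # four-digit year
    -(?P<month>0[1-9]|1[0-2])          # month 01..12
    -(?P<day>0[1-9]|[12]\d|3[01])      # day 01..31
    """,
    re.VERBOSE,
)


def is_iso_date(text: str) -> bool:
    """Return True if the whole string looks like YYYY-MM-DD."""
    return ISO_DATE.fullmatch(text) is not None
```

The examples in the comment double as a ready-made list of test cases.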
[+] 62951413|4 years ago|reply
That's one of the real world mysteries for me.

* If you have ever had to maintain a code base you didn't design (who hasn't?), how can you not want as much of a paper trail as possible? Especially when the original authors move on.

* I don't remember discussing documentation (or even testing) styles/preferences during interviews. I don't remember managers making it a priority or educating the unwilling, even in large companies. So it's clearly not important from the business perspective. The closest I can remember was a mad rush to create runbooks after a particularly nasty prod incident.

We live in a world of microservices. So you routinely work with multiple repositories. In larger companies developers routinely move to different teams in a couple of years. There's always another AWS service or non-relational data storage. It's interesting to notice that in other domains dealing with this kind of complexity and constant change it's expected to have written notes all the time. Think of medical doctors or aircraft maintenance crews.

When I was younger I kind of trusted that "self documenting code" promotion. As much as the Refactoring book was right this idea proved to be just wrong. Think about "business logic". Including the classical "converting XML/JSON to other protobuf/JSON". There's no grand theory here, just multiple confusing details influenced by previous versions, legacy ontologies, dependencies on other teams. Naming conventions won't help much, not to mention "the two most difficult problems in CS".

I think there's some correlation between poor documentation and missing tests. And the lame excuse is always the same - not having enough time. Even for obvious error-prone things such as calendar calculations or parsing deeply nested data received from the outside world. Or how to build and run a service.

Another correlation is between clear/structured thinking and how easy it is to explain the results. A reasonable functional decomposition and popular idioms/patterns/libraries documented elsewhere enable terse descriptions.

I see documentation as fungible. There are multiple somewhat interchangeable places information can be stored in. JIRA descriptions, commit messages, "javadocs", MD files (previously known as GOOG docs, wikis, Word documents). From what I've seen the people who have enough discipline to use one usually have others in place too.

Software development is very much a learning process. So other people will have to repeat it unless you summarize your findings for them. For some low-level details you could be one of them after not working on a particular component for long enough.

[+] fenomas|4 years ago|reply
> Rule 6: Provide links to the original source of copied code

I learned this rule viscerally early in my career. Back in the golden age of Experimental Flash Art there was an enigmatic site called "flight404", and one day the site's author released source files for some of his more popular projects. All the Flash devs in my office started poring through them, and soon after my boss called me over to show me my own name in one of the comments!

Apparently the author (Robert Hodgin - whom I very much looked up to) had asked for help anonymously in a forum, and I had helped him out so he credited me in a comment (just for his own reference - in those days there was no Flash open source community and designers rarely distributed their source). That experience made me pretty obsessive about crediting outside influences in my source code whenever I get the chance.

[+] locallost|4 years ago|reply
I write my comments in commit messages because those are valid forever. A lot of times somebody will write a code comment, the code will be changed, but the comment will not. This is a huge waste of time on so many fronts: writing the comment in the first place, and then confusing the subsequent developers with the wrong information. If you truly want to understand something, you can always check the change log, and find out why things are how they are.

Exceptions are e.g. if it's something exceptionally tricky or a hack of some kind that is kind of important. It doesn't happen all that much because the stuff I work on is simple. If I was going to do a lot of "commenting" I would prefer to write and update good documentation that gives an overview of how different things work together. The nitty gritty changes too often and is not that important in the grand scheme of things.

[+] janaagaard|4 years ago|reply
> I write my comments in commit messages because those are valid forever.

Don’t they disappear when someone squash-merges a branch where a file is both renamed and changed (a lot)? Or, at least, when somebody decides to move the code to another repo, and doesn’t bother bringing the git history along.

[+] u801e|4 years ago|reply
> I write my comments in commit messages because those are valid forever.

I try to write good comments and commit messages. Comments are typically along the lines of explaining what was done and how for a particular block of code or method. Header comments include a list of parameters, return values, side effects, class variables, etc. Commit messages explain what was done or changed and why.

[+] Zababa|4 years ago|reply
Same thing here. At work I can see the history going back to 1995. And that's after migrating from something to Mercurial, and from Mercurial to git. Maybe that's not the case in all companies, but in the one I work at, the commit history is the longest-lasting information trail.
[+] nine_zeros|4 years ago|reply
> I write my comments in commit messages because those are valid forever.

Only until someone moves a file to a new directory. Now this file shows up as a new file with no history.

Also, you hope to never change your version control system because that change will erase the history.

Relying on commit messages is sometimes just not good enough.

[+] codedokode|4 years ago|reply
So if you added several classes and functions, you describe them all in a single commit message? Probably you don't document them at all.
[+] yarky|4 years ago|reply
That's a great idea. I've been doing this just to rationalize my laziness, but now it makes sense. However, I'm not sure those are valid forever as you can simply delete the .git folder, can't you?
[+] 8note|4 years ago|reply
Commit messages don't necessarily last as long as the code that they comment though.

Changing revision control system or copying code from one package to another can lose them.

The only durable documentation I've seen is in the source code

[+] interactivecode|4 years ago|reply
That’s horrible because the git commit messages are easily lost, disconnected or hard to find in any reasonably active codebase. For example as soon as you do a change and move a file it almost always disconnects from the previous change history.

What’s even more difficult is searching through a code base when the documentation isn’t in or near the code. I don’t know any IDE or editor that makes it easy to search through git commit messages and source code at the same time.

On top of that, do you review git commit messages in code review? Do you ask people to improve descriptions, typos and language in commit messages?

[+] christophilus|4 years ago|reply
I’ve been misled by comments so often that I now literally don’t see them. They’re like banner ads on websites. My brain just doesn’t register them anymore.

They get orphaned by slightly wonky merges. The underlying code gets updated or refactored, but the comments remain. When they are correct, they’re useless 90% of the time (at least in codebases whose linter requires doc comments). Even the accurate comments tend to drift with age and become inaccurate unless they’re carefully maintained (which they almost never are).

It’s really hard for me to figure out the balance.

For exceptionally good codebases (Redis, SQLite come to mind), the comments are a godsend.

For mediocre codebases, the comments are largely a waste of time at best, misleading and time-wasting at worst. And most of us, I suspect, are working on mediocre codebases.

[+] codedokode|4 years ago|reply
Rust libraries use comments for generating documentation and testing. Here is an example of such documentation: [1]. It is difficult to believe that this code would be better without comments.

At least classes and public methods should have comments (except trivial ones).

[1] https://docs.rs/chrono/0.4.19/chrono/naive/struct.NaiveDateT...

[+] AtlasBarfed|4 years ago|reply
Heavily used code will evolve good comments, as a product of multiple smart people using and improving the same code.

Bad codebases are ones that basically are throwaway or depreciating assets. As you say, we practically all work on these except the lucky few.

[+] furstenheim|4 years ago|reply
The most important one is missing: explain business reasons.

Code does what it's written to do, but it does not explain intent. Write that in a comment, and link to the relevant ticket and document.

[+] xixixao|4 years ago|reply
This is what source control blame is for. It’s very hard to estimate what business reasons will be important to readers of code in the future.

I very often nudge people to remove their comments entirely. Less experienced devs often write comments to explain code, instead of spending time on making the code itself readable/understandable. I often ask: “Can you modify the code such that the comment will become obsolete?”

[+] dorwi|4 years ago|reply
Oh no, never link to tickets or docs that are not version controlled in the same repo. Code usually outlives the tools used for organisation.
[+] skrtskrt|4 years ago|reply
I always say more generally “explain the why”, business OR technical reasons.

Bigger architecture decisions should go in ADRs (again, explain the why) but smaller stuff like explaining why you monkeypatched a library can save future devs a lot of lost investigation time and pain. Maybe by the time they are reading your comment/code, the patch you wrote is supported in the main library!

[+] WalterBright|4 years ago|reply
One of the most transformative features we added to the D language was Ddoc, which is a documentation generator for functions. It has a modestly standard format. The result is it applies significant pressure on the coder to add the Ddoc comments, and a routine change request review of a PR is "please add function ddoc comment".

Before Ddoc, the D standard library documentation was inadequate, totally wrong, or missing entirely. After Ddoc, it became reasonable (though no documentation is perfect). A further improvement was the ability to actually run the example code in the Ddoc comment as a unit test.

The end result is the entire documentation of the D runtime library is generated by Ddoc from the source code.

[+] ameliaquining|4 years ago|reply
Doesn't every language have a feature like this these days? If not an official standard, then a commonly used third-party tool.

Though embedding unit tests in API documentation is less common; the only other languages I'm aware of that support it out of the box are Python and Rust. A Google search turns up some implementations for other languages, like C++ and Haskell, but I don't know how widely used they are.

Regardless, I definitely agree that it's a very important and useful feature.
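For readers who haven't seen the Python flavour of this, here is a small sketch using the standard `doctest` module. The function `clamp` is a made-up example; the `>>>` lines in its docstring are both the documentation and the test.

```python
def clamp(value: int, low: int, high: int) -> int:
    """Limit value to the inclusive range [low, high].

    The examples below are both documentation and tests:

    >>> clamp(5, 0, 10)
    5
    >>> clamp(-3, 0, 10)
    0
    >>> clamp(42, 0, 10)
    10
    """
    return max(low, min(value, high))
```

Saved as, say, `example.py` (a placeholder file name), running `python -m doctest example.py` executes every `>>>` example and reports any whose output no longer matches.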

[+] salmonellaeater|4 years ago|reply
One that's missing: comments should explain why a piece of code exists or is written in a certain way (and implicitly, when it can be changed or removed). This overlaps with "explain unidiomatic code in comments", but there can be idiomatic code whose purpose isn't obvious.
[+] spinningslate|4 years ago|reply
+1 for this. It also links to my #1 rule for comments: why not. It applies when:

1. There's a chunk of code that, on first reading, could be clearer/simpler/more idiomatic.

2. There's a good reason not to use the obvious approach, and do something else instead (maybe performance).

Then comment to explain why the obvious path wasn't taken. No matter how well written, code alone can never explain "why not". I've found this invaluable, even looking back at my own code.
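A small Python sketch of a "why not" comment. The function name `total` is hypothetical; the floating-point behaviour it describes is standard.

```python
import math


def total(amounts: list[float]) -> float:
    # Why not the built-in sum()? Plain left-to-right float addition accumulates
    # rounding error: sum([0.1] * 10) gives 0.9999999999999999, not 1.0.
    # math.fsum tracks the lost low-order bits and returns the correctly rounded total.
    # If the inputs ever stop being floats, this can go back to sum().
    return math.fsum(amounts)
```

Without the comment, the next reader is likely to "simplify" this back to `sum()` and reintroduce the rounding error.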

[+] i_hate_pigeons|4 years ago|reply
I once worked in a codebase full of such comments, they were all like

// Adding this because XYZ said so

[+] idiocrat|4 years ago|reply
When I started many aeons ago and did not know what I was doing, I was tempted to document the language itself:

inc al ; add one to al register

[+] Symmetry|4 years ago|reply
I like these rules, but if I were writing my own set, the first one would be that the most important comments you write are often those describing persistent mutable state. Often the point of keeping an object's members private is to preserve the parity among them, and explaining it where they're declared can save everyone a lot of trouble. Also, if you've got a state machine, document the semantics of all the different states.
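A rough Python sketch of that idea. The `Connection` class and its states are invented for illustration; what matters is that the relationships between private members, and the meaning of each state, are written down where they are declared.

```python
from enum import Enum, auto


class ConnState(Enum):
    """States of a hypothetical connection; the comments define what each one means."""

    IDLE = auto()      # no socket open; safe to call connect()
    OPENING = auto()   # connect() in flight; sends must be queued, not written
    READY = auto()     # handshake done; queue has been flushed
    CLOSED = auto()    # terminal; the object must not be reused


class Connection:
    def __init__(self) -> None:
        # Invariant: _pending is non-empty only while _state is OPENING.
        # Invariant: _sent holds messages in the order send() was called.
        self._state = ConnState.IDLE
        self._pending: list[str] = []
        self._sent: list[str] = []

    def connect(self) -> None:
        self._state = ConnState.OPENING

    def send(self, msg: str) -> None:
        if self._state is ConnState.OPENING:
            self._pending.append(msg)  # queued until the handshake completes
        elif self._state is ConnState.READY:
            self._sent.append(msg)
        else:
            raise RuntimeError(f"cannot send in state {self._state.name}")

    def on_handshake_done(self) -> None:
        # Flush before changing state so both invariants hold afterwards.
        self._sent.extend(self._pending)
        self._pending.clear()
        self._state = ConnState.READY
```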
[+] ok_dad|4 years ago|reply
I write comments first, then fill in the code later, adjusting the comments as I learn better ways to do things. That way, the comments are like a guide to anyone as to the goal of a section of code. I comment about every 2 or 3 lines of code, or more sometimes. I even comment on things that everyone would easily understand. My comments are basically a plain English version of the code. Functions and such have comments or docstrings that explain the function and the basic steps of its functionality, so that’s about 2 times that I explain things in my code. I’ve never had anyone say I comment too much. When reading uncommented code, I wish there were more comments sometimes to explain what each variable does or is. My variables are often named with 4 or 5 words, connected by underscores or title case. I often see variables that aren’t named well and try to avoid that myself.
[+] jniedrauer|4 years ago|reply
Are you concerned that you are introducing tech debt into the codebase? Anyone who refactors your code later will also have to refactor your comments (but likely won't).
[+] onion2k|4 years ago|reply
Kernighan's law is fun.

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.

If effort is linear then, by definition, you should never put more than 50% of your brainpower into your code.

I'm not sure I want to increase the effort I put in though.

[+] jbjbjbjb|4 years ago|reply
Only if complicated is considered clever. Clever might actually make things simpler.
[+] kiru_io|4 years ago|reply
All these discussions about comments miss the most important point of why we need comments: to aid us in understanding the code.

My only rules to write comments are:

- Add “why” comments when you write the code

- Add all the other comments when you read the code, and don't understand

[+] HugoDaniel|4 years ago|reply
"Rule 6: Provide links to the original source of copied code."

StackOverflow looking to get more backlinks from GitHub/GitLab :)

[+] lordnacho|4 years ago|reply
When you solve something in a weird way, leave a link to SO in a comment.

Write TODO and NOTE and use a tool to find all your special comments that show unfinished features and investigations. Scan through regularly to make sure the comments still make sense.
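Such a tool can be very small. Here is a sketch in Python; the tag names, the regex, and the function name `find_tagged_comments` are all assumptions, not an existing tool.

```python
import re

# Matches comment tags such as "# TODO: ..." or "// NOTE(alice): ...".
# The tag names are an assumption; adjust to whatever the team agrees on.
TAG_PATTERN = re.compile(r"(?:#|//)\s*(TODO|NOTE|FIXME)\b[^:]*:\s*(.*)")


def find_tagged_comments(source: str) -> list[tuple[int, str, str]]:
    """Return (line_number, tag, text) for every tagged comment in source."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = TAG_PATTERN.search(line)
        if match:
            hits.append((lineno, match.group(1), match.group(2).strip()))
    return hits
```

Running it over each source file and printing the hits gives the regular review list described above.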

Comments not containing special strings should just be "why" explanations, e.g. "we sort the bids in the reverse order to the asks because the best bid is the highest price". So generally something where the code has special cases that are explained by the domain.
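The order-book example could look like this in Python (the function name `sort_book` and the use of plain float prices are simplifying assumptions):

```python
def sort_book(bids: list[float], asks: list[float]) -> tuple[list[float], list[float]]:
    # Bids are sorted descending and asks ascending on purpose:
    # the best bid is the highest price a buyer will pay, while
    # the best ask is the lowest price a seller will accept.
    # Index 0 of each list is therefore the top of the book.
    return sorted(bids, reverse=True), sorted(asks)
```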

[+] gorgoiler|4 years ago|reply
There are three kinds of comments:

1/ API documentation, which is a must unless you can cover everything with examples, which are better.

2/ Internal comments explaining how things fit together. I don’t do any of this any more. If my code doesn’t make this obvious, my code is wrong and gets refactored and functions get better names.

3/ Warning signs. Invaluable! “You might think that this is wrong and change this to use / instead of //. Nope! Don’t make that mistake!” kind of thing. Few and far between, hopefully.
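A tiny Python sketch of the third kind, using the `/` versus `//` case mentioned above (the function name `middle_index` is made up):

```python
def middle_index(items: list) -> int:
    # WARNING: keep the floor division (//) here. Changing it to / looks
    # harmless but returns a float (e.g. 2.5), and list indices must be ints,
    # so items[middle_index(items)] would raise TypeError.
    return len(items) // 2
```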

[+] rmbyrro|4 years ago|reply
When I find APIs that show a few examples but miss detailed specs I always miss the latter.

I agree examples are great for a variety of purposes, but they're no substitute for detailing the API endpoints, authorization mechanism, data types, etc.

[+] WalterBright|4 years ago|reply
I often include links to the section in the online D ref spec that defines the behavior the code is implementing. This turns out to be very handy.

I wish that could be done for C and C++. Too bad it can't, because of copyright issues. I don't link to online descriptions of the C std library, because I've found errors in those online rewrites (rewrites because of, again, copyright issues).

So, for C and C++, I just cite the paragraph number in the standard.

[+] noisy_boy|4 years ago|reply
If you are parsing/manipulating a particularly hairy data structure, try to simplify it. If you cannot, put a comment with a simplified example of the input/output data structure(s) so that the next developer (who may be you a few months down the line) has something visual to match the code against instead of having to imagine everything.
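A sketch of such a comment in Python. The order payload shape and the name `total_by_customer` are invented for illustration.

```python
def total_by_customer(orders: list[dict]) -> dict[str, float]:
    # Input (simplified; real payloads carry more fields):
    #   [
    #     {"customer": {"id": "c1"}, "lines": [{"qty": 2, "price": 5.0}]},
    #     {"customer": {"id": "c1"}, "lines": [{"qty": 1, "price": 3.0}]},
    #     {"customer": {"id": "c2"}, "lines": []},
    #   ]
    # Output:
    #   {"c1": 13.0, "c2": 0.0}
    totals: dict[str, float] = {}
    for order in orders:
        customer_id = order["customer"]["id"]
        order_total = sum(line["qty"] * line["price"] for line in order["lines"])
        totals[customer_id] = totals.get(customer_id, 0.0) + order_total
    return totals
```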
[+] sethammons|4 years ago|reply
Yes! I love the data input/output example in a comment. So few developers do this
[+] josefrichter|4 years ago|reply
If you’re not a native English speaker, do you always write comments in English, or in your native language? I personally see the latter as bad practice: in the 21st century it’s almost impossible to expect that your code won’t ever be read by a foreigner. But I wonder how you feel about it?
[+] DeusExMachina|4 years ago|reply
I am not an English speaker, but everything that ever went into my code was always in English, including names of variables, types, etc. Programming languages are in English and there is something that irks me in having another language mixed in.
[+] masklinn|4 years ago|reply
> do you always write comments in English, or in your native language?

Same language as the codebase, so usually english.

It can make sense for the codebase to use local naming conventions e.g. for legal, accounting, or administrative concerns: the ideas and concepts don't necessarily translate easily (or at all) and all the reference documents are in the local language in which case the codebase will probably be better off using the local language, and both comments and commit messages should match.

[+] antupis|4 years ago|reply
At least here the working language is English: comments, Jira tickets, code and so on. Even when the original team is made up entirely of native Finnish speakers, the next person might not be one.
[+] progx|4 years ago|reply
Code (variables, ...) is always in English; the names are usually shorter than the German words.

Comments are in my native language if I am absolutely sure that this code will not be used by any other people.

[+] zvr|4 years ago|reply
In Rule 6 (Provide links to the original source of copied code) the article says:

> People copy a lot of code from Stack Overflow questions and answers. That code falls under Creative Commons licenses requiring attribution. A reference comment satisfies that requirement.

This is incorrect (or, more accurately, not enough). The license is CC-BY-SA: the BY part requires attribution, but the SA part also requires that you share your own code.

[+] Supermancho|4 years ago|reply
I'm not sure why this has rule 8. Don't do this. This is handled by git blame and PRs.

In regard to rule 5, I've found it's a bit more nuanced than:

> Without the comment, someone might “simplify” the code or view it as a mysterious but essential incantation. Save future readers time and anxiety by writing down why the code is needed.

What is idiomatic? Well that depends on nested organizational requirements merged with some community merged with developer experience.

I have some methods:

    public void doSomething() {
        myType foo = createType();
        foo.monitor();
    }

    public myType createType() {
        return new myType();
    }

There are no comments. What's idiomatic about this? Well the doSomething tests needed a mock, so we get a random create method. Why did the doSomething tests need a mock? Because the organization wants code coverage this way. You have to assume, because of company policy, there's tons of these things everywhere. I hate the term "idiomatic" when it's more subjective than anything else.
[+] everybodyknows|4 years ago|reply
Rule 8: Add comments when fixing bugs.

>I'm not sure why this has rule 8.

It's oriented toward maintainers. Hence little attention to larger architectural questions or business strategy.

[+] ck45|4 years ago|reply
John Ousterhout spent a (short) chapter on writing comments in “A Philosophy of Software Design”. The whole book is in my opinion a must-read, and it gives advice rather than claiming it has all the answers. Back to the original topic: it covers how and when to write comments.