top | item 28416269

Writing Well-Documented Code – Learn from Examples

333 points| ainzzorl | 4 years ago |codecatalog.org | reply

142 comments

order
[+] pcmoney|4 years ago|reply
Personally I found most of those examples to be not great. I think they would be improved better names overall, more well named private methods and most importantly well structured companion test classes proving the behavior.

I really dont want to read another developers comments on their mental state while I am feverishly debugging their code during a 3am page.

The number of times I have been misled by an outdated comment. I generally open git blame to see if the comments predate any code now. Tests on the other hand almost never lie.

Readmes or comments explaining why something exists vs what it allegedly does are generally much better. If you can’t write expressive code why would you assume your English is much better?

[+] SiVal|4 years ago|reply
The problem with all discussions of good commenting style is that it's like debating good footware. Good for what? Office? Hiking in snow? Walking on the beach? Except that we have different footware for different uses but only one set of comments.

Having such a limited commenting syntax is like assuming one permanent type of footware and debating which one would be best. I'd much rather have a smarter commenting system.

I'm not talking about supplementary docs stashed elsewhere to gather dust. I'd like to have expandable comments right where they are needed but usable differently for a glance at familiar code or puzzling through unfamiliar code. Something very brief that doesn't clutter the code, a couple of words or no words at all, with an expansion arrow to something more detailed, expanding to something fully documented with links to everything.

And comments with scope (this comment applies to this line, this one to the function, this to a group of functions...) that actively use the scope. By active I mean that if you make any edits to code in the scope of a comment, that comment changes color or some other mechanism similar to a Git merge conflict: You changed the code without changing the comment, so check and make sure the comment says what you want and click OK. You don't have to fix the comments immediately for each edit. Make a few changes, test, make sure it's working, but when it is, you can see at a glance which comments need to be checked.

The general idea is instead of debating The Best Way to Comment, consider ways to improve the commenting system itself.

[+] nijave|4 years ago|reply
>The number of times I have been misled by an outdated comment

Imo a problem with code review and not leaving comments. Just like tests need changed during refactoring, documentation and comments should also be cleaned up as part of the change

I have found those as well but place the blame on the person who made the subsequent change and ignored them

[+] tempest_|4 years ago|reply
That is how I feel about these comments.

In a vacuum I like them and when I bump into a piece of 10 year old code commented like this it is great.

There is one caveat however, they have to be meticulously maintained and always updated to reflect every change made to the code. It is a huge caveat because more often than not the comments are out dated and no longer reflect the actual code and at that point they are detrimental.

[+] AlexCoventry|4 years ago|reply
> If you can’t write expressive code why would you assume your English is much better?

Because English is optimized for communicating human intent?

[+] chris_wot|4 years ago|reply
I have to agree. Unit tests are some of the most helpful things I’ve used whilst refactoring the Libreoffice vcl module. In fact, I’m now at the stage where if I only move a class into a new header I’m writing a test that covers that class.

An example I’m currently working on is:

https://gerrit.libreoffice.org/c/core/+/121403/5/vcl/qa/cppu...

[+] omegalulw|4 years ago|reply
Yup, the rate limiter comments are unhelpful. A top level comment explaining the idea is fine (and very useful) but code should largely be self documenting.
[+] tpoacher|4 years ago|reply
To me, comments in code are like descriptions in a novel.

I would agree that comments which unnecessarily describe what is to come are bad, in the same way that a novelist should prefer to show, not tell, unless there is a compelling reason for the reader to keep that description in mind as the immediate events unfold.

But to require almost all comments gone in favour of well-named code and external docs feels wrong too, in the same way that one would probably not like to read a book with no chapter headings, no visuals, simply describing the story in a dry (but informative) language, and relying on cliffnotes for actual context.

And if you say code is more like a technical report than a novel, I will disagree. Good code is always a pleasure to read, and typically for similar reasons that a good novel is pleasurable.

[+] mrathi12|4 years ago|reply
Agreed, I think that the code itself should be readable through better names, with the comments being a bonus rather than critical to the understanding of the code.
[+] t-3|4 years ago|reply
Exactly. In my opinion, Thinking Forth by Leo Brodie should be required reading for anyone who touches code.
[+] codethief|4 years ago|reply
Speaking of source code comments, antirez (of Redis fame) wrote a fantastic article[0] about that topic some time ago and I still recommend it to colleagues whenever they make the hollow statement that "code should and can be intelligible on its own, without any comments".

[0]: https://web.archive.org/web/20210226004600/http://antirez.co... (I still don't understand why he deleted his blog)

[+] PebblesRox|4 years ago|reply
I love this line: "Comments are rubber duck debugging on steroids, except you are not talking with a rubber duck, but with the future reader of the code, which is more intimidating than a rubber duck, and can use Twitter."
[+] abootstrapper|4 years ago|reply
I agree with you. Code should be intelligible on its own without comments. It should also be well documented. One does not preclude the other.
[+] seanwilson|4 years ago|reply
For the first example with conditions, I much prefer rolling the "why" of the comments into boolean variable names where possible e.g.

   // We still have burst budget for *all* tokens requests.
   if self.one_time_burst >= tokens {
      ...
   } else {
      // We still have burst budget for *some* of the tokens requests.
becomes something like (I'm missing context but you get the idea):

   enoughBudgetForAllTokenRequests = self.one_time_burst >= tokens

   if enoughBudgetForAllTokenRequests
      ...
   else
      ...
I don't see this often though. I see the commenting pattern a lot where the comment usually duplicates the "what" of the conditional code, and it's often ambiguous if the comment is talking about the condition or a whole branch.

I think a descriptive variable makes the code easier to skim, it's more precise and less likely to become inaccurate. For a big complex conditional, you can break in up into several well-named boolean variables you compose together (but obviously comment the bits that can't be covered by a concise variable name).

[+] achenatx|4 years ago|reply
some types of comments are redundant.

These days I write my code as comments first and then write the code.

But for example this one

        // If either token bucket capacity or refill time is 0, disable limiting.
        if size == 0 || complete_refill_time_ms == 0 {
            return None;
        }
The comment is the same as the code. It is pointless. Instead I might say under what circumstances I expect them to be zero or why Im disabling limiting when they are zero.

This one is great, though at first glance I cant relate the actual code to the comment, at least I understand why the code is there.

            // We still have burst budget for *some* of the tokens requests.
            // The tokens left unfulfilled will be consumed from current `self.budget`.
            tokens -= self.one_time_burst;
            self.one_time_burst = 0;
[+] komon|4 years ago|reply
In defense of the first comment, when first approaching a body of code like this, it's not always going to be clear what returning None is going to mean in any given method.

Also, one thing I think doesn't get mentioned enough is the use of comments as anchors for text search. Having "disable limiting" expressed in English that way makes it easier for newcomers to explore the codebase.

[+] Zababa|4 years ago|reply
I don't think the comment is the same as the code. If "size" was called "token_bucket_capacity" I would agree but since it's not, if you remove the comment you lose some context.
[+] Koshkin|4 years ago|reply
> understand why the code is there

You hit the nail on the head here. This is the single most important reason for having comments in code (aside from just being a part of a coding standard).

[+] willseth|4 years ago|reply
The Protobuf documentation made me cringe. Every character of its "personality" adds a cognitive burden for the reader and a maintenance burden for anyone who has to update it. If a developer wants to be cute with the same brevity and clarity that the comments would have otherwise, fine. Otherwise it's self-indulgent performative trash that should be rejected in review.
[+] SV_BubbleTime|4 years ago|reply
Absolutely. That was not a good example for me.

> self-indulgent performative trash

A little harsher than I would have used; but now that you say it.

>that should be rejected in review.

Without question. I would have absolutely denied that until it was trimmed to at most 10% of what was there.

[+] MattGaiser|4 years ago|reply
I just prefer nice, long, descriptive function names.

I was working on an unfamiliar codebase yesterday and had to work with the postMessage function. It was React, so one might assume that it sends the message to the backend, which was my initial assumption. Nope. It posted it to the page. Didn't help that another utility had another postMessage that was about what to do after the message was sent.

[+] tartoran|4 years ago|reply
Better than writing really long descriptive function names is to not get carried away but write well described function names that show some intent or description and are chosen carefully. I worked on some code bases using long descriptive function names (inherited from another dev) and it wasn’t always useful. It was overkill. The descriptions were either too generic, too descriptive (redundant in half the cases) or handling such long saussage names was simply tiring. When you have to scroll right to read a name it becomes a problem..

The takeway is that merely writing very long descriptive names isn’t a silver bullet and could cause harm despite good intentions.

I suggest going back a after a few days for a refactoring (renaming functions or variables) and see how it reads to you. Then repeat the process a few days later when you’re even more detached from that code.

In many cases self documenting code is going to help you remember the model you had in your head when building it. And if done well it will help others as well, your code will end up not giving headaches to future maintainers.

[+] SavantIdiot|4 years ago|reply
I worked with a guy who wrote a CPU simulator and exceeded the MSVC 256-char function name limit. It's hard when you like 80/100 col code and a function name is longer than that.
[+] didip|4 years ago|reply
I have a simple rule to go by. Comments should describe “why” and the code should describe “what”.
[+] ainzzorl|4 years ago|reply
While I agree with the general sentiment, the first example in the article IMO demonstrates how "what" comments make understanding code easier. When I first read that code, I was pleasantly surprised how easy it was to understand everything going on there, even though I had no familiarity with the code base. This is rare for non-trivial projects. I'm not sure if it could be made as clear without those comments.
[+] trinovantes|4 years ago|reply
"Why not" comments are also very helpful

Self-explanatory code can't explain themselves if they do not exist

[+] einpoklum|4 years ago|reply
I upvoted you, but there are plenty of exceptions. For example:

* Common name of the algorithm you're using

* Reference to somewhere else in the code that's related

* "You may think this code only does X, but it really also does Y"

* "We used to also do XYZ here, but that was removed because of ABC"

and so on.

[+] ridiculous_fish|4 years ago|reply
How do you know what a function does, without a "what" comment to explain it? Even "obvious" functions like Math.max have edge cases, and I rely on "what" comments to understand how those cases are handled.
[+] SV_BubbleTime|4 years ago|reply
WHY comments drastically improved my code.

I’ve similarly been paying much more attention to convey INTENT which I think is massively missing from computer science. The code might suck, but if the intent is clearly defined, it helps to refactor later.

[+] ozim|4 years ago|reply
I agree and first example in the article is all comments describing what the code is doing. There is not a single one explaining why.

Well I stopped reading further.

[+] Sykander|4 years ago|reply
A lot of this I find to be kinda bad commenting, you've got to remember NOT to just directly comment on what the code is doing, but rather explaining why. With future updates your comments will eventually be wrong if they do anything other than give context.

If all your comment does is say what the code on the next line is doing, don't. Instead just try to make the next line more readable.

[+] makeitdouble|4 years ago|reply
More and more I feel these huge comment blocks with the following inline comments sprinkled everywhere should really be a separate documentation file with its own structure where anyone not familiar enough can get up to speed, and understand the core principles behind the code written here.

For instance in the firecracker example, they don’t write 3 pages of block comments to explain the token bucket algorithm (or do they?), as anyone could go read the wikipedia article instead.

Comments living with the code have their benefits, but here I can’t imagine these huge comment blocks or algorithm related warnings get revised anytime outside of project wide refactoring.

What we also miss by having these comments inlined is the higher perspective of how it works with the other classes or how it fits in their running context. Anything that spans multiple class and/or methods makes it difficult to decide where to put the info and what scope it should have. People usually end up explaining the same things at 3 or 4 places in sublty different ways, which makes no one happy.

An exemple of that, this struct description

> /// TokenBucket provides a lower level interface to rate limiting with a /// configurable capacity, refill-rate and initial burst.

This comment is hinting at a sequence where ‘TokenBucket’ needs to be a configurable lower level interface. But what is that sequence, what needs it to be lower level, is there higher level interfaces ? I’ll sure end up spelunking the rest of the doc to get answers to that.

[+] ahallock|4 years ago|reply
I like meta comments about the business or technical decisions behind the code. You can come up with clever method and variable names to help enlighten the reader, but nothing beats an intimate, first-hand account explaining the why.
[+] billytetrud|4 years ago|reply
I agree with others, code commentary shouldn't be a novel. Comments should only provide info that the code can't make easy to understand. Why it was done this way, what it's intended to do, complex or nonlinear interactions that aren't clear from the code. But adding things that make reading the code sound more like English isn't helpful, it's extra work and prone to getting out of sync.
[+] jpswade|4 years ago|reply
The code should explain what it's doing (self documenting code) and tests should explain why it's doing it.

Comments tend to just become a place for misinformation or get disconnected from the actual logic.

Adding more comments sometimes doesn't clarify the situation, it just acts as a second source of truth.

[+] rgoulter|4 years ago|reply
Worth comparing: http://steve-yegge.blogspot.com/2008/02/portrait-of-n00b.htm... A "n00b" is scared of code, and wants as much help to understand it as possible. An "expert" already knows all they need to know to work on the code, and is more productive by putting as much code on the screen as they can.

Rather.. I think part of the point is, context matters. If the team is small, and the project is changing rapidly.. I think the needs and expectations for documentation are different than if the project or team is large and change is slow.

[+] lifeisstillgood|4 years ago|reply
Time and active readers.

Want well documented (and tested and thought out) code?

Then give the coders people outside of their team / hierarchy who are going to check it / read it / use it.

And then give them time.

And repeat.

[+] throwaway984393|4 years ago|reply
Don't sleep on /usr/share/doc/, /usr/share/info/, and /usr/share/man/ either. Code comments are not a replacement for real documentation that is written from the perspective of either a user or a developer. If you can somehow auto-generate meaningful documentation, that's great, but often whatever is auto-generated is only good in theory.
[+] penguin_booze|4 years ago|reply
The view I've developed on docuemntation over time is that it should be addressed outside in. What I look for--and also create, where possible--is a high-level architecture diagram that depicts all major components of the system. Then comes some description of typical control flow through the system, either visually or via. textual description. For example, "the request hits the server, then it goes here, does that etc.).

What would be even better is to create a model of the system using something externally altogether, like TLA+. It need not be perfect right at the beginning, but over time, invariants can be put in place, and even possibly mechanically tested by a model checker. Just even the Next state disjunct would work great - it'll give me the superset of actions the system is capable of.

To me, well-commented functions are all well and good; but without a high level understanding of how everything fits and works together, I doubt it'd be very helpful. It's like winning the battle but losing the war.

I'm not a fan of relying on IDE-centric features either. What if my IDE is not yours?!

[+] jpollock|4 years ago|reply
I had a lot of trouble reading that code. I kept getting distracted by the comments and missed the code. The code seems pretty self-explanatory. I have some changes I would make, but they really depend on performance vs depth (modulo the burst vs budget bug).

  // Hit the bucket bottom, let's auto-replenish and try again.
  self.auto_replenish();
Is ugly. Why repeat in English exactly what the code says? I don't even know what language this is, but I know it's replenishing the tokens in the bucket.

All of the comments in reduce are redundant, this is what I would write at the top, so that anyone using the method doesn't have to page through to figure out what it does, it also makes the "smells" more obvious:

  // reduce: attempt to consume tokens from bucket, 
  // starting with one_time_burst, overflowing to 
  // budget. 
  // (keep comment only if _expensive_):Performs
  // auto-replenish when
  // burst is emptied, and budget has insufficient
  // tokens.
  // Asserts size, but only on replenishment.
  // On OverConsumption, entire bucket is emptied. 
  // On Failure, only one_time_burst is emptied.
  // In all cases, tokens is reduced by amount 
  // fulfilled.
I don't understand why OverConsumption is different to Failure. Both will result in throttling by the caller. The reason for the difference should be documented.

I'm also curious why burst is consumed, then budget. I would expect _budget_ to be consumed first (with refill) with overflow into burst? My expectation is for burst and budget to have different refill schedules in auto_replenish, so using burst first would result in more failures by missing refill opportunities.

I'm also curious why the size assertion is only on requests that result in refills. Is it guaranteed that size = max(one_time_burst + budget)? Why care about the difference? Is hogging the tokens "bad"?

Then, I'd work to remove the need for those comments - particularly the need to explain what happens on error - which should be "nothing", or that the bucket is emptied of whatever is available, not something in between.

Finally I'd remove the mutation of tokens, returning either unfulfilled or fulfilled in the return value.

[+] namelosw|4 years ago|reply
There are too many comments in there.

Comments are not the first tool people should try to reach because:

1. That's a lazy way to make the code clear, most code should be self-descriptive. If the teams are spamming comments it's likely to be a bad code base already.

2. There's no guarantee the comment is compatible with the code: It's natural language and there's no obvious way to test that. After versions of modification, the comments might still look like they make sense but that's not how the code work anymore.

Two alternatives:

1. Declarative programming: Try to leverage declarative programming styles more. For example, top-down style dynamic programming or memoization is always more readable and less error-prone than the bottom-up ones. You don't have to fully adopt functional programming or logic programming, but do take some inspiration from them.

2. Testing. Tests are like comments, they reflect some meaning of the code. Tests are not like comments, if the code breaks, they break.

Of course, comments are very valuable, but it's most valuable when applies in the right spots only.

[+] bandyaboot|4 years ago|reply
When I see code with a lot of inline comments, I can’t help but think that a lot of them would make sense to implement as trace log messages instead. If you’re going to write all these messages documenting the flow of the program, why not give yourself the ability to flip a switch and output them somewhere?