top | item 18772873

Please do not attempt to simplify this code

1552 points| whalesalad | 7 years ago |github.com

627 comments

order
[+] Klathmon|7 years ago|reply
I love this! It's the "jazz music" of software development. Something which breaks all the "rules" but does so purposefully and explicitly so that it can become better than the "rules" allow.

A naive look at this and my head is screaming that this file is way too big, has way too many branches and nested if statements, has a lot of "pointless comments" that just describe what the line or few lines around it is doing, and has a lot of "logic" in the comments which could quickly become outdated or wrong compared to the actual code.

Yet at the same time, it's probably a hell of a lot easier to maintain and manage than splitting the logic up among tens or hundreds of files, it contains a lot of the inherently complex work it's doing to this file, and it is so well and heavily commented that it should be pretty easy to ensure that any changes also keep the comments up to date (after all, any change without changing the resulting comments should show up like a sore thumb, and will most likely prompt the reviewer to look into it at the very least).

[+] pdkl95|7 years ago|reply
Ignoring the initial boilerplate (license, imports) and the request to preserve the verbose ("space shuttle") style, the first line is:

    // Design:
    //
    // [... 4 paragraphs of English prose
    //      explaining goals and intent... ]
That's exactly the type of comment that should be at the beginning of most files!
[+] nickharr|7 years ago|reply
Having spent 25+ years writing, viewing, commenting on and reviewing code in a multitude of languages, this is good stuff to see - regardless of the 'style' of programming (or the language broadly-speaking).

Stepping back and whilst we can all overlook it, good code comments can make an enormous difference in productivity - both for an individual, a team and indeed a business. It aids repository knowledge (something that is easily lost between current and prior teams/individuals), which shouldn't be mistaken for the intelligence of someone looking at code...

I've spent far too much time personally and otherwise attempting to reverse-engineer code written by someone with no comments or explanation. At times, super experienced programmers/developers will take performance-enhancing shortcuts that the less-experienced don't understand; They'll compress routines and functions that are a result of their explicit knowledge of a language and/or domain but without explanation...

On a basic level, comments should: inform, educate, outline and help others understand the sometimes complex routines and functions that we all create and often under an enormous amount of pressure.

There are those that believe good code shouldn't need explanation and to some degree that's true, but you can't apply that brush to every codebase. Code can become complex, awkward, spaghetti-like and almost unfathomable at times.

I've always strived to teach less experienced developers to comment well, efficiently and with a little humor/humour (where possible); Something that allows us to understand code quickly, appreciate the efforts of those before us and smile/grin at the complexity of a challenge.

Personally, I don't really care about the code/comment ratio - It's a complete red herring. At times, code comments can be worth more than the code itself. At other times, they just help you get your job done; quickly, efficiently, no fuss, just great code.

[+] lmm|7 years ago|reply
> There are those that believe good code shouldn't need explanation and to some degree that's true, but you can't apply that brush to every codebase. Code can become complex, awkward, spaghetti-like and almost unfathomable at times.

I have yet to find a codebase that couldn't be made clear as soon as a programmer actually put some effort into doing so. Far too often adding a comment is used as an excuse to give up on making the code readable before you've even started.

[+] kyberias|7 years ago|reply
I think you're giving very bad advice.

> On a basic level, comments should: inform, educate, outline and help others understand the sometimes complex routines and functions that we all create and often under an enormous amount of pressure.

Take the time to simplify the complex routines, clean them up and make them readable and don't waste your precious time in writing "good comments".

> Code can become complex, awkward, spaghetti-like and almost unfathomable at times.

The solution for this is not "comment more". It's "clean up the mess".

> At times, code comments can be worth more than the code itself.

That doesn't make any sense. If the code has no value, you can remove it.

> I've always strived to teach less experienced developers to comment well, efficiently and with a little humor/humour (where possible); Something that allows us to understand code quickly, appreciate the efforts of those before us and smile/grin at the complexity of a challenge.

Please don't do that. Of course it's your code base, but usually it's best to find better venues to express your humor than code comments. Also: teach the junior developers to write clean code first.

Comments too often are used as a bad device to fix code. A developer first writes a horrible mess of code and stops. He may then realize that perhaps no-one will understand it. But if we now teach them not to clean up the code but rather just "write some funny comments", do you think it's any better?

I've seen too many code bases written with this attitude. The comments usually don't help at all, they distract the reader and they distract the author. It's too often useless noise. Many developers hide the comments from these code bases so that they can concentrate on what ACTUALLY is relevant: the code.

[+] asadkn|7 years ago|reply
Agreed.

One of my pet peeves is JS projects. I am not sure why but the front-end developers refuse to add any comments at all. This trend, oddly enough, started with ES6. Pre-ES6, JSDoc style comments at least, were quite common.

Just because some popular Javascript project doesn't have comments doesn't mean yours shouldn't. Straight-forward code may not need comments but most of these projects definitely need to explain why or how are things supposed to work or why things are done a certain way.

[+] dajonker|7 years ago|reply
"it became clear that we needed to ensure that every single condition was handled and accounted for in the code"

This is a feature of several (mostly functional) programming languages, e.g. Haskell. Fun to see that often people figure out that these types of concepts are a smart way to write your code. Too bad it usually means many people reinvent the wheel instead of learning about computer science history and other languages.

[+] spullara|7 years ago|reply
My take away from reading this code is that it is a huge mess that may be impossible to clean up. At some point they failed to introduce abstractions that would remove the need for all this complexity. They are probably right that now that it works that it will be hard to refactor it without leaving out some critical case. However, I pity anyone that works on this code base.
[+] _ph_|7 years ago|reply
As much as I like the style where the code logic is commented very thoroughly, it also falls in the trap of comments no longer matching the current code, I assume some variable renaming happened. In the function (line 320 of https://github.com/kubernetes/kubernetes/blob/ec2e767e593953...)

   func (ctrl *PersistentVolumeController) syncUnboundClaim(claim *v1.PersistentVolumeClaim) error {
the variable "pvc" seems to have been renamed into "claim" and "pv" into "volume", judging from the code/comment mismatch. Comments in the lines 339, 358, 360, 370, 380, 395, 411, 422, 427 point to the old names. Furthermore in line 370 the comment reads:

   } else /* pvc.Spec.VolumeName != nil */ {
while the matching if is:

  if claim.Spec.VolumeName == "" {
So not only the variable name mismatches, but also the comment is wrong. The VolumeName seems to be a string, so is never nil, the else comment should specify that the VolumeName is non-empty.

The more verbose and detailled the comments are - the more work needs to be spent in ensuring that they are correct.

[+] boramalper|7 years ago|reply
> Space shuttle style is meant to ensure that every branch and condition is considered and accounted for…

FTFY: …hopefully!

Only if they’ve used a language with Algebraic Data Types support, the compiler would enforce that “every branch and condition is considered and accounted for.” The only PL with ADT that I’ve used was Haskell, but I’ve heard that Rust has them too “enums”.

People are arguing that “code is what computer executes, comments don’t ensure anything!” and so on, but besides being executed, (this Go) code does not ensure anything either. It’s human eyes that skim through all the cases, and look for a matching branch for every one of them that “ensures.”

In my humble opinion, this so-called “space shuttle style” is just one of the many workarounds to deal with Go’s by-design limitations (the most famous one being lack of generics), a language that’s designed only 9 years ago.

[+] hannofcart|7 years ago|reply
I see a lot of comments mentioning various versions of the following:

- "It's the "jazz music" of software development."

- "...breaks all the "rules" but does so purposefully..."

- "this is irreducibly complex, and cannot be split into multiple files"

- "that smallness-of-file or smallness-of-function is not a target to shoot for"

I am wondering: can't all the above statements be said in defence of any poorly engineered, gargantuan single page code?

[+] thothamon|7 years ago|reply
This code reminds me of why ML-like languages with Maybe-style types and case expressions that generate compiler errors for missed alternatives are good. I bet rewriting this in OCaml or Haskell would lead to tighter code and might even unearth a couple possible states that haven't been accounted for.
[+] cultofmetatron|7 years ago|reply
This is basically the style that rust sort of forces you to write in with allowing shorter forms where the compiler can automate checking that all error states are accounted for.
[+] startupdiscuss|7 years ago|reply
The comment:code ratio is higher than anything I write or that I’ve seen.

However, it does give me some comfort. When it’s not gamed, do other HNers also feel that a high comment:code ratio probably indicates quality?

There are reasons why this may be the case. (More thought, more time and a large team etc)

I don’t advocate using this measure to reward anyone because it would be gamed immediately.

[+] existencebox|7 years ago|reply
I think it's situationally useful. If I take the author of the OP code at their word, this is one of those situations.

Core, critical plumbing/logic at the kernel of business critical, long-lived applications, will be the source of my stress-dreams long into the twilight years of my life; in the form of a lack of documentation and a presence of organic growth.

To criticize myself quite bluntly: If the core code I worked on at work looked like this, I'd feel a great deal more comfortable in some of the changes/digging that inevitably arises.

I would never use it as an absolute metric; but I'd use the level of comfort e.g. a new dev feels when looking at something that might otherwise be a spiderweb and saying "Oh this makes sense" (As I do when looking at OP) as a north star for the most sensitive bits of logic.

[+] cbanek|7 years ago|reply
Having written safety critical code (and reviewed it) it is very useful.

On the other hand, as soon as someone not safety minded gets their hands on it, trouble happens. Comments aren't updated (and there's no way to make sure they are checked, other than GREAT code reviews by the original authors, usually with at least two or three people doing critical reviews). Then the comments can become misleading and a liability as people will take them for truth, as they should.

If you have code that has a lot of subtle dependencies or edge cases, really great comments can help enormously.

[+] geofft|7 years ago|reply
I generally find that a high comment to code ratio, if the comments are of the form "Do this thing because of this reason", indicates quality. It indicates the programmer both knows what they wrote and why they wrote it, and it helps future maintainers figure out under what circumstances the code can be changed, refactored, or removed and under what circumstances the original behavior must be kept.

A high comment to code ratio, where the comments are of the form "Do this thing in this way," indicates a lack of quality - generally a sign that the programmer is not confident enough in the language that they're writing in, and is trying to solve language-level problems instead of business-level problems.

Uncommented code better come with some reference for why the code exists in the form it does. Sometimes commit logs and the VCS "annotate"/"blame" feature works. Sometimes commit logs link to bug trackers or feature requests. Sometimes there's a README. If you don't have any of those, I tend to find that it's generally low-quality code.

Our purpose is to deliver business value. (Or non-business value, as the case may be; if you're writing a free video game for fun, you want people to successfully have fun.) Our purpose is not to generate lines of code. All code is, to some extent, legacy code; comments can help it be manageable legacy code, or make it even more unmanageable.

[+] jcranmer|7 years ago|reply
The way I think about comments is that you should always be able to articulate what the consequences of deleting any line of code is. If the code itself is insufficient to do that, it needs a comment.

There are three kinds of comments: why, what, and how. How comments are almost always a sign that the design is poor or the complexity is too clever. Why comments are necessary to understand the code and are almost always a good thing. What comments can be useful guideposts for skimming code, but they are also extremely prone to code rot. I suspect what comments generally end up being neutral in a net value proposition.

You want a high ratio of why comments to code, but I suspect most high comment-to-code ratios arise from what comments, which severely attenuates the utility of a pure comment-to-code ratio.

[+] brandall10|7 years ago|reply
The comment:code ratio is similar to some legacy enterprise C/C++ systems I've worked on.

I've been on Rails/React teams where comments were seen seen as a possible smell. Not talking about useless literal comments, just that their need was seen as pointing to possible bad design and that a well factored codebase was self-documenting -- ie. if you had to comment something, perhaps methods/vars were poorly named, SOLID principles were not adhered to, methods needed to be broken out, or it was just a sloppy approach. Even explaining design decisions was considered more in the domain of git messages and having nicely packaged atomic commits.

While I see that aspect of it, there's no getting around the constraints of the real world and that some problems are just difficult and much easier to grok with a user guide in plain english, so to speak. And mission critical stuff needs as many safeguards as possible.

That said, inaccurate comments can be dangerous and when your code is highly commented there is real danger things can get out of sync. If you're working on a 5000 line file that 100 developers have touched over a 20 year period... and no one has taken it upon themselves to do a recent comment audit, there be dragons.

[+] rasengan|7 years ago|reply
I comment to myself before I write code. It’s in English. Then I write code. So every line of code is commented by default.

I think this provides me higher quality, less bug ridden results. So if others use comments in this style I would tend to believe it does increase code quality.

If a line of code doesn’t match the comment, something is clearly wrong. ;)

[+] dllthomas|7 years ago|reply
Maybe a loose correlation? Highly commented code was probably not written under tremendous time pressure; uncommented code can go either way. Wrongly commented code is painful, though.

And then there's something I recall running into, a decade ago:

    using namespace std; // using namespace standard
[+] bdibs|7 years ago|reply
I actually would say it’s almost the opposite, if you’re writing clean, expressive code it shouldn’t need explaining.

And if your code is clean, you shouldn’t have a bunch of redundant comments explaining the obvious.

[+] namuol|7 years ago|reply
I use comments to document assumptions that are likely to be wrong, either now as I write it, or later when someone (probably myself) changes it.

It is absolutely useful to do, and really not too difficult.

Languages that allow for more formal assumption-checking (especially before runtime) are even better, but comments have an additional benefit of being understood by a human directly.

I wish languages with static analysis could somehow allow authors to encode human-friendly/sematic errors that you often see as runtime exceptions into the static analysis itself...

[+] josteink|7 years ago|reply
> do other HNers also feel that a high comment:code ratio probably indicates quality?

I consider it a big risk of errors.

When some code is changed, will all related comments be rewritten too? I doubt it.

And then you end up with a codebase which indicate A but comments which clearly spell out B, and you as a maintainer have no idea what to believe.

DRY. Don’t repeat yourself. The comments should not double up for the code. That’s just future maintenance nightmare.

[+] sl1ck731|7 years ago|reply
I think it indicates pride more than anything. If I write something that is just business as usual or commonplace my comments are pretty lacking. If it is something very interesting or that I am proud to have done, I usually write some very detailed comments. This probably correlates to better quality just because it was something I was interested in doing rather than shoveling code.
[+] nerdponx|7 years ago|reply
Knuth allegedly attributes the stability of TeX to his literate programming style.
[+] twunde|7 years ago|reply
It's really only useful in areas of codebases that either a) are very complex, b) touched by many people or c) both. When that happens, everyone prefers that there is a lot of documentation, especially about the why. With older codebases the question is always whether this is an actual bug from the developer or is there a reason why it's doing this super-weird thing and if so is it still applicable. What's happened over the last 5 years is that automated testing has become so mainstream that places without tests are the exception AND the tests have replaced the need for comments.
[+] kakarot|7 years ago|reply
Sometimes it's good to have a block comment explaining the motivation behind a chunk of code or particular line, or just to clarify the individual steps in small modules, but you have to remember you aren't writing prose.

I think around 10% of your code as comments is a good measure, but also remember that you may not revisit a module for years, and you will come to appreciate each and every breadcrumb you left which leads back to your original state of mind when you wrote it. If you measure code quality as maintainability, then comments can indeed increase code quality, just non-linearly.

[+] spyspy|7 years ago|reply
Long comments tend to scare me because they often detail some horrific hack that I'm going to have to deal with.
[+] beokop|7 years ago|reply
Nothing to do with ratio but I’ve found that the quality of the comments often reflect the quality of the code.
[+] protonimitate|7 years ago|reply
>However, it does give me some comfort. When it’s not gamed, do other HNers also feel that a high comment:code ratio probably indicates quality?

I think people should spend more time commenting/documenting across the board. I'd much rather have verbose commenting that is unhelpful that I can skip, versus minimal commenting and code that is overly optimized and hard to parse. The less thinking I have to do to pick up where the last person left off, the better.

I would say that yes, in general high comment:code ratio tends to be higher quality.

[+] emn13|7 years ago|reply
I think it's usually a sign of a poorly understood domain or a poorly modeled problem. It's not a good sign; it's not necessarily a bad sign; it's (at best) an admission of one's limitations.

Comments become useful when behavior is implicitly tricky. Ideally you'd make the "trickiness" tangible and expressible in-whatever-language you're in, but that's not always easy to do.

[+] kansface|7 years ago|reply
> When it’s not gamed, do other HNers also feel that a high comment:code ratio probably indicates quality?

Just the opposite. It typically indicates a history of rigorously documenting terrible code. Sometimes, comments come from complex business requirements or other external constraint. Documenting the former is largely an anti-pattern while documenting the later is hugely useful.

[+] topmonk|7 years ago|reply
I think repeating the same concept in different ways just makes things harder to read, not easier. Comments also sometimes reference code outside of where they are placed. This leads to them becoming misleading and incorrect.

I find that comments can be a last ditch effort to make hacky convoluted code look better than it is. It can be an indication of lack of thought and planning and later obsessive documentation to make up for it.

[+] AlexCoventry|7 years ago|reply
> do other HNers also feel that a high comment:code ratio probably indicates quality?

That's always been my instinct.

[+] Karrot_Kream|7 years ago|reply
There's a lot of talk about comments becoming stale and code being self documenting in the replies which makes me wonder: do people genuinely not read comments and just made code changes without updating comments? And do reviewers not look at the context of the surrounding code and just let commits in? What's the point of having code reviews then?
[+] Jach|7 years ago|reply
The logical conclusion to more comments is Literate Programming. This book for instance is also a program: http://www.pbr-book.org/3ed-2018/contents.html

This file isn't that heavily commented. Do you look at many OSS projects for comparison? Though when things get complicated with many branches and function reentries it makes me wonder whether the problem would have been better solved with declarative logic that handles the procedural mess for you. (It might also be much higher quality since you may unlock access to various formal methods and go beyond unit tests. Though perhaps for example there's a vetted TLA+ spec not shown that this controller is based on.)

I don't think doc'ing every function is unusual, usefully doing so is less common though. Comments in the function body also aren't that rare, though it might indicate a place for better factoring e.g. just more function calls on descriptive/suggestive names. (Having more functions will help in not having to stub out (and deeply stub) so much in a behavioral test, too, since you can get away with just mocking the function call instead of the potentially hairy state logic the function does underneath.)

I see an example at a random spot for a couple improvements in naming (in my ignorant opinion, I don't know about kubernetes) -- though the fact I feel able to express even a weak opinion on an improvement suggests the comments were reasonable. I've seen code less hairy but with no comments or useful tests and without a need to really understand it I just want to move along pretending I saw nothing.

Look at the set of ifs at L591. The first if is a null check with part of the explanation on L592, better to remove that part and have a function call, something like "claimWasDeleted(claim)". The matching else if on 615 checks for an empty string name, I'm not sure but I think its explanation is at L634 and the empty string check could be "isClaimPending(claim)", and maybe move the mode mismatch check to its own else if before the isClaimPending block and give it a better name. I appreciate the comment on L635 telling me why the next line of code on 641 is done (it may likely not be clear from the commit history, which can be another place for whys) though with the isClaimPending change the comment and code might be replaced with a fn call with the details in the fn doc. I'm also reminded of an idea in more expressive languages to annotate purely optimization metadata of any kind (inlining being the simplest) and being able to toggle it on/off for extra QA in a test suite. Anyway the next elif on 643 and its comment, could be something like "isVolumeBoundToClaimProperly(claim, volume)". You get the idea.

[+] lmm|7 years ago|reply
I think it's likely to indicate low quality. Comments are for where the code wasn't clear enough.
[+] zorga|7 years ago|reply
> do other HNers also feel that a high comment:code ratio probably indicates quality?

Nope, imho code with lots of comments is generally crappy code. It's littered with comments to explain the sloppy code they couldn't make clear because they're bad programmers. Good programmers use few comments, write simple clear code that doesn't require explaining, and leave comments about why something was done rather than simply trying to explain what the code does.

Code never lies, comments often do; don't trust comments that explain the code.

[+] newnewpdro|7 years ago|reply
No, I do not find it indicates quality.

To me, comments are noise, and code is signal; the code is what actually executes.

It's one thing to have a summary of intent at the start of a listing, that should not count towards the code:comments ratio.

Once the code begins however, there should be a minimum of comments necessary - especially in a high-level language not constrained to assembly-level instructions.

In assembly listings it was common to have two columns, the code on the left and comments which often resembled high-level pseudo code on the right. Here's some representative apollo guidance computer source:

  MAKEPRIO	CAF	ZERO
		TS	COPINDEX

		TC	LINUSCHR
		TCF	HIPRIO		# LINUS RETURN
		CA	FLAGWRD4
		MASK	OCT20100	# IS PRIO IN ENDIDLE OR BUSY
		CCS	A
                TCF	PRIOBORT	# YES, ABORT

When you're already working in a high-level language like C or Golang, you should be able to clearly communicate what is going on without the need for littering it all with comments.
[+] apo|7 years ago|reply
One simplification that might be tempting is to replace the "if !condition" with "if condition".

For example, line 463 shows:

  if !found {
    // handle missing
  } else {
    // handle found
  }
I would simplify this to:

  if found {
    // handle found
  } else {
    // handle not found
  }
Or even:

  if missing {
    // handle missing
  } else {
    // handle not missing
  }
The test-negative style is repeated throughout the file, but inconsistently. Sometimes the negative is tested in the if branch, sometimes it's tested in the else branch. Why?
[+] yongjik|7 years ago|reply
Obviously I'm not the intended audience, but I'm not sure it's wise to have CloudVolumeCreatedForClaimNamespaceTag, CloudVolumeCreatedForClaimNameTag, and CloudVolumeCreatedForVolumeNameTag in the same file.

This is almost the worst of both worlds: the trouble of wading through a pile of words, together with the lack of clarity.

[+] unethical_ban|7 years ago|reply
I can't say I've written overly branchy code, but I have become a fan of leaning on the side of "verbose" regarding comments in code - specifically in nested if statements. So many in the Ops area, "just" writing scripts for themselves ( that the rest of the team starts depending on) will just pound out a script that works, with no notes on its function, design or requirements.

15 months later, they moved to a less stressful team, the app is not functioning because the servers got upgraded, and you wonder how the hell this thing works. And if it doesn't work, you have 60 hours of redevelopment to do.

[+] Xyik|7 years ago|reply
Love this thread. I see this a lot, where engineers blindly follow best practices and have urges to re-factor code when its not necessary. Big files are not necessarily bad and I love that a lot of the comments are with me on this. Having to open several tabs and remembering where you are in the stack can be hard once there are more than a couple of frames / function calls in. There is a lot of benefit to keeping logic in 1 file or 1 function, and there is a time and a place for writing really granular DRY code. As with all engineering, there are always trade-offs to every decision and I think its about time we put to rest some of the traditional rules of thumbs and 'code smells' new engineers learn and adhere to like a bible.
[+] germainelong|7 years ago|reply
The comment claim that every branch is accounted for and yet few functions below you can see this is certainly not the case. They should either have fixed it first and then make such comment or shouldn't make such comment at all. Otherwise this looks a bit cringey.
[+] nradov|7 years ago|reply
For background on how actual Space Shuttle flight control software was written I recommend reading the "NASA Manager's Handbook for Software Development". It contains some great guidelines for writing safety-critical software. AFAIK, no Space Shuttle mission ever suffered a serious safety incident due to a software defect.

https://ntrs.nasa.gov/search.jsp?R=19910006460

[+] emmanueloga_|7 years ago|reply
This looks like an example of why I'm trying to learn more about model driven development these days.

There's probably a lot of wisdom in this file embedded in a lot of noisy Go code. If someone tries to implement this code in a different programming language, a lot of this wisdom would have to be painfully extracted from the Go code. Some sort of extracted decision table format (examples [1]) would be objectively easier to read, could be rendered in different formats, hyperlinked, etc. Maybe a small subsection of this code could be generated from some structured data specification.

Mbeddr [2] is an example of a language that allows embedding decision tables directly in C99. It can also embed state machines and other goodies directly into the code. This is just one way to go... I'm sure some ppl would complain about lock-in in a specific IDE.

It is not clear to me exactly which domain driven techniques could be applied to this particular piece of Go code, but it could be worth it to find that out.

From what I see out there model driven design seems to be applied to areas were really convoluted, and some times nonsensical logic needs to be mapped to code, like in insurance policy management, and stuff like that, or for other complicated code like firmware, drivers, protocols, etc. Why can't we apply these techniques to general programming problems?

1: https://medium.com/@markusvoelter/the-evolution-of-decision-...

2: http://mbeddr.com/index.html

[+] maxander|7 years ago|reply
Literate programming [0] is frequently re-invented, in various forms, because it is simply the right way to do things- and each iteration of it is almost immediately discarded, because we're all sinful and impatient creatures.

[0] https://en.wikipedia.org/wiki/Literate_programming