> Those who speak of “self-documenting code” are missing something big: the purpose of documentation is not just to describe how the system works today, but also how it will work in the future and across many versions. And so it’s equally important what’s not documented.
Documentation can also tell you why the code is a certain way. The code itself can only answer "what" and "how" questions.
The simplest case to show this is a function with two possible implementations, one simple but buggy in a subtle way and one more complicated but correct. If you don't explain in documentation (e.g. comments) why you went the more complicated route, someone might come along and "simplify" things to incorrectness, and the best case is they'll rediscover what you already knew in the first place, and fix their own mistake, wasting time in the process.
Some might claim unit tests will solve this, but I don't think that's true. All they can tell you is that something is wrong, they can't impart a solid understanding of why it is wrong. They're just a fail-safe.
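A concrete instance of the "simple but subtly buggy vs. complicated but correct" case (my example, not the parent's) is the classic integer midpoint:

```c
#include <assert.h>
#include <limits.h>

/* Why not the "obvious" (lo + hi) / 2?  Because lo + hi can
 * overflow int when both are large, which is undefined behavior.
 * The longer form below stays in range for any
 * 0 <= lo <= hi <= INT_MAX.  Without this comment, a future
 * reader may well "simplify" the function back into the buggy
 * version. */
int midpoint(int lo, int hi) {
    return lo + (hi - lo) / 2;
}
```

The comment is doing exactly the job described above: the code alone shows *what*, only the comment preserves *why*.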
Probably four times a year I find out that defending my bad decision in writing takes more energy than fixing it.
You start saying you did X because of Y, and Y is weird because of Z, and so X is the way it is because you can’t change Z... hold on. Why can’t I change Z? I can totally change Z.
Documentation is just the rubber duck trick, but in writing and without looking like a crazy person.
> If you don't explain in documentation (e.g. comments) why you went the more complicated route, someone might come along and "simplify" things to incorrectness, and the best case is they'll rediscover what you already knew in the first place, and fix their own mistake, wasting time in the process.
I've done this to myself. It sucks. Revisiting years-old code is often like reading something written by someone else entirely, and when looking at an overly complex solution you can be tempted to think you were just confused when you wrote it and that it's easily simplified (which can be true! We hopefully grow and become better as time goes on), when in fact you're missing the extra complexity of the problem, which is just out of sight.
Yes. Tests will solve this. Your point is perfect for tests.
If another experienced coder cannot comprehend from the tests why something is wrong, then improve the tests. Use any mix of literate programming, semantic names, domain-driven design, test doubles, custom matchers, dependency injection, and the like.
If you can point to a specific example of your statement, i.e. a complex method that you feel can be explained in documentation yet not in tests, I'm happy to take a crack at writing the tests.
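For what it's worth, here is a sketch of that approach on an invented example (an integer midpoint function whose "simple" form overflows); the test's name and chosen inputs are picked to carry the reason, not just the pass/fail signal:

```c
#include <assert.h>
#include <limits.h>

int midpoint(int lo, int hi) { return lo + (hi - lo) / 2; }

/* The name and the inputs encode the "why": the naive
 * (lo + hi) / 2 overflows for exactly these arguments.
 * Anyone who "simplifies" midpoint gets a failure whose name
 * points straight at the reason the longer form exists. */
void test_midpoint_handles_inputs_where_lo_plus_hi_overflows(void) {
    assert(midpoint(INT_MAX - 1, INT_MAX) == INT_MAX - 1);
}
```

Whether a test name can ever carry as much nuance as a paragraph of prose is, of course, the point under debate here.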
Peter Naur's "Programming as Theory Building" also addresses this topic of a "theory" which is built in tandem with a piece of software, in the minds of the programmers building it, without actually being a part of the software itself. Definitely worth a read: http://pages.cs.wisc.edu/~remzi/Naur.pdf
The biggest problem is when users of software, programmers of software, and the software code itself have 3 different incompatible theories of how it works.
Sometimes it gets worse still: you can have different theories according to (a) scientists doing basic research into physics or human perception/cognition, (b) computer science researchers inventing publishable papers/demos, (c) product managers or others making executive product decisions about what to implement, (d) low-level programmers doing the implementation, (e) user interface designers, (f) instructors and documentation authors, (g) marketers, (h) users of the software, and finally (i) the code itself.
Unless a critical proportion of the people in various stages of the process have a reasonable cross-disciplinary understanding and effective communication skills, models tend to diverge and software and its use go to shit.
Thanks for linking this; I don't think I'd ever seen it before. As someone who's a second-generation holder of a very large system's underlying theory, it feels extremely accurate, but puts things into terms I'd never considered before.
I would love to see the pendulum swing back around to _good design_ again.
It matters more when designing libraries/frameworks than one-off apps.
Switching to a new framework/platform/language just when the old one has finally matured enough that the need for good design is hard to ignore doesn't actually help. You'll still end up back there eventually.
There have been articles over the last few years that have highlighted the dangers of idolizing and prioritizing innovation. I too hope for increased attention to craft and design of software in the future.
Good design is difficult. That's the easy-to-understand part.
What is difficult for me to understand is why this skill is simply ignored. There are lots of skills that are difficult, but people still persist in learning them. Not so for software design: it-works-somehow-for-now seems to look good enough for most. This also results in "OOD is difficult, FP will save us. Oh no, FP does not really save us, FRP for sure will".
Sorry guys, you will need to break some eggs to make an omelette.
Seems to me there might be ways to program that convey more information. For example, flow-based programming (FBP) should help make the flow of the program explicit and obvious; that is, inherent to the code is a high-level overview of what it does.
From my own limited experience it can make explaining a program to someone new almost trivial: you just use the various defined flows as visual guides to what is happening. I don't want to say FBP is a silver bullet, but I think it shows that it is possible to capture much more of the theory and design of a program in the code.
We’ve increased our productivity by quite a lot over a five-year period by ditching most testing on smaller applications.
Basically our philosophy is this: a small system like a booking system, which gets designed with service-design and developed by one guy, won’t really need to be altered much before its end of life.
We document how it interfaces with our other systems, and the as-is + to-be parts of the business that it changes, but beyond that we basically build it to become obsolete.
The reason behind this was actually IoT. We’ve installed sensors in things like trash cans to tell us when they are full. Roads to tell us when they are freezing. Pressure wells to tell us where we have a leak (saves us millions each year btw). And stuff like that.
When we were doing this, our approach was “how do we maintain these things?”. But the truth is, a municipal trash can has a shorter lifespan than the IoT sensor, so we simply don’t maintain them.
This got us thinking about our small scale software, which is typically web-apps, because we can’t rightly install/manage 350 different programs on 7000 user PCs. Anyway, when we look at the lifespan of these, they don’t last more than a few years before their tech and often their entire purpose is obsolete. They very often only serve a single or maybe two or three purposes, so if they fail it’s blatantly obvious what went wrong.
So we’ve stopped worrying about things like automatic testing. It certainly makes sense on systems where “big” and “longevity” apply, but it’s also time-consuming.
This is the problem that I've run into trying to use formal methods.
I love them, I can express some things very concisely and even clearly. But there's no direct connection to the code and so keeping things synchronized (like keeping comments synchronized with code) is nigh impossible.
We need the details of these higher level models encoded in the language in a way that forces us to keep them synced. Type driven development seems like one possible route for this, and another is integrating the proof languages as is done with tools like Spark (for Ada).
This will reduce the speed of development, in some ways, but hopefully the improvement in reliability and the greater ability to communicate purpose of code along with the code will also improve maintainability and offset the initial lost time.
And by keeping it optional (or parts of it optional) you can choose (it has to be a conscious choice) to take on the technical debt of not including the proofs or details in your code (like people who choose to leave out various testing methodologies today).
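A minimal flavor of pushing a higher-level model into the language, even without full proofs, is available in plain C (a sketch; the types and names here are invented):

```c
#include <assert.h>

/* Distinct wrapper types make the compiler enforce one small
 * piece of the model: distances and durations cannot be mixed
 * up, because C treats the two structs as unrelated types. */
typedef struct { double value; } Meters;
typedef struct { double value; } Seconds;

Meters  meters(double v)  { Meters m;  m.value = v; return m; }
Seconds seconds(double v) { Seconds s; s.value = v; return s; }

double speed_mps(Meters distance, Seconds time) {
    return distance.value / time.value;
}

/* speed_mps(seconds(3), meters(10)) now fails to compile,
 * instead of silently computing nonsense. */
```

Richer invariants need dependent types or tools like Spark, but even this degenerate version keeps a fragment of the design mechanically synced with the code.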
I don't think it's possible (at least with today's technology). The high-level design/specification is intentionally vague; if it wasn't vague, we wouldn't need the low-level code, we could have a compiler generate it from the high-level specification.
As far as we can tell, the technology that can create a piece of exact code from a vague specification is called strong AI.
Heck, we don't even have a language to describe vague specifications without loss of fidelity. We don't know if such a language can exist.
I'm not sure that's possible in any truly meaningful way. Design is a very high level of abstraction that expresses a world, a particular view of that world with regards to a general set of problem domains, and a set of principles and theories about acting within that world. Code is a means (and not the only means) of achieving those actions.
This is not unlike the domains of philosophy, morality, ethics, and law. Attempting to express or enforce philosophy and morality via legalism is an exercise in futility, and even ethics which appears to be on the same level as law actually isn't since the presumption of ethics is behavior even in the absence of a law.
It seems kind of magical to be able to encode the intent of a program outside of its actual function. Theoretically this is what comments are for, but obviously those have zero enforcement value at the compiler level.
This is a lovely article. Software is possibly an a) errant and b) misinterpreted operational semantics of some other semantic horizons of contractual or implicit expectations. Knuth's Literate Programming was onto something. We inhabit a world of word problems and even faulty realizations of rarer formal specifications. Claims concerning "phenomena in the world" drive maintenance and enhancement regimens.
Wouldn't it be better to use data abstraction instead of abusing primitive types?
For instance dates are often abstracted as a Date type instead of directly manipulating a bare int or long, which can be used internally to encode a date.
So, age, which isn't an int conceptually (should age^3 be a legal operation on an age?), could be modelled with an Age type. This, on top of preventing nonsense operations, also allows automatic invariant checking (age > 0), and to encapsulate representation (for instance changing it from an int representing the current age to a date of birth).
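A sketch of that Age idea in C (the names and the particular invariant are invented): the raw int hides behind a type, the invariant is checked once at construction, and nonsense operations like age^3 simply have no API.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct { int years; } Age;

/* The invariant lives in exactly one place. */
bool age_make(int years, Age *out) {
    if (years < 0 || years > 150)
        return false;
    out->years = years;
    return true;
}

bool age_is_adult(Age a) { return a.years >= 18; }

/* Convenience helpers for callers: */
bool valid_age(int years) { Age tmp; return age_make(years, &tmp); }
bool adult_at(int years)  { Age a; return age_make(years, &a) && age_is_adult(a); }
```

Swapping the internal representation to a date of birth later would change only this block, not the call sites.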
Would be better than

  return x >= ASCII_A;

surely. ASCII_A could be set incorrectly, or have a dumb type, and is more verbose anyway. By using the character directly, the code speaks its purpose.
> ASCII_A could be set incorrectly, or have a dumb type, and is more verbose anyway. By using the character directly, the code speaks its purpose.
I disagree. ASCII_A speaks its purpose (we purposefully want an ASCII A stored here), and one can check the constant's definition and immediately tell whether it's correct. E.g.
In this strawman example, perhaps. However, code is usually surrounded by other code, so you could have the 'A' in multiple places. By using an explicit identifier you are protecting yourself against typos (depending on the language, it could be a compile-time error or at worst a very clear runtime error instead of a logic error). The other benefit of ASCII_A is that you are signalling that you are doing ASCII comparisons, as opposed to using 'A' as a placeholder for a special value of 65 and thus confusing the reader (e.g. some spec says 65 is some kind of magic value). Finally, having an ASCII_A gives you the opportunity to add documentation explaining why this constant is the way it is (why not 'B'). The benefits scale with the number of instances; e.g. if that specific 'A' appears multiple times in a file, you wouldn't be able to document it in one spot.
Of course, all of this is likely overkill for your specific example. If I'm writing a to_hex routine, I'm not going to extract those constants as the context & commonplaceness of the algorithm makes it redundant. For the same reason that one might write i++ in a for loop instead of i += ONE. However, extracting inline constants to named variables is frequently something I look out for in code review, especially the more frequently the same constant appears in multiple places, the more difficulty a reader might have trying to understand why that value is the way it is (or if there's any discussion at all), or if it's a value that will potentially change over time. The negative drawbacks of extracting constants is typically minimal & with modern-day refactoring it's a very small ask of the contributor.
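A tiny illustration of that last point, against a made-up spec (the protocol and names here are invented, not from the thread):

```c
#include <assert.h>

/* Hypothetical wire format: record type 65 ('A') marks an
 * ASCII-armored payload.  Documenting the choice here, once,
 * answers "why this value?" for every use site. */
enum { RECORD_TYPE_ASCII_ARMORED = 'A' };

int is_armored(int record_type) {
    return record_type == RECORD_TYPE_ASCII_ARMORED;
}
```

The constant costs one line; a bare `'A'` scattered across the file would leave each reader to rediscover the spec on their own.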
There's the theory that any hardcoded constant directly in code is a bad idea. It may be used more than once, or be used only once now but more than once in the future; or the value may change in the future, and if it's used more than once, that's a source of issues.
amelius|8 years ago
-- Linus Torvalds
jiggunjer|8 years ago
Isn't this definition circular, using "programmed" in defining "programming"?
fooblitzky|8 years ago
"So you update the code, a test fails, and you think 'Oh. One of the details changed.'"
Some of the concerns they raise about writing tests are covered by Uncle Bob here: http://blog.cleancoder.com/uncle-bob/2017/10/03/TestContrava... and here: http://blog.cleancoder.com/uncle-bob/2016/03/19/GivingUpOnTD...
mpweiher|8 years ago
And that's the problem. We need ways to make those higher level designs (~architecture) code.
borplk|8 years ago
> We need model based editing environments that will allow us to have a much richer set of software building blocks.
https://news.ycombinator.com/item?id=16117668
maxerickson|8 years ago
http://www.vpri.org/
hinkley|8 years ago
How do you get the product you want when you don’t know what you want?
coldtea|8 years ago
So ASCII_A tells us the intention of the code's author. A bare 'A', on the other hand, only tells us what the code does, which might or might not be correct (and we have no way of knowing without some other documentation). With the named constant we know what the code is meant to do, AND whether it does it wrongly (and thus what to fix). The bare character tells us nothing: should it be 'A'? Should it be something else? We don't know.
buckminster|8 years ago
> ASCII_A (usually spelled just 'A')
Of course, they are not the same thing. In the last 6 months I've worked on a very old system that uses not-quite-ASCII. 'A' was 65 but '#' wasn't 35.
erpellan|8 years ago
Fatal mistake? Really? An unrecoverable failure?
So, none of the software I've written in the last decade worked, despite all evidence to the contrary?
Right.