The Code Documentation Fallacy

[+] exch|12 years ago|reply

I think the article hits on the wrong conclusion. Don't write less documentation because you're going to assume it will all suck anyway. Insist on writing better documentation instead.

Having said that, I find it helpful to write documentation before writing the actual code, specifically for more complex pieces of code whose behaviour is not immediately obvious.

For me, writing documentation serves as a form of 'rubber duck debugging'[1] before the actual bugs occur. Explicitly writing out the intention of a piece of code in plain English often makes the concept much clearer in my brain and immediately brings out possible problems with my initial design. Problems I can fix before wasting time iterating through code implementations.

This is also the reason I very much enjoy writing thorough READMEs for each library I produce. These explain in abstract concepts what the entire library API is intended to accomplish. Additionally, I try to include actual usage examples. As with code-level documentation, this brings up possible problems before they occur.

The fact that it makes it clear what the code does, months after I last worked on it, is entirely bonus.

[1]: http://en.wikipedia.org/wiki/Rubber_duck_debugging

[+] NickPollard|12 years ago|reply

The best kind of documentation is that which is checked by the compiler and guaranteed to be correct - the code itself. Code that is well written, in good style using sensible variable names, can be as descriptive as good comments. Using a good static type system allows you to encode properties into your code that are guaranteed to be valid.

I certainly think some comments have their uses - but these are generally at the level of how systems and modules work, and the concepts used therein, as discussed elsewhere in these comments. I agree with the article that only 5% (or less) of functions need individual comments attached.

[+] mapgrep|12 years ago|reply

Thank you for saying that, I sometimes feel like I'm taking crazy pills when I hear some of the inane arguments against thorough documentation.

I was hoping this was going to be about the real "documentation fallacy:" 'Documentation tends to be of low quality, therefore it is best to avoid writing much documentation.' One common instantiation of this is "thorough documentation is bad because it will inevitably fall behind the code and be inaccurate."

People fall into the trap of assuming there is something inevitable about bad docs. Yet they never assume there is anything inevitable about bad code, even though most of the code in the world is, objectively, complete shit!

[+] araes|12 years ago|reply

I won't argue the less vs more, that's an age old debate. However, this part:

"Explicitly writing out the intention of a piece of code in plain English often makes the concept much clearer in my brain"

I think is critical for me. I actually write code by first doing a pseudo-code pass of comments, where I just write the flow of what I think the code should be doing. Then I go back and fill in the actual functionality behind the comments. Naturally, it's not always perfect on the first pass, but you just mod the comment thought process to update your approach, and then refill the functionality. As a programmer, you can then skim down through sections just checking what it's "supposed" to do, whether you're a newbie diving in, or the original writer who just needs a refresh.
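A minimal Java sketch of that comment-first workflow. The `averageOfPositive` helper and its class are hypothetical and not from this thread; the numbered comments stand for the outline written on the first pass, and the code under each one is what gets filled in afterwards.

```java
import java.util.List;

class CommentFirstSketch {
    // Hypothetical example: the numbered comments are the first-pass outline,
    // the statements beneath them were added on the second pass.
    static double averageOfPositive(List<Integer> values) {
        // 1. Keep a running sum and a count of the values we accept.
        long sum = 0;
        int count = 0;

        // 2. Walk the input, skipping anything that is zero or negative.
        for (int value : values) {
            if (value > 0) {
                sum += value;
                count++;
            }
        }

        // 3. Nothing accepted: return 0 rather than dividing by zero.
        if (count == 0) {
            return 0.0;
        }

        // 4. Otherwise return the mean of the accepted values.
        return (double) sum / count;
    }

    public static void main(String[] args) {
        System.out.println(averageOfPositive(List.of(4, -2, 6, 0)));
    }
}
```

Skimming only the numbered comments gives the intended flow; if the approach changes, the outline is edited first and the code under it follows.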

[+] jmilloy|12 years ago|reply

>Having said that, I find it helpful to write documentation before writing the actual code.

Do you write/have a technical spec? I think that's what you are describing. If I first and only write the comment, there can often be a disconnect between what the code does right now and what it would ideally do once I'm finished. On the other hand, a spec plus an accurate comment keeps everything in order.

[+] ballard|12 years ago|reply

Yup. Literate programming + clear not clever + less is more (doc and code must all serve a purpose.)

[+] chris_mahan|12 years ago|reply

This of course assumes you have enough time to do this.

[+] thejteam|12 years ago|reply

My problem with code documentation is that documentation is done at the function or class level. When I'm looking at new code I would prefer a "concept of operations" describing how the whole thing works together rather than piecemeal function documentation.

This is especially important with open source code. I'm not going to donate my time to working with an existing codebase if it is going to take hours to figure out how it all pieces together. Examples are fine but what I really need to know is the why. At least when I put up with this at work I'm getting paid by the hour.

[+] jostylr|12 years ago|reply

The old classic literate programming paradigm helps with this. It allows one to write documentation that gives you an overview, provides you with whatever ordering and connections you feel appropriate and, with my implementation of it, even takes care of most of the tool chain. https://npmjs.org/package/literate-programming

[+] ZoFreX|12 years ago|reply

+1 to this, whether I'm making a quick fix or intend to do some significant work, if I'm diving into an unfamiliar project I am always overjoyed if I find 'concepts and metaphors' documentation.

[+] RivieraKid|12 years ago|reply

I opened the comments with an intention to write basically the same thing.

Higher level documentation - classes, packages, groups of packages - makes the project much more approachable. It answers the question "Here's a 3 levels deep hierarchy - where do I start, how are the pieces connected to each other?"

Documenting classes is fairly common, but only the public API. I'm interested not only in how to use the class, but also in how it works internally and what its inner architecture is.

[+] rpdillon|12 years ago|reply

Java actually does quite well with this, but I rarely see it used in the wild: package documentation. http://www.oracle.com/technetwork/java/javase/documentation/...
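For readers who have not seen it: a package-level description in Java lives in a file named `package-info.java` next to the package's classes, and Javadoc uses its comment as the package overview. A minimal sketch, where the package and every class name mentioned are hypothetical and not from this thread:

```java
// File: src/com/example/billing/package-info.java (hypothetical path and package)

/**
 * Turns usage records into invoices.
 *
 * <p>Where to start: {@code InvoiceBuilder} is the entry point. It reads
 * {@code UsageRecord} objects, applies a {@code PricingPlan}, and produces an
 * {@code Invoice}. Persistence lives in the {@code store} subpackage and is
 * not needed to understand the main flow.
 */
package com.example.billing;
```

The comment is a natural home for the "where do I start, how do the pieces connect" material discussed above, since it sits above any single class.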

[+] drone|12 years ago|reply

This is one reason why I really like Doxygen, in the areas where it's supported. It's not only very easy to generate easy-to-read documentation for the code while writing it (specialized comments, with operators to call out special meanings and so forth), but it's also very easy to create high-level documentation from exactly the parts that contribute to it. (Section, page, etc. commands.) Not to mention, then generate a nice, easy HTML stack of it all with pages, search-ability, etc.

I've never really agreed with the whole "code should just be obvious when read." The problem with any large set of instructions is that the instructions, their order, their combinations, and other artifacts all reflect the experience, background, and environment of the author. Two developers of largely equivalent experience and talent rarely come up with the same set of instructions for the same task.

Consider if I told you how an engine functioned as a means of telling you how to change a head gasket.

[+] abritishguy|12 years ago|reply

The documentation for Flask does this and it is really good.

[+] azov|12 years ago|reply

> When some coder changes the function, it is very easy to forget to update the comments

It isn't.

All public APIs should have documentation, even if you believe it's obvious what they do. This documentation never goes out of date because once you release your API it tells you what you cannot change. If you changed the code so that your documentation is now wrong - this code change is a bug and you should fix it. Because there's other code in the wild that relies on the behavior that you promised.

Of course in practice you do have to change that behavior every once in a while. But this should be a big deal (that usually includes bumping up version numbers, mentioning it in release notes, etc). If you're changing it so often that updating the damn comment is an issue you either document implementation details that don't belong in API documentation or your API is unstable crap and nobody should be using it.

PS. Complaining that API documentation gets out of sync with the code is like complaining that unit tests break when you change the code. Duh - that's what they are there for!

[+] einhverfr|12 years ago|reply

> It isn't.

Well, whether it is or isn't depends on how good the docs are and how well you write them. Yes, in many cases it can be easy to forget if the documentation is not woven in well enough.

> All public APIs should have documentation, even if you believe it's obvious what they do.

As a note, part of the function of such documentation is to establish standards for what is acceptable in terms of expected input and output handling. What this means is that if documentation defines the code contract, then the first thing you look at when debugging is the API's documentation. Then, if it matches what you are doing, you might dig deeper.

What this gives you is not debugging by comments (something K&R rightly hated) but asking which side the violation of code contract is on. If the documentation doesn't match what you are doing with it, then the violation is on your side. If it does, then the violation may be on the API's side. The goal here is to define where changes can most productively be made.

> Because there's other code in the wild that relies on the behavior that you promised.

That's exactly right. More specifically the API documentation is the promise.

> If you're changing it so often that updating the damn comment is an issue you either document implementation details that don't belong in API documentation or your API is unstable crap and nobody should be using it.

The thing is, it took us a long time to get our documentation approach right in LedgerSMB. It was a struggle that, I think, only reached something I am happy with 5 years into the project. A lot of our public SQL APIs are not actually documented, because they are dynamically discovered at run-time and are minimalistic (and consequently the developer contracts are far more vague than the API conventions, so it isn't always clear what belongs in the documentation since it is all dynamically looked up anyway), but our Perl code is very well documented and I am very happy with that.

For the SQL though, it's written with documentation generation scripts in mind and therefore the question is what you can document on top of what is already there in the system catalogs.

[+] j_baker|12 years ago|reply

You can change an API without removing functionality. Sometimes you want to add functionality. Or sometimes the context around the API changes. For example, a Python API might not function the same in Python 2.5, Python 2.7, and Python 3.

All I know is, I help maintain a very large set of public APIs that my team is very resistant to changing, and yet somehow the docs are still out of date.

[+] baddox|12 years ago|reply

> Complaining that API documentation gets out of sync with the code is like complaining that unit tests break when you change the code.

But that can be a valid complaint as well. There is an undeniable cost of maintaining tests, and it shouldn't just be taken for granted that the cost is worth it.

[+] jdmt|12 years ago|reply

I don't disagree that public APIs should be well documented. However, I believe it's very easy to forget to update the comments. Especially in scripting languages without type safety. There is little to remind the programmer that arguments were added, types were changed, argument order was modified, behavior was changed, return value was changed, locking behavior was changed, memory allocations were changed, etc. Obviously unit tests should help catch many of these changes. However, it's still up to the coder to remember to change the comments. When you're behind on a deadline or you're juggling 50 different API changes because you're still alpha I'd say it's pretty easy to forget to update a comment.

See http://api.jquery.com/jQuery.ajax/ if you want an example of an API that could easily have a few typos in it. Who is checking to make sure it's 100% in sync with the code 100% of the time? Not trying to say this example has bugs in the docs but there's a LOT of behavior described that could be out of date.

[+] kyberias|12 years ago|reply

Don't you see the difference between changing a comment that doesn't break a build and breaking code that breaks the build because a test fails?

[+] _-_-_-|12 years ago|reply

>> When some coder changes the function, it is very easy to forget to update the comments

> It isn't.

Yes, it is. Have done it myself loads of times.

[+] einhverfr|12 years ago|reply

First, I think more documentation always beats less documentation, assuming reasonable quality. What the article is getting at is the idea that documentation for the sake of documentation never results in any quality.

When I write a new module for LedgerSMB (I won't vouch for older code either by myself or others) I actually start by writing the documentation. The reason here is that the documentation is written primarily to establish the contracts under which the code operates. This includes concept of operation documentation as well. It isn't just aimed at other programmers. It is aimed at documenting the code contracts so that it is clear from the start what the acceptable operations are.

So if there is a fallacy it is not that more code documentation is better (since that is often true, IMO), but rather that telling people to document for the sake of building documentation works.

[+] jheriko|12 years ago|reply

"What the article is getting at is the idea that documentation for the sake of documentation never results in any quality."

X for its own sake is rarely good.

[+] oaktowner|12 years ago|reply

One thing to bear in mind is that documentation serves two different audiences.

One audience is the people who will use your code. For them, every external method should be properly documented (sure, use a tool for this). And make sure it's good enough that (barring debugging situations, because you wrote perfect code) your "users" never have to look inside your code. (If I think about this in C++, I think you should be able to look at a properly commented header and never read the code)

And then there's the poor slob who's going to come in and debug/fix your code some day. He may not need to have every method doc'd but he damn sure needs to know what's tricky, what's interesting, where the gotchas are, etc.

[+] acjohnson55|12 years ago|reply

I've done plenty of dynamic language work where some documentation would go a long way. In JavaScript, every library function should explain what all optional (i.e. undeclared or reinterpreted) parameters do, what gets returned, and likewise for all callbacks, plus what `this` points to. It's very stylish for some insane reason not to do this, and it drives me insane.

[+] mercurial|12 years ago|reply

Don't get me started on JavaScript. It's a problem with dynamic languages in general - the signature does not document what interface the parameters should implement (e.g. a parameter 'file' may be a file path or a file handle, but you'll only know that by looking at the implementation), but it's particularly egregious when nothing tells you at first glance which parameters are optional.

I've now started to document public functions. As my classes usually have few public functions, but many private ones, the code to comment ratio remains acceptable, though maintaining the documentation remains a challenge.

[+] scott_s|12 years ago|reply

I agree. Some Python APIs aren't clear on what the parameters to functions are. It's frustrating to not know what type is even expected - yes, Python is dynamic, and one parameter can take on multiple types, but the documentation should still say which types are allowed, and what the behavior will be.

[+] thomasmeeks|12 years ago|reply

Documentation does not exist for your own benefit. It exists to help other people on the team, or the programmer that inherits your code, quickly understand your code. The key word here is quickly. Yes, a programmer can trace code & figure it out. On a large chunk of code, however, that is exceedingly inefficient. Whether or not documentation is obvious to the person that wrote it is a very poor test for the documentation's utility.

I used to hold a similar opinion -- "Document the non-obvious". The problem is that in a project of sufficient size, almost everything can slide towards non-obvious. Is price the base price, or unit price * quantity? Is the method name cancel_subscription_and_notify_customer really effective? Of course, I could use cancel_subscription, but then I'm not telling programmers about the email that goes out, or cancel_subscription_and_notify, but who am I notifying? The marketing department? Generally I find really descriptive method names to get unwieldy very fast. Further, if you say document the non-obvious, the tendency is towards zero documentation.

The value statement depends on how fast your team grows or changes, and the expected lifetime of the project.

If you are working on a project alone, and that will never change (e.g. it isn't something a business relies on), then you probably do not need documentation. This is also true if you are bringing on a dev a year, and the team size will always remain relatively small. Similarly, if the project is relatively short-lived (like a game), then dropping documentation could be a good idea. Maybe, I'd at least concede there are merits to doing so. Documentation isn't free, of course.

On the other hand, if you are working at a company that's trying to rapidly grow, needs to bring on devs quickly, or has developers moving from one project to another frequently, then I'd say documentation is very important. You are going to save your team a huge amount of time by taking a little time upfront to explain what you are doing, why, and the consequences of each method. Even simple methods deserve documentation for consistency's sake.

If your documentation rots, then you handle it the same way as test rot. Make sure the team knows that docs are necessary, they need to spend the time on it, and if that means more time for features, so be it. I can say from experience that writing documentation after the fact is pretty gnarly.
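To make the `cancel_subscription` naming question above concrete, here is a hedged Java sketch: the method keeps the short name and the doc comment carries what the name cannot (who gets notified, and when). `SubscriptionService`, its fields, and the in-memory "email" list are all hypothetical stand-ins, not anyone's real code.

```java
import java.util.ArrayList;
import java.util.List;

class SubscriptionService {
    // Stand-in for a real mail gateway: records who would have been emailed.
    final List<String> sentEmails = new ArrayList<>();
    private final List<String> active = new ArrayList<>();

    /** Registers a subscription for the customer with this email address. */
    void subscribe(String customerEmail) {
        active.add(customerEmail);
    }

    /**
     * Cancels the customer's subscription.
     *
     * <p>Side effect: sends a cancellation notice to the customer (not to any
     * internal team). The short method name does not say this, so the comment does.
     *
     * @param customerEmail address that identifies the subscriber
     * @return {@code true} if a subscription was cancelled, {@code false} if the
     *         customer had none (in which case no email is sent)
     */
    boolean cancelSubscription(String customerEmail) {
        boolean removed = active.remove(customerEmail);
        if (removed) {
            sentEmails.add(customerEmail);
        }
        return removed;
    }

    public static void main(String[] args) {
        SubscriptionService service = new SubscriptionService();
        service.subscribe("pat@example.com");
        System.out.println(service.cancelSubscription("pat@example.com"));
        System.out.println(service.sentEmails);
    }
}
```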

[+] mhaymo|12 years ago|reply

The less that is described by the signature, the more necessary it is to document a function. In C++/C#/Java, while the parameters and return values are often fairly obvious, exceptions are completely undocumented by the type system (unless you use Java checked exceptions, which you shouldn't), and so should be documented manually (and asserted in unit tests, to ensure the documentation doesn't become incorrect).
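A small Java sketch of that idea, with a hypothetical `parsePort` function: the exceptions are invisible in the signature, so they are listed with `@throws`, and a check pins each documented case down so the comment cannot drift silently. The `exceptionThrownFor` helper is a plain-Java stand-in for what would normally be a unit test in a test framework.

```java
class PortParser {
    /**
     * Parses a TCP port number.
     *
     * @param text decimal digits, for example {@code "8080"}
     * @return the port, between 1 and 65535 inclusive
     * @throws NumberFormatException if {@code text} is not a decimal integer
     * @throws IllegalArgumentException if the number is outside 1..65535
     */
    static int parsePort(String text) {
        int port = Integer.parseInt(text); // throws NumberFormatException on bad input
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        return port;
    }

    // Returns the exact exception class thrown for the input, or null if none.
    // Comparing exact classes matters here because NumberFormatException is
    // itself a subclass of IllegalArgumentException.
    static Class<?> exceptionThrownFor(String text) {
        try {
            parsePort(text);
            return null;
        } catch (RuntimeException e) {
            return e.getClass();
        }
    }

    public static void main(String[] args) {
        System.out.println(exceptionThrownFor("abc") == NumberFormatException.class);
        System.out.println(exceptionThrownFor("70000") == IllegalArgumentException.class);
    }
}
```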

[+] tjeerdnet|12 years ago|reply

This is an ever ongoing discussion between two kinds of programmers (documenting and non-documenting). Just decide for yourself or your team what works best. I myself like to even document other people's code after I see what the method does, if the method isn't speaking for itself. If I am developing, I want to see in my IDE, when adding a call to an existing method, a popup that tells me quickly what the method does, which arguments it takes and what it returns, instead of having to jump to the code every time to see what it does.

Just one simple (real life) example:

  public String convertText(String text) {
    return text.toUpperCase();
  }

This method is already named wrong in my opinion and should be refactored to something like convertTextToUpperCase to understand what it does without having to document. But if your methods get more complex I think a little comment on top of the method describing what's going on really cannot harm. Especially if the code is difficult to read for new people.

The point in the end is to keep the documentation in sync with the code, and that indeed takes some effort. I myself always write documentation for a method in Javadoc style, so only above the method, if it's more complex than a simple getter/setter method. I always tend to think in terms of the official Sun Java API documentation, which I use(d) so often to learn how all the classes/methods work, and I figure documenting my own code the same way makes it more readable/understandable when I or someone else has to work on it. Inside the method body I try to comment little or not at all.

@snowwolf: I agree, but it's just an example to show that a method name should speak for itself

[+] snowwolf|12 years ago|reply

Just to comment on your example, that method shouldn't exist as it adds no value to the codebase - it is purely redundant code. Especially if you were to rename it to convertTextToUpperCase.

The only situation where the method would make sense is if you wanted to be able to change the implementation in the future (TitleCase, LowerCase etc.), in which case a better renaming would be convertTextForDisplayInTitles (i.e. use the method name to comment why we need to convert the text). That has the added benefit of also telling you what the method does in your IDE just from its signature.
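A sketch of what that rename might look like in Java. The enclosing `TitleFormatter` class and the use of `Locale.ROOT` are assumptions added for illustration (the original example calls `toUpperCase()` with the default locale); the point is only that the name and comment state the purpose, while the casing rule stays an implementation detail.

```java
import java.util.Locale;

class TitleFormatter {
    /**
     * Prepares text for display in titles.
     *
     * <p>Titles are currently shown in upper case. Callers should rely on
     * "formatted for titles", not on the casing itself, so the rule can be
     * changed here without touching call sites.
     */
    static String convertTextForDisplayInTitles(String text) {
        // Assumption: Locale.ROOT keeps the result independent of the
        // machine's default locale.
        return text.toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(convertTextForDisplayInTitles("release notes"));
    }
}
```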

[+] lmm|12 years ago|reply

"Cannot harm" is a fallacy. Every additional comment adds a maintenance burden.

Sometimes comments are worth the cost, but they should be a fallback to a fallback - ideally, the code should be self-explanatory. If that's not possible, unit tests should explain the usage and functionality - they're better than comments because the build system enforces that they're updated when the code changes. Only if you can't do that either should you resort to a comment.

[+] informatimago|12 years ago|reply

As it is, it looks indeed quite dubious.

But if it had a specification comment added, it may be perfectly justified.

Remember, in programming, there's no problem that can't be solved by one additional level of indirection.

Here we have one level of indirection. What's not clear is what problem it solves. This is what the comment should tell, or better, the name of the method. But perhaps we're in a context where converting things is the natural thing to do, and in this specific case, the conversion of text is a mere upcasing. Probably the conversion of numbers or the conversion of arrays will involve more work. Notice how I imagine (but leave unwritten) some specifications to justify this code. In a program those specifications should not be left unwritten.

[+] einhverfr|12 years ago|reply

Well, I think the code needs to be clear enough that you don't want to document unnecessarily. In general, I think code contracts should be documented, as restrictions on solutions that programmers may need to be aware of. But comments are not a substitute for clear code.

[+] tsiki|12 years ago|reply

Probably all of us agree that being forced to comment everything will lead to some bad/useless comments. But the examples you showed were simply bad comments. Just because there are bad coders out there who don't care about the quality of their comments for whatever reason, isn't a reason to avoid comments. I'd treat anyone who wrote that first sample comment in a similar way I'd treat a programmer who writes unreadable code; that is, probably take the time to teach them some good commenting practices.

Many of the reasons that speak against commenting apply to good variable names, too. Maybe someone will come later and change the way the variable is used but won't change the name. It doesn't mean we should avoid descriptive variable names, though.

Also, with comments, as with any form of communication, the audience is the key. Let's say I'm a senior programmer somewhere and I'm writing comments. Often the train of thought seems to be "well, using this variable name/adding this comment clears it up for me". But that's usually not nearly enough for junior coders who are new to the codebase, and who are often the target audience. They'll probably still go "wtf" after reading a comment aimed at a senior programmer with an understanding of the codebase and a programming experience to match.

In addition, the obvious point to make is also that code is good at answering how, not why.

This is a bit of a pet peeve of mine, I guess since I've met relatively many coders who claim that good code should comment itself and ditched commenting altogether. Their code has usually ranged from above average to downright awful, and has, on average, been rather unreadable.

[+] mathattack|12 years ago|reply

There's a principal/agent problem with documentation too. The writer of it rarely gets the benefit. Many times they will never see or meet anyone who does. But they have a lot of other competing priorities from highly visible requesters.

I am interested in examples of companies that get this right.

[+] redblacktree|12 years ago|reply

I agree with this assessment. Whenever I have the opportunity to get feedback from someone who has used my documentation, I make every effort to get it, and then update the documentation. It's gratifying to know that someone benefited from it.

[+] progx|12 years ago|reply

"As a rough estimate, 95% of functions in any code base should be so simple and specific that their signature is all you need to use them."

Welcome to reality!

What is simple for one guy is really hard to understand for another.

What you expect is that every bigger function must be split into dozens of smaller functions, only to have a cleaner parameter part. But this makes the code flow unreadable.

[+] kyberias|12 years ago|reply

> What you expect is that every bigger function must be split into dozens of smaller functions, only to have a cleaner parameter part. But this makes the code flow unreadable.

That's exactly what every good, experienced developer tries to do: splitting complex stuff into simpler functions that are easier to understand. Obviously the end result is not unreadable, on the contrary!

[+] mixologic|12 years ago|reply

I'm surprised nobody has mentioned Master Foo and the Programming Prodigy. http://catb.org/esr/writings/unix-koans/prodigy.html

[+] samatman|12 years ago|reply

In many languages, the 90% comments from this article are "doc strings", while the 10% comments are actually comments. They have different uses, and should be treated accordingly.

I also don't understand the fear that doc strings will go out of date. At least with dynamic languages that's part of the point: if the doc string is wrong, then either the contract has changed and not been updated or the code is fulfilling the wrong contract. Both are useful things to know.

More often, the doc string is correct, and serves both as a guide to the code ("here is what you are about to read") and as a quick summary. If you're trying to decide, say, between iterate-dirs and walk-dirs, that summary is perfect, while reading the code would be an annoying digression.

[+] paulgrayson|12 years ago|reply

The "turing test" of comments.... If I can distinguish whether the comment was written by a human or generated by some auto-documenting software then the comment may be useful; if I cannot tell whether it was written by a human or auto-generated then it is useless.

[+] dexen|12 years ago|reply

...and for those who like concise quotes:

`If the code and the comments disagree, then both are probably wrong.' -- Norm Schryer

`Don't get suckered in by the comments -- they can be terribly misleading. Debug only code.' -- Dave Storer

[+] jarrett|12 years ago|reply

My rules for good docs:

1. Begin with a usage synopsis in the form of sample code. It should hit the most important functions and show how they fit together. E.g. for a drawing library, show how to instantiate an image, draw a circle with arbitrary fill and stroke colors, and write the image to disk.

2. For each function or method, open your comment with a straightforward description of what the function does, even if it's absolutely, undeniably obvious. If you're writing a math library, you should even say what the sqrt function does. It doesn't hurt, it costs you very little effort, and you might help someone who's just beginning to learn about the problem domain.

3. For each parameter, document its possible types if your language doesn't encode that information in the function signature. Even if you're using something like Haskell where it does, you might need to comment on the type. E.g. if sin takes a float, say whether it's in degrees or radians.

4. Provide sample code for functions that have to be used in tricky ways. E.g. if the function requires special setup or context.
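The rules above can be sketched on a tiny Java example. `DegreeMath` and `sinDegrees` are hypothetical names chosen for illustration: the class comment opens with a usage synopsis (rule 1), the method comment says plainly what it does (rule 2), and the parameter comment states the unit that the `double` type alone cannot convey (rule 3).

```java
/**
 * Trigonometry helpers that take angles in degrees.
 *
 * <p>Usage synopsis:
 * <pre>{@code
 * double half = DegreeMath.sinDegrees(30.0); // approximately 0.5
 * }</pre>
 */
class DegreeMath {
    /**
     * Returns the sine of an angle.
     *
     * @param degrees the angle in degrees, not radians
     * @return the sine of the angle, between -1.0 and 1.0
     */
    static double sinDegrees(double degrees) {
        // Math.sin expects radians, so convert first.
        return Math.sin(Math.toRadians(degrees));
    }

    public static void main(String[] args) {
        System.out.println(sinDegrees(90.0));
    }
}
```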

[+] informatimago|12 years ago|reply

Ok, the problem is not the documentation. It's the specifications. This problem is exacerbated by management methods such as Agile/Scrum, where no specification document is built from the collection of task descriptions stored in Jira (and that is if you are lucky enough to have anything significant in the task descriptions; more often, from what I've observed, it's hand waving and shin pointing rather than anything precise written down).

And even if some specification document is written, the problem still remains, because when going through all the phases of analysis, design, coding and debugging (whatever the period of the cycle you use), it is not updated!

Now we should probably distinguish API elements from internal implementation stuff (but the blog article mentions APIs).

When documenting internal stuff, unless you've developed internal APIs (which you should do!), the documentation can indeed be descriptive, to help maintainers orient themselves and avoid pitfalls.

When documenting an API, what you mainly need is the specification of the API. This will be the "contract" with the client code, and if there's a discrepancy between the API specification and the implementation, then it means there's a bug (somewhere; of course one could decide that the specifications were wrong, and update the specifications instead of the code). Most often it will be a bug in the code.

But the point is that either you have tools to track the specifications elements down to the line of code, so that when you create or modify a line of code, you have easy access to the specifications, or you put the specification in the docstrings (documentation comments) in the code, to get the same easy access. And note that this is a read/write access: specifications may need to be updated when the code is maintained.

So I would agree, write less documentation, write more specifications. Close to the code.

[+] redblacktree|12 years ago|reply

I tried a google search for "shin pointing" and only got pictures of people doing literally this.

I got it from context, obviously, but can you explain this turn of phrase to satisfy my curiosity?

[+] TheDistantSea|12 years ago|reply

So basically the author is making the argument that comment quality is correlated with quantity.

As far as I can see, this is backed exclusively with "but the ones that do probably...".

Let's just say I'm not convinced yet.

[+] lyesit|12 years ago|reply

I think the argument being made is that comment quality is more important than quantity. The author gave an example where the code with less documentation had more useful descriptions in order to illustrate this point; I don't think he necessarily meant that one caused the other.

[+] pasbesoin|12 years ago|reply

At its best, documentation (or a subset thereof) can and does serve as a cross-reference -- like a cross-reference in an important calculation. If the two results don't correspond, you know you have a problem. (Even if you have to explore both paths to learn what the problem really is and where it lies.)

Something to consider, the next time you find yourself inclined to complain about documentation. Is it the documentation, or the fact that it's not useful documentation?
