
Building Maintainable Software – Free O’Reilly Ebook

92 points | ingve | 10 years ago | sig.eu

17 comments

antouank | 10 years ago

This is also a great book on a similar topic, from the creator of ESLint: http://shop.oreilly.com/product/0636920025245.do

Chris_Newton | 10 years ago

I’m all for promoting maintainable code, but unfortunately I’d hesitate to recommend this book after looking through some of the early material.

For example, the first main chapter is “Write Short Units of Code”, in which the authors advocate a strict limit of 15 lines per method. This sort of argument is common, but it does not seem to be supported by evidence.

For one thing, arbitrary limits are rarely a good idea in programming. If I take an 18-line Java function and translate it almost directly into Python, where it’s only 14 lines because I don’t need a few closing braces, is it suddenly more maintainable? That seems unlikely.

More significantly in this particular case, various studies over the years have not supported the claim that short functions have better error rates, nor that longer but otherwise reasonable functions have higher error rates; if anything, the overall body of evidence seems to suggest the opposite conclusion.[1]

As the book itself notes, the problem with longer functions often isn’t their length; it’s that they mix up multiple responsibilities, which then can’t be read, tested, or reused separately. A better guideline here might have been to separate different responsibilities into different functions, rather than to focus on the amount of code required to implement a single responsibility cleanly. Of course, this will naturally lead to shorter functions in a lot of cases, but without the correlation/causation fallacy.
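A minimal, hypothetical Python sketch of the guideline suggested above: split code by responsibility, not by line count. The order-record format and every function name here are invented for illustration; none of this comes from the book.

```python
from dataclasses import dataclass


@dataclass
class Order:
    customer: str
    amount: float


def parse_orders(lines: list[str]) -> list[Order]:
    """Responsibility 1: turn raw 'customer,amount' text lines into records."""
    orders = []
    for line in lines:
        customer, amount = line.strip().split(",")
        orders.append(Order(customer.strip(), float(amount)))
    return orders


def valid_orders(orders: list[Order]) -> list[Order]:
    """Responsibility 2: drop records failing a (hypothetical) business rule."""
    return [o for o in orders if o.amount > 0]


def total_by_customer(orders: list[Order]) -> dict[str, float]:
    """Responsibility 3: aggregate amounts per customer."""
    totals: dict[str, float] = {}
    for o in orders:
        totals[o.customer] = totals.get(o.customer, 0.0) + o.amount
    return totals


def report(lines: list[str]) -> dict[str, float]:
    """Compose the three steps; each can be read, tested or reused alone."""
    return total_by_customer(valid_orders(parse_orders(lines)))
```

For example, `report(["alice, 10", "bob, -5", "alice, 2.5"])` returns `{"alice": 12.5}`. Each function ends up short, but only as a side effect: if parsing genuinely needed 30 lines, it would still be one responsibility and could reasonably stay in one function.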

For similar reasons, I wish they had treated their second substantial example (the one about the Pacman-style game board) differently. There are a few maintenance hazards with the original code that perhaps could be improved, and depending on the rest of the code there might be some useful ways to refactor that function for ease of reuse. However, the original function wasn’t awful, and it was reasonably clear what it did and how it worked. I don’t think it is an improvement to replace that with three functions and a substantial amount of shared state wrapped in a class. The code is still tightly coupled, so this offers limited benefits in terms of testing or reuse, and now the reader has to jump around different parts of almost twice as much code to figure out what is going on.

To add insult to injury, there is then a horrible section on common objections that tries to address the criticism that more spread-out code may be harder to read. I imagine my psychologist friends would cringe at the way it appeals, without evidence, to what is probably one of the most misunderstood results in all of psychology.

I haven’t read the whole book, but the subsequent chapters that I have read do follow a similar pattern, in particular dismissing potential objections to the authors’ preferred style with vague arguments that lack either logical reasoning or citations of hard data. From authors who apparently have CS PhDs and talk a lot about science and software quality in their biographies, this lack of rigour is disappointing.

I applaud the authors for trying to raise awareness of an important and often neglected aspect of programming, but unfortunately this book looks like a missed opportunity: it’s more Clean Code than Code Complete, strong on advocacy but light on evidence and with some questionable advice.

[1] For anyone who wants to explore real data in this area, I suggest starting with the discussion in Code Complete, which helpfully cites several relevant papers from the relatively early research, and then using Google Scholar to find more recent material based on what else cites those papers.

nimnio | 10 years ago

"More significantly in this particular case, various studies over the years have not supported the claim that short functions have better error rates, nor that longer but otherwise reasonable functions have higher error rates; if anything, the overall body of evidence seems to suggest the opposite conclusion."

That's an oversimplification of the research, and misleading. After citing five studies in Code Complete (including the one that shows an inverse correlation between errors and function size), McConnell summarizes as follows:

"That said, if you want to write routines longer than about 200 lines, be careful. None of the studies that reported decreased cost, decreased error rates, or both with larger routines distinguished among sizes larger than 200 lines, and you’re bound to run into an upper limit of understandability as you pass 200 lines of code."

I wouldn't advocate for strictly short functions either, but the overall body of evidence definitely does _not_ suggest the opposite conclusion: the opposite conclusion would be that we should endeavour to write long functions!

Anyhow, nitpicking aside, thanks for providing a quick review of this book. I'm going to skip it based on your comments.

Mindless2112 | 10 years ago

After a quick browse, I wouldn't recommend this book either. In Chapter 11, it makes this recommendation:

  Comments are valuable in only a small number of cases. Helpful API documentation
  can be such a case, but always be cautious to avoid dogmatic boilerplate commentary.
  In general, the best advice we can give is to keep your code free of comments.

I'm all for eliminating worthless comments, but self-documenting code is still a myth.

DonaldFisk | 10 years ago

A hard limit of 15 lines might be acceptable for a very high-level language such as Prolog, but it seems too short for most other languages. That said, there's a limit to how much anyone can hold in their head (short-term memory) at once. I've had to work on code where several C functions were around 500-1000 lines each. Is that acceptable?

They seemed to work (i.e. had no obvious deficiencies), but there's more to code quality than error rates. Functions should be readable, easily modifiable, testable, and in many cases reusable.

Apart from possibly initialization code and functions containing large switch statements, I limit function length in my own code to a screenful: more precisely, one sheet of A4 paper if printed. (60 lines, 80 columns.)

See also 10 Rules for Writing Safety Critical Code (http://spinroot.com/p10/) Rule 4.
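A page-sized limit like this is easy to check mechanically. Below is a minimal Python sketch, assuming Python 3.8+ (where `ast` nodes carry `end_lineno`), that flags functions longer than a configurable limit. The 60-line default mirrors the A4-page figure above; the helper name `long_functions` is hypothetical, and counting physical lines from `def` to the last body line (blank lines and comments included) is only one of several reasonable ways to measure length.

```python
import ast


def long_functions(source: str, max_lines: int = 60) -> list[tuple[str, int]]:
    """Return (name, length) for every function longer than max_lines.

    Length is measured in physical lines from the `def` line to the last
    line of the body, so blank lines and comments inside the body count.
    Decorator lines are not counted (in 3.8+, lineno is the `def` line).
    """
    tree = ast.parse(source)
    offenders = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                offenders.append((node.name, length))
    return offenders
```

For example, running it on a source string containing a 2-line function and a function whose body is 70 assignment lines reports only the latter, as `[("longer", 71)]`. Exceptions like the initialization code and large switch statements mentioned above would still need a human judgment call, or an allow-list.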

makecheck | 10 years ago

Actual size is of some importance (it lets you fit more on a screen, etc.), but generally I agree that one should emphasize real improvements that happen to be shorter over the explicit goal of being short.

I think of Twitter as an example of the brevity problem. On the one hand, the size limit has resulted in some very neat tweets that manage to say a lot in a little space. On the other hand, we ended up with things like super-short URLs that completely remove any hint about their meaning in the name of brevity. One limitation managed to both improve and damage at the same time.

Spooky23 | 10 years ago

Sometimes I think it's best to treat arbitrary guidelines as principles rather than rules.

In principle, shorter is better. But if something needs to be 20 lines rather than 15, nobody is going to panic.

derrickdirge | 10 years ago

For me, aiming for the shortest possible methods and classes forces me to more carefully consider good design principles like Single Responsibility. I don't automatically assume that long methods are wrong, but they certainly smell.

kluck | 10 years ago

Great topic, and from scanning the table of contents the guidelines are well chosen. Maintainability is discussed far too seldom!

I would like to add that "maintainability" more often than not refers to maintenance "by someone other than whoever wrote the first revision of the code".

crististm | 10 years ago

Maintainability is not the first topic of discussion when the mantra is "build software to throw away".

We're in an age of consumerism in software. We reinvent large pieces of software because we don't have a grip on existing ones to be able to repair or extend them. All the known acronyms including NIH are at work.