An apology to readers of Test-Driven iOS Development

231 points | Morendil | 13 years ago | blog.securemacprogramming.com

67 comments

[+] jamieb | 13 years ago
"The second problem, and the one that in my opinion I’ve let down my readers the most by not catching, is that the table is completely false."

There is a big difference between "unsupported by hard data" and "completely false", especially when the author later admits to not being willing to pay to see the first paper cited in support.

It also seems (from this post: http://lesswrong.com/lw/9sv/diseased_disciplines_the_strange...) that in fact the first line is supported by data, if only in aggregate. The author of that blog post makes much of the phrase "this study didn't accurately record" while glossing over the next part of the sentence, which states "so we will use average rates summarized from several other studies." It is common practice to aggregate data, and provided it is done properly, the aggregation can reduce error, not increase it.

The claim that "a bug introduced in Requirements costs more to fix the later it is discovered" is said to be supported by data. Well done for demanding to see that data. But creating controversy by claiming that the opposite is true is an epic fail.

[+] Morendil | 13 years ago
Amusing exchange over at Reddit:

Redditor 1: "I bet not a single f$%k was given about that table among the readers of the book :)"

Redditor 2: "That table, or something like it, features strongly in every software engineering textbook that I've ever seen. The numbers differ, but the increasing order of magnitude differences are roughly the same. It's the essential justification for nearly everything in software engineering. "A single f$%k" doesn't begin to describe the importance of that table."

And yes, "the essential justification for nearly everything in software engineering" is broadly correct. Anytime you see an argument that "it's important to get the requirements right" for instance, it's based on studies that purport to show these numbers.

[+] ajross | 13 years ago
Indeed, with the semantic caveat that "software engineering" is used here in the sense of "software development management process nattering" and not (always) the actual engineering of working software.

The basic falseness of that chart is pretty well known among the hacker set. Or maybe it's a failure of interpretation among the natterers: the "high cost" of mistakes in the past is made mostly of work already spent. The immediate cost of correcting a past mistake is often (of course not always!) vastly lower than people think. And the farther you are from the actual code involved (i.e. if your role is "architect"), the higher your overestimate of that value is going to be.

So this is a feedback loop: the "high level" people at the start of the process have a built-in bias against iterative solutions (which they perceive as "expensive") and thus a built-in predilection to overdesign as they try to get everything right the first time.

[+] ahoge | 13 years ago
>"it's important to get the requirements right" for instance

Well, we've all been there. If you don't get them right, it will really hurt. A lot.

Also, moving things in a wireframe or diagram around is basically free whereas moving things around in a "finished" product can be fairly time consuming.

So, there is at least some truth to it.

[+] grayprog | 13 years ago
I know Graham. He's both a great guy and a serious developer. I've also been reading his book, and though I've not yet finished it, I've seen this table, as it's at the beginning.

Having read this blog post, I now respect him even more. Both for the intellectual honesty and for his efforts to try and reconcile the data in the table, with all its consequences.

I wish more people (including me) were this dedicated and willing to admit mistakes.

Maybe that's what makes him a specialist in software security, of all things.

[+] Shish2k | 13 years ago
> I therefore no longer consider Table 1-1 in Test-Driven iOS Development to be representative of defect fixing in software engineering

Surely this makes it more representative, in a literal sense -- if he'd checked his data at the start, he wouldn't have to go back to check it + issue an apology + reprint the books now :P

[+] marshray | 13 years ago
The general idea ("bugs found later tend to cost more to correct") is uncontroversial.

It's also accepted that this is a very difficult thing to study. There are so many subjective factors involved that it's really hard to quantify the data or the results. But it seems to me that if Software Engineering as a discipline wants to make progress methodically, it needs to throw out the old unsubstantiated assumptions in order to make room for new conclusions with a solid basis.

[+] robomartin | 13 years ago
The time and cost to fix bugs can have a huge variance which is sometimes dependent on the nature and field of the software being written. There are bugs that take seconds to fix and others that take months.

I have personally experienced hunting down a bug for six months nearly full time (10 to 12 hours per day) until it was finally found. This was a real-time hardware system and the code was that of an FPGA. The culprit was a coefficient in one of many polyphase finite impulse response filters. The calculations used to generate this coefficient were done in a huge Excel spreadsheet. At one point, in the hundreds of equations in the spreadsheet, the author had used the ROUND() function when the ROUNDUP() function should have been used. This was enough to cause a buffer issue in the FIFO feeding the polyphase filter. These are tricky problems to track down when you are literally looking for events that take somewhere on the order of single-digit nanoseconds to occur.
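A minimal sketch of how easily those two functions diverge, with made-up numbers since the anecdote doesn't include the actual spreadsheet values. It replicates Excel's semantics in Python, because Python's built-in round() uses banker's rounding rather than Excel's half-away-from-zero:

```python
import math

def excel_round(x, digits=0):
    # Excel's ROUND: rounds half away from zero.
    factor = 10 ** digits
    return math.copysign(math.floor(abs(x) * factor + 0.5), x) / factor

def excel_roundup(x, digits=0):
    # Excel's ROUNDUP: always rounds away from zero.
    factor = 10 ** digits
    return math.copysign(math.ceil(abs(x) * factor), x) / factor

coeff = 2.3  # hypothetical intermediate value
print(excel_round(coeff))    # 2.0
print(excel_roundup(coeff))  # 3.0
```

A whole-unit difference in a single coefficient is the kind of tiny, silent discrepancy that only surfaces far downstream - here, as a FIFO buffer issue.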

On the other hand, there are those bugs where you know exactly what is going on and where in the code it is happening the instant you see the behavior. We've all experienced that during the course of software development.

Fixing bugs for a dating website has vastly different requirements than, say, fixing bugs in a flight control system.

One argument is for more up-front planning in order to avoid some bugs. At one point this can quickly become counterproductive. Sometimes it's better to just start writing code and fix issues as they come up.

Now, if we are talking about fixing bugs after the fact, that's a different matter. One example here might be inheriting the code to a complex website or an engine control system and, without any familiarity with the code, being required to fix bugs. This can easily take cubic time, as it requires learning the code base (and sometimes the subject matter) while also trying to hunt down bugs.

This is why I tend to take such tables or studies with great skepticism. I haven't really paid much attention to these studies, but I remember looking at one or two of them and thinking that they tended to focus on narrow cases such as fixing bugs in-house, with a stable programmer team and great familiarity with the code base.

[+] lttlrck | 13 years ago
Skepticism is deserved. It's a very fine balance, bearing in mind that not much software exists that is entirely bug-free.

What is the cost of a bug that is never discovered? Is it negative? What is the cost of fixing a bug that would never have caused a problem (at each stage of development)?

It's way more complex than bugs == bad

[+] diego_moita | 13 years ago
Call me cynical, but I am getting very skeptical of a lot of well-established "truths" in Software Engineering: the cone of uncertainty, the orders of magnitude difference in programmers productivity, the efficiency of TDD, ...

Most of these well established claims simply don't have enough empirical data to sustain them. They're all bloggers' hand waving and empty claims.

[+] Morendil | 13 years ago
"One of these isn't like the others..."

Things like the Cone, the rising-cost-of-defects or the 10x claim have been kicking around for decades.

The evidence for or against TDD is, admittedly, inconclusive, but it's more recent and of a better academic caliber. There have been a lot of studies. Most of these studies aren't any good - but at least someone is trying.

There's a deeper question, which is "granted that all the empirical evidence we have so far for claims in software engineering isn't all that good, how can we get good empirical evidence?"

I suspect that the answer is going to involve changing the very questions we ask. "Does TDD work" is too fuzzy and ill-defined, and there's no way you can test that in a blinded random experiment. People's biases about TDD (subjects' or experimenters') are going to contaminate the evidence.

Instead, we need to ask questions that aren't susceptible to this kind of bias and contamination. For instance, we might want to unobtrusively study actual programmers working on actual projects, and record what causes them to write defects.

[+] itmag | 13 years ago
> the orders of magnitude difference in programmers productivity

I am skeptical of this too. It makes more sense to have huge swings in ability, not productivity. I.e., a poor programmer won't take 10x as long to code a given feature; he will just hit a ceiling of ability and not be able to do it at all.

[+] consultutah | 13 years ago
Hopefully this will end in a discussion in which the actual data underlying the table can be had and matched up.

Anecdotally, it seems to make sense that it is more expensive to find and fix a defect later in the process. If a dev finds a defect while implementing a feature, only his costs are involved. However, if a defect is found later in our process, at least 2 QA analysts are involved: one to find the defect and another to confirm it. After that, a project manager schedules out time and assigns the defect to a developer - possibly not the one who introduced it. The developer fixes the defect, the build person makes a build, and the original tester retests the defect and marks the fix as verified.

That seems complex, but there are possibly even more steps than that. Unit tests may need to be written, customer test cases may need to be updated.

I don't know if the costs end up being exponentially greater, but they would seem to be greater.

At any rate, it would be good to have someone independently validate that data.

[+] philwelch | 13 years ago
It's not written in stone that you need 2 QA analysts, a project manager, a developer, and a build master to fix a bug. That only proves that bureaucracy is expensive, not that fixing bugs "late" in the process is expensive!

I'm not sure what exactly the second QA person adds to the process. As a developer, I need to be able to reproduce a bug myself in order to have any chance of fixing it, so there's your confirmation step right there.

Project manager? No, just have your QA person file the ticket as a bug (with the correct priority) and have your developers pull tickets off your bug tracker themselves. At worst, a slight email/verbal nudge from the usual boss should do.

Build person? No, have a continuous, automated build system. The job of the build person should be to maintain that system, not to run individual builds.

Now you're down to one QA person to file the ticket, one developer to fix the bug, and the same QA person again to close the ticket. Add in a few minutes of another developer's time to code review the first developer (you don't even do that despite having all that process?) and it's still cheaper.

Unit tests? Developers write their own unit tests. Have a failing unit test that reproduces the root cause of the bug before you fix the bug--that's a best practice anyway. For testing above and beyond that, it's not unreasonable to have SDETs maintain that stuff, though you already have the root cause of the bug captured in a unit test so your main concern should be whether any existing tests actually rely upon the broken behavior.
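The "failing test first" practice described above can be sketched as follows. The function and bug are hypothetical; the point is that the test that reproduced the defect is kept and becomes the regression guard:

```python
import unittest

def clamp_non_negative(x):
    # Function after the fix; the (hypothetical) buggy version simply
    # returned x unchanged instead of clamping negatives to zero.
    return max(x, 0)

class ClampRegressionTest(unittest.TestCase):
    def test_negative_input_is_clamped(self):
        # Written BEFORE the fix: this test failed against the buggy
        # version, capturing the root cause. After the fix it passes
        # and guards against the bug coming back.
        self.assertEqual(clamp_non_negative(-5), 0)

    def test_non_negative_input_is_unchanged(self):
        # Sanity check that the fix didn't break the normal case.
        self.assertEqual(clamp_non_negative(3), 3)
```

Run with `python -m unittest`; once it passes, the defect's root cause stays pinned down in the suite.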

[+] neves | 13 years ago
And don't forget that you'd have other costs involved:

  o The cost to fix the corrupted data

  o Opportunity costs of a broken software (it is Black Friday and your site is down)

  o Image costs to your company's/product's reputation

  o The customer's cost (especially if she is also from your company)

These costs are very difficult to measure. The chart is popular because it matches our expectation as developers.

[+] sunraa | 13 years ago
Lots of props to the blog author for going through the hoops that I suppose we should all be going through. I've always placed the CC books up there in the pantheon of great software engineering books. This is a chink in the armor and I hope Mr McConnell takes the time to provide a response although I'm not holding my breath. Another object lesson in not accepting things at face value.

[+] jakejake | 13 years ago
Based on my own subjective experience I've found the numbers regarding testing to be suspicious simply because "bugs" are so varied that any specific number would be subjective.

I'd always assumed that these numbers referred to an architectural type of bug where, once made, more code is built upon the bug and so a cascading effect occurs. The more code that relies on the bug, the worse it is to fix, because you have to fix all of its dependent code. In some cases you may need to repair data, or entirely refactor sections of an application.

But there are other bugs that are more "typo"-level bugs, where obviously the time to fix is exactly the same no matter what phase of development.

I had just always assumed those numbers were a worst-case average to encourage developers to better plan their architecture. In part because the context of Code Complete is more about planning and estimation.

Though there may not be support for the exact figures, I think the point of them is to not let things pile up and to try to put thought into your design especially at the architectural levels.

[+] MattRogish | 13 years ago
This seems to have been my experience and interpretation as well. A bug is not a mistake in coding, but a mistake in selecting core architecture, 3rd party libraries, etc.

Making a mistake picking the wrong storage engine can cause major problems if you find it can't scale to your load after 12 months of development. That's a ton of code to change if you're switching from MySQL to Redis (for a crazy example).

Make a mistake in copy or layout, and that's usually much easier to change.

[+] unreal37 | 13 years ago
It seems he wasn't able to see some of the sources. One cost money to see, one is out of print and not available...

It may be true and noble that since he isn't able to verify the data himself, he shouldn't have used it. But if you can't see the sources, you also can't state that the data is incorrect. One of those books he couldn't find might contain the data that directly corroborates this.

I suspect the cost of fixing defects has more to do with your internal process, and less to do with how much work it takes to find and fix the actual defect. I have a client I work for now for which I need about 10 days lead time to get code from development to production. So it might take 1 hour to fix a bug that is discovered, and 8 hours to do the paperwork and go through the formal process of moving code through testing, staging and finally to production.

[+] aptwebapps | 13 years ago
I wonder if he tried asking McConnell just how he compiled the table. It does seem a bit odd to construct such a simple table from eight different sources.

[+] mcguire | 13 years ago
...none of which seem to have the actual data to support the table.

[+] smoyer | 13 years ago
I seem to remember first seeing a table like the one the author describes in "The Mythical Man Month" but my copy is currently at home. The data underlying that book was gathered from projects at IBM in the '60s and '70s and I don't really doubt that the underlying data was fairly represented.

The bigger question is whether improvements in processes and tools have obsoleted this data. TDD and automated regression testing would be one place where inter-phase defects could become less costly. On the other hand, projects that have uncorrected architectural errors can be completely functional and yet their maintenance costs never decrease.

If someone has a copy handy, please validate my memory otherwise I'll check when I get home this evening.

[+] alttag | 13 years ago
I've just gone through it page-by-page, and didn't see a table like that. Using the 20th Anniversary Edition, I also looked through the summarized list of claims in each chapter, and the 20-year retrospective, and did not see the table or a section that might have been a textual version of the same data.

Perhaps the closest bit I found was in the chapter "Plan to Throw One Away": "The total cost of maintaining a widely used program is typically 40 percent or more of the cost of developing it" (p 121). Similarly, another section quotes Capers Jones, "Focus on quality, and productivity will follow" (p 217, emphasis original).

That, I think, is about as close as MMM gets to these claims.

[+] zwdr | 13 years ago
I wish more authors would check their references like that. Not that I am interested in iOS development, but still.

[+] habitue | 13 years ago
It's great to see such intellectual honesty. It looks like the real perpetrator, however, is McConnell.

[+] ludflu | 13 years ago
Much respect for the frankness and forthrightness with which he addressed this. I've always taken the exponential increase in the cost of bug fixes with time for granted. I won't do so in the future. We need more empirical studies of software development!

[+] Tashtego | 13 years ago
Anecdotally, I have certainly found that although the cost in time may not vary as much as this table would indicate, the cost in stress ramps up even faster. Fixing a bug in production is usually a highly stressful endeavor for all involved. I would love to see a similar table phrased in terms of stress comparing different development methodologies currently in vogue (test and throw it over the wall, CI, automated pushes vs. manual pushes, etc.)

[+] emmapersky | 13 years ago
A google scholar search for the title of the paper behind the paywall reveals numerous copies freely available through university websites.

[+] f4stjack | 13 years ago
So yeah, long story short from what I gather is: I couldn't find the referenced sources to the table I am referencing so it must be false!

What a leap of logic, seriously...

[+] pilif | 13 years ago
That's not at all what he was saying. In the conclusion at the end of the article he said: "I therefore no longer consider Table 1-1 in Test-Driven iOS Development to be representative of defect fixing in software engineering [...]".

So while the underlying claim might still be true, the numbers are not necessarily accurate, and thus you shouldn't base conclusions on that table. He doesn't say that the values are wrong - just that they might be.

On a related note: I'd love to get real numbers for this. Fixing bugs in production certainly feels much more expensive and cumbersome, but is there real data aside from this table, which now apparently is inaccurate?


[+] ludflu | 13 years ago
He didn't say it was false. He did say it was unsupported.