When TDD Doesn't Work

[+] Nursie|12 years ago|reply

IMHO TDD, like a lot of the agile stuff, is a good idea with solid foundations that people get wrong all the time and end up making things worse with.

Agile was supposed to ease up on process and make teams adapt to changing requirements. It wasn't supposed to use up >30% of your working time just to service the methodology, but that's what it ends up doing when you get in the Agile Evangelists.

TDD was supposed to ensure more correct software at the cost of some overhead (perhaps 30%?) by making sure every unit had its tests written ahead of the code. In practice I've seen it kill productivity entirely as people write test harnesses, dummy systems and frameworks galore, and never produce anything.

A combination of these two approaches recently cost an entire team (30+) people their jobs as they produced almost nothing for almost a year, despite being busy and ostensibly working hard all year. We kept one guy to deal with some of the stuff they left behind and do some new development. When asked for an estimate to do a trivial change he gave a massive timescale and then explained that 'in the gateway team we like to write extensive tests before the code'.

The only response we had for him was 'and do you see the rest of the gateway team here now?'

[+] forgottenpass|12 years ago|reply

IMHO TDD, like a lot of the agile stuff, is a good idea with solid foundations that people get wrong all the time and end up making things worse with.

I think the reason it can go sideways is because the advocates and casual adherents don't accept that for some teams a methodology really may not provide the claimed benefits, or it costs too much elsewhere. It's easier to say "It's not [methodology] that isn't working, you're doing it wrong."

That's a really attractive answer, made all the more tempting because it's sometimes true. But not every software team is the same, operating in the same constraints. People tend to generalize based on their own experiences. Lessons learned about what works for X doesn't necessarily apply to Y. The thing that has been lacking in the TDD/Agile/Insert Fad is a higher level "this is why the ideas worked for us, here are the component pieces and their purpose, this is how to determine the pieces to adopt and how to tailor them to your organization."

You could say that someone out there is making that case, but their voice is drowning under the snakeoil salesmen. I don't hear it up front, the rare times I do hear it is deep into the "you're doing it wrong" conversation when any sense of perspective in the discussion has already been beaten to death.

[+] parasubvert|12 years ago|reply

The emphasis I take away from your story is that something may be a good idea, even foundational, but there's no guarantee your team will interpret or execute it correctly.

If you do the math on how much time should be taken up by Agile meetings, it's about 10% of the team's time. Add another 10% for vacation, illness, town halls, dentist appointments, etc. that leaves 80% to understanding requirements, writing code and testing code. Yet the market for "agile consultants" is one of lemons - you don't know if this guy/gal is a huckster or truly working for your success.

Similarly for TDD, I've never heard of TDD being about a team writing tests em masse before code - that wasn't something Kent or Bob Martin ever recommended anyway.

Ultimately this is why I believe the most important roles on a software team are the "management roles" - Product Owner first and foremost. any solid Product Owner I've worked with would have mandated a demo after every iteration and ejected the software team management very quickly if there were no results. Better to punt the problem child early instead of taking the whole team down later!

[+] TelmoMenezes|12 years ago|reply

> IMHO TDD, like a lot of the agile stuff, is a good idea with solid foundations that people get wrong all the time and end up making things worse with.

So one could reasonably suspect that people who "get agile" are just talented and would be good developers anyway. Occam's razor invites us to assume that agile has no effect. Are there any scientific studies on the effectiveness of agile (or TDD), or is this just a homeopathy situation?

[+] DavidWoof|12 years ago|reply

"'in the gateway team we like to write extensive tests before the code"

Which, ironically, is the opposite of TDD. One advantage of classic TDD unit testing is that your tests grow with the code. One of the dangers of not doing TDD is the situation above, where your integration tests require massive scaffolding and custom frameworks upfront, essentially turning the process into waterfall.

[+] reedlaw|12 years ago|reply

I've never seen a project fail because of too much testing. Maybe because of an over-emphasis on process, but not because of writing too many useful tests. On the contrary, projects I've worked on fail or approach failure because of lack of clear requirements, whether in unit test form, BDD, or well-written user stories. If it's not clear what a product owner wants then it's impossible to test and impossible to implement to match the owner's expectations. TDD is useless if you don't know what you're trying to build.

[+] jdlshore|12 years ago|reply

> TDD was supposed to ensure more correct software at the cost of some overhead (perhaps 30%?) by making sure every unit had its tests written ahead of the code.

Not exactly. Remember that TDD came from Extreme Programming, and the radical idea of Extreme Programming was "embrace change:" the idea that you could accept—no, desire—requirements changes after you started programming.

At the time, all software design was supposed to be done in advance; to do it any other way would lead to madness. The (fictional, it turns out [1]) "cost of change curve" said that a change in requirements would cost 20-150x as much if made after coding began, and thus all requirements had to be nailed down in advance.

XP said, "what if we could flatten the cost of change curve, so that the cost of a change is just the cost of implementation, regardless of when the change is suggested?" That's the whole raison d'être of XP.

The cost of change curve was flattened by using evolutionary design. The way you got evolutionary design was with four practices: pair programming (to improve quality), simple design (to avoid painting yourself into a corner), refactoring (so you could change the design), and... TDD. So you could refactor safely.

TDD is about enabling change. The quality benefits are also valuable, but not the main point. That's why TDD'ists care so much about fast tests—you need quick feedback when you're doing design refactorings.

[1] Laurent Bossavit investigated the literature for the source of the cost of change curve claim and determined that it was based on people graphing their opinions, not empirical data. Over time, those opinion graphs were assumed to be based on real data, but they weren't. https://leanpub.com/leprechauns

[+] AnimalMuppet|12 years ago|reply

> Agile was supposed to ease up on process and make teams adapt to changing requirements. It wasn't supposed to use up >30% of your working time just to service the methodology, but that's what it ends up doing when you get in the Agile Evangelists.

Well, I've worked in a (non-agile) environment where the methodology ate up way more than 30% of our working time. If an Agile Evangelist could have gotten us to 30%, most of us would have been ecstatic. (It wouldn't happen, though - we were FDA regulated as a medical device manufacturer, which imposed huge overhead requirements.)

[+] brudgers|12 years ago|reply

30% overhead is often a function of team size, not methodology. A team of 30 is not likely to be agile in a meaningful sense, there are too many coordination vectors and communication channels.

And if those in a position to ask about the rest of the team aren't sold on agile to begin with, the odds of it working are inversely proportional to the odds of people just going through the motions while fearing for their jobs and polishing their resume for a year.

[+] amorphid|12 years ago|reply

I once asked an experienced developer what hr thought about Agile and TDD. He responded by saying that they are useful tools, when used by people who know what they're doing.

There's no replacement for working with quality people, and no tool prevents you from being a moron.

[+] feketegy|12 years ago|reply

TDD and agile in general requires discipline.

You can't expect somebody to drive a car if they don't even know what a gas pedal is.

[+] GFK_of_xmaspast|12 years ago|reply

That sounds a lot like a failure on the part of management.

[+] yanowitz|12 years ago|reply

Except for this statement, beware of absolutism in statements of How To Do Software Development.

The specifics of this debate are kind of uninteresting because of the (general) lack of nuance from various sides, albeit all informed by their own lived experience.

OTOH, the recurrent reality of <insert topic> debate in our industry is very interesting.

I think it's some combination of:

* a bunch of problems are still unsolved

* software is so powerful that sub-optimal solutions are usuallly Good Enough

* industry amnesia, driven by developer/engineer turnover

* the relative infancy of the industry, especially as a function of the rate of change (I'm not sure how you would normalize for rate-of-change, social structure and communication speed, but it would be interesting to compare these debates to medieval guilds in Europe).

* ???

To take up the first two above:

Things are better than they used to be -- as late as the 90s, code reuse was still an unsolved problem. Of course, code quality is still hard--we are reusing broken code, but at least we "only" have to fix it once.

I think it's hard to overestimate the importance of Good Enough as a factor in these recurring debates. Everyone can be right from the business's point of view--tons of money is still being saved. Once you get past the initial ramp of a company, how to structure for continuing velocity of a team and make headway in your chosen market(s) seems like a different optimization problem than what got you there (again, not a new topic!)

Just some partially formed thoughts...

[+] protonfish|12 years ago|reply

Was that an implication that code reuse is solved? I'd say there are more options but their value is still a matter of debate. I have a hard time imagining that code written today is better than 30 years ago. (We do have much better source control tools so at least that's something.)

[+] praptak|12 years ago|reply

About code reuse - here is some disagreement about whether it is a good thing in the first place: 'I also must confess to a strong bias against the fashion for reusable code. To me, "re-editable code" is much, much better than an untouchable black box or toolkit. I could go on and on about this. If you’re totally convinced that reusable code is wonderful, I probably won’t be able to sway you anyway, but you’ll never convince me that reusable code isn’t mostly a menace.' This is from Donald Knuth: http://www.informit.com/articles/article.aspx?p=1193856

[+] narag|12 years ago|reply

I believe recurrent debates are people problems more than technical problems. Although news of DHH's denouncing TDD have been very popular, the technical debate over the details is the same boring story as ever. There are people that can't keep themselves clean from this kind of causes.

[+] enginerd|12 years ago|reply

I agree with your (likely more than) partially formed thoughts; there is still a lot we don't know. In my case, I often work with CSS and TDD would not just add complexity, it conceptually does not make sense with my workflow. It is human nature to generalize solutions. But in our industry, we sometimes find ourselves generalizing 'development' when that is an enormity in and of itself.

From the article: "Besides, how do you know if the CSS is correct? Remember we are doing TDD. We are writing our tests first. How do you know, in advance, what the CSS should be?"

[+] jayvanguard|12 years ago|reply

> as late as the 90s, code reuse was still an unsolved problem.

But that is about the social and business structures and economics around how computing matured. Zero to do with technology. The technology aspects of code re-use were worked out long ago.

[+] grandalf|12 years ago|reply

While DHH's rant has spawned an interesting discussion, it feels to me like he's arguing in reverse in defense of his framework.

Many Rails apps are tightly coupled, and many unit tests written by developers using rails test 10% program logic and 90% framework features.

Of course this is going to be slow. We can argue about hacks to make it faster but at a certain point it's a problem whose solutions start to distract us from solving the important problems.

If you have a web app and get to write a single test to determine whether it's safe to deploy, that test would be an integration test.

The decision to write tests more granular than integration tests is a decision to be made based on assumptions about the rate of change of components of the system.

TDD is tangential to the above observation.

There are many cases where an implementation is easy to figure out (though possibly time consuming) while the optimal interface design is less obvious. TDD can be really useful to quickly iterate interfaces and verify that all the moving parts work as expected together, before worrying about the implementation details... This makes it possible to work on a larger system with more focus on problem solving, fewer mistakes, and less overall cognitive load.

[+] jshen|12 years ago|reply

It's not slow to test a rails app. See DHH's latest post.

[+] programminggeek|12 years ago|reply

I have a really crazy idea on how you could test the GUI in a way that would both save time and provide something more valuable than just testing true == true.

My idea is to create tests that take screenshots and do a diff of them over time. You then could set a change % threshold that would signal a test "failure" signal. Your QA team could then run through that process and see that something significant changed. Maybe that is fine, but maybe that is hugely unintended.

Having a Time Machine of screenshots of different processes, you could compare changes easily and see if they are worth further investigation. For example, this would be useful if you change some CSS or JS for just one page, and it ends up breaking another page.

The key point of this system is not that it would tell when your system is broken, but rather that there was a significant change that occurred that might have broken something. It's not a substitute for human analysis or thought.

Is anyone doing something like this and would it be useful to anyone else?

[+] DougWebb|12 years ago|reply

My company's QA team set up a system like that around 1999. There were far too many false-positives, because the GUI intentionally changes all of the time during development. So instead of testing functionality, the team spent all of their time updating screenshots. The worst was when we made a very simple style change that affected every page in the application; they'd have to redo every single screenshot instead of just doing a 5-second test that results in "Yeah, the banner is the right shade of blue now, and I know it's used on every page."

[+] forgottenpass|12 years ago|reply

Dunno why your comment is downvoted, this is a valuable testing strategy. Taking screenshots as an automated tool walks through your GUI is a great way to find regressions without writing tests for every last pixel. And you can run something like this on it's own or bolted onto existing test cases.

[+] morganherlocker|12 years ago|reply

Facebook does this with a project called Huxley[1]. It seems cool, but these sorts of tools have always suffered from the problem of brittle tests, so it is not a silver bullet. It does seem that it would work well, however, in systems that require stringent oversight around UI changes (like facebook). Places where "if a piece of UI changes by a pixel, we want to know about it and OK the change" is the standard (most apps do not fall into this category though).

[1] https://github.com/facebook/huxley

[+] lvturner|12 years ago|reply

I actually did pretty much exactly this for a previous job for testing a rendering engine - we had a 'golden master' set of screenshots, a bunch of code to render those original golden masters, then used perceptual diff (http://pdiff.sourceforge.net) to check against the golden masters.

It was a bit of an experiment and didn't get used that much - though it did come in handy when trying to write UI rendering that worked with XAML

[+] projectileboy|12 years ago|reply

There are tools that do something similar. At Siemens we used a tool called T-Plan (t-plan.com) to test behavior of a railroad control system. It wasn't perfect, but it worked surprisingly well. eggplant (http://www.testplant.com/eggplant/testing-tools/) also looked good, but didn't happen to work for our system at the time (not really a comment on eggPlant).

[+] bcgraham|12 years ago|reply

Houdini [1] aimed to do this, but it's been a while since I checked it out -- not sure if it launched or is still under development.

[1] http://www.tryhoudini.com

[+] joshuacc|12 years ago|reply

You mean something like this? https://dpxdt-test.appspot.com/

[+] lnanek2|12 years ago|reply

This isn't really true. At smartphone OEMs we certainly do have boxes to put the devices in that perform physical tests like on the touch screen, microphones, speakers, antennas, etc.. And in mobile development we have UI automation tests that confirm buttons are certain color, have certain text or state, the right screens popup when pressed, etc. - heck we have a program called the monkey that presses everything that can be in addition to the UI automation scripts. I know the web side of things has Selenium and similar robots. I think he just hasn't ever worked somewhere where everything is tested which is reasonable. In many cases you have nothing to do with the OS your software is running on, for example, so there isn't as much point in testing beyond what your app outputs to it.

[+] reedlaw|12 years ago|reply

I think what Uncle Bob is referring to are things like layout, color, and so on. Of course you can automate browser interactions with Selenium, but you can't easily catch layout changes, broken UI elements, or regressions. The only method I know of that can come close is automated screen capture comparison. But that wouldn't work perfectly and still requires human intervention to check out false positives.

[+] jimejim|12 years ago|reply

I'd argue that the test has value if it's central to what you're doing and will save you time in the long run. Not every test falls into that category, so it's a cost-benefit analysis.

He talks about fiddling with UI elements which is sometimes a one-time thing after you get it setup. Writing tests for that is sometimes a waste of time.

Now, if you have code that's going to do some form of complex screen manipulation and it's a big piece of what you're doing, it makes more sense to automate some tests.

[+] fwanicka|12 years ago|reply

And do you write all those touch screen, antenna, microphone and speaker tests before you start writing any of the code?

[+] zwieback|12 years ago|reply

True, we've spent a lot of time building machines with actuators, sensors, etc. to physically test UIs. Also, screen scraping to verify GUIs.

I get Uncle Bob's point, though, and I welcome what looks like a very reasonable peace offering to the zealots on the other side.

[+] raverbashing|12 years ago|reply

These are integration tests really, not unit tests

Unless you do a small change then make the robot do everything.

(Also, this may also be used for device tests in the production line)

[+] jshen|12 years ago|reply

"Over the years many people have complained about the so-called "religiosity" of some of the proponents of Test Driven Development. "

And Bob is guilty of that religiosity himself. See here https://www.youtube.com/watch?v=WpkDN78P884&feature=youtu.be...

Jump to the 58 minute mark if it doesn't automatically.

[+] planetjones|12 years ago|reply

>> So near the physical boundary of the system there is a layer that requires fiddling. It is useless to try to write tests first (or tests at all) for this layer.

Maybe I can see what he's trying to say, but I don't think statement alone is accurate.

For the GUI i.e. at the human boundary of the system, the most value (especially to stop regression and catch side affects) is often added with tests e.g. automated tests which perform some user function in the GUI and assert the results.

Another physical boundary of the system is a database. Writing tests which cross this boundary add a lot of value too.

I'd favour these tests which hit the boundaries and go over them, over a codebase with only unit tests and endless mocking any day of the week.

Also these type of tests can be written first. We do it.

[+] agentultra|12 years ago|reply

Unit tests don't have to test everything... just the units. Integration tests should be testing the interactions between different modules. And system tests should be the whole stack top-to-bottom. It ends up looking like a pyramid.

[+] nsfyn55|12 years ago|reply

This article is a breath of fresh air. I can't how many times I've encountered the dogmatic TDD adherent. I write more tests than anyone I know and what I have learned is TDD is great except when the cost of TDD outweighs its benefits. I've seen a dev spend 8 hours fiddling around with Mocha/Chai trying to test if a button changes color in response to a successful callback. Sometimes its good enough to click the button and see if it changes color.

[+] nimblegorilla|12 years ago|reply

We've seen most of this argument before, but the most interesting (new) part of the article is the implication that CSS is the final layer between software and the physical world and thus hard to test. I'm sure the people on the Mozilla, Chrome, Opera, and IE projects would disagree that CSS is untestable.

It seems Uncle Bob implies that it's ok to skip TDD if you think it is hard to test something. There are much better reasons for most apps to avoid testing their CSS. Likewise there are many reasons for some projects to have extensive automated testing around CSS even if it might be hard.

[+] asgard1024|12 years ago|reply

"I have often compared TDD to double-entry bookkeeping."

It always seemed to me, if you were to make perfect, automated tests that 100% cover your application, you would basically have reimplemented it. (Or in other words - if you want to check if your calculations are correct, you have to do the calculations again.) Ideal, fully automated tests are basically taking two implementations, run them side-by-side, and compare the results.

That's why I am not a big fan of tests, in the sense that there is too much focus on them in the SW industry, and they seem like a hammer (useful but overused).

I think there should be more focus on writing the _one_ implementation correctly. This can be done with better abstractions (e.g. actor model for concurrency, functional programming, ..) and asserts (programming by contract), and maybe even automated SW proving. I don't think these techniques are as popular as testing, but I wish they were more popular, because they let you write programs only once and correctly.

[Update: To specifically expand on point about asserts, if you can trivially convert test to assert, why not do it? Unfortunately tooling doesn't support asserts as much as tests.]

[+] williamcotton|12 years ago|reply

A good tool for testing visual interfaces is an image diff combined with a manual testing process.

Initially the tester will view each screenshot of an application state that is being tested and set that as "passing". Next an automated test runs and the latest screenshots are compared to the passing screenshots.

If they are different then the test fails. A manual tester then needs to take a look at the tests that failed and decide if the test actually failed or if the changes were supposed to be there.

If the changes were supposed to be there the tester can make this image the new passing screenshot. Passing screenshots should probably be reset BEFORE the tests are run. I see no reason why not to just check these images in to the repo along with all of the other test conditions.

I've been scheming on ways to do video diffs for testing transitions and animations although I'm not sure if this provides much value. It would be mostly an academic pursuit.

[+] mbrock|12 years ago|reply

There seems to be two aspects to this discussion.

One involves questions of process, enforcing TDD, and whether or not TDD can save a bad team from producing bad stuff. The other is the question of what TDD can offer for a skilled team with an intelligent approach to development.

Mixing these aspects leads people to dismiss TDD because they've seen teams fail by doing TDD in a bad way.

Another question: is there a way to structure software so that questions of boundaries and collaborators become less troublesome for testing? I think a promising road is in value-oriented programming without side effects.

Another way to see that: if you need a lot of tedious mocking to test your unit, maybe the unit should be redesigned to have fewer collaborators, or maybe you should move the complex logic to a pure function, and so on. Maybe TDD difficulties are showing us that there is something wrong with how we write code. After all that's what it's supposed to do.

[+] lclarkmichalek|12 years ago|reply

So you're telling me that being pragmatic will result in sensible solutions? I could do with more blog posts like this!

[+] dllthomas|12 years ago|reply

"So near the physical boundary of the system there is a layer that requires fiddling. It is useless to try to write tests first (or tests at all) for this layer. The only way to get it right is to use human interaction; and once it's right there's no point in writing a test."

This seems dead wrong. There is probably no way to write tests first in this environment, but with so many different browsers interpreting your CSS (to run with preceding example) you need to be aware of when changes in your code cause changes in rendering that might need to be revalidated and further fiddled! I do agree that it doesn't fit well with TDD, but it absolutely can work with automated testing.

[+] DanielBMarkham|12 years ago|reply

"...software controls machines that physically interact with the world..."

See, that's not always true. I would love it if all software interacted with the outside world. But a lot of software doesn't interact -- just take a look at some of that code sitting in your repository sometime. Some of that isn't deployed, isn't being used. You could test that until the cows come home and have a whole bucket full of nothing.

Because the Bobster and the other TDD guys are correct: you gotta test to know that the code is doing what it's supposed to. Testing has to come first. In a way, the test is actually more important than the code. If you get the tests right, and the code passes them, the code itself really doesn't matter.

Where we fall down is when we confuse the situation of a commercial software development team working on WhipSnapper 2.0 with a startup team working on SnapWhipper 0.1. The commercial guys? They are working on a piece of code with established value, with a funding agent in place, with a future of many years (hopefully) in production. Everything they create will be touched and used over a long period of time. The startup guys? They've got a 1-in-10 shot that they're alive next year. Any energy they put into solving a problem that hasn't been economically validated is 90% likely to be wasted.

Tests are important, but only when you're testing the right thing. The test for the startup guys is a business test, not a code test. Is this business doing something useful? If so, then just about any kind of way of accomplishing that -- perhaps without any programming at all -- provides business value.

That's a powerful lesson for startup junkies to assimilate. In the startup world, you don't get rewarded based on the correctness or craftsmanship of your code. You're looking at one or two weeds instead of realizing the entire yard needs work.

Put a different way, we have Markham's Law: The cost of Technical Debt can never exceed the economic value of the software to begin with.

</standard TDD comment>

[+] robmcm|12 years ago|reply

I have never found it to work in visual/interactive development. A lot of the time you are working on something you evolve as you develop, try, iterate again.

I can see it's benefits if you have a simpler I/O for your code.

[+] pornel|12 years ago|reply

TDD for CSS is indeed an odd concept, but it's possible to do automated CSS regression testing:

http://tldr.huddle.com/blog/css-testing/

[+] harel|12 years ago|reply

As a non religious person, there's one thing I don't get in the whole to TDD or not to TDD debate that's ongoing now.

Does it matter if "TDD says this or says that"? Aren't these methodologies more of 'suggestions' for us to adopt as it fits our needs, while trimming the stuff that doesn't? Once you adhere to a methodology religiously you lose the flexibility and pragmatism that methodology intended to give you. It just becomes systematic Dogma following of a rule book, like any religion.

[+] dllthomas|12 years ago|reply

"How can I test that the right stuff is drawn on the screen? Either I set up a camera and write code that can interpret what the camera sees, or I look at the screen while running manual tests."

Or screenshots, of course, which is still relying on code obviously but probably not your code. Of course, defining what you're looking for in a screen shot is going to be nontrivial unless you're doing simple check that it hasn't changed from the last manually-approved version or something.

[+] platz|12 years ago|reply

I am reminded of this somewhat recent discussion when TDD doesn't work (which I believe uncle Bob responded to as well)

https://news.ycombinator.com/item?id=7130765

107 comments