My rule of thumb: tests cast away fear. Whenever I get that sinking feeling that I'll break things when I change the code, I write tests until my doubts disappear. It works every time.
In response to the article: it's true that "you should very rarely have to change tests when you refactor code", however, most of the time, most coders are changing the expected behavior of the code due to changed or added requirements, which is not refactoring. Tests should change when the requirements change, of course. I am not contradicting the article, only clarifying that the quoted statement does not apply outside refactoring. (Refactoring is improving the design or performance of code without changing its required behavior.)
In my experience, a lot of unit tests are written with mocks, expectations, and way too much knowledge of the implementation details, which leads to broken tests simply by refactoring, even if the behavior does not change at all.
If you test units in isolation, while adhering to SRP which results in many smaller units depending on each other to do a task, then simply refactoring without changing behavior screws up a considerable portion of your tests.
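To make that concrete, here's a small Python sketch (the `PriceService` and `FlatTax` names are made up for illustration): the first test mocks the collaborator and asserts on *how* it was called, so a pure refactoring that inlines the tax math breaks it; the second test pins only the observable result and survives.

```python
from unittest.mock import Mock

class PriceService:
    """Hypothetical service: total = net price plus tax from a collaborator."""
    def __init__(self, tax_calculator):
        self._tax = tax_calculator

    def total(self, net):
        return net + self._tax.tax_for(net)

def test_mock_heavy():
    # Pins implementation details: which collaborator method is called,
    # how often, and with what argument. A refactor that inlines the tax
    # math breaks this test even though total() still returns the same values.
    tax = Mock()
    tax.tax_for.return_value = 2.0
    assert PriceService(tax).total(10.0) == 12.0
    tax.tax_for.assert_called_once_with(10.0)

def test_behavior_only():
    # Pins only observable behavior via a simple stub; it survives any
    # refactor that keeps total()'s results the same.
    class FlatTax:
        def tax_for(self, net):
            return 2.0
    assert PriceService(FlatTax()).total(10.0) == 12.0

test_mock_heavy()
test_behavior_only()
```

The stub-based test checks the same contract with none of the coupling to internals.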
As for "tests cast away fear", that is definitely true. Whether or not the lack of fear is warranted is something else, and depends heavily on the quality of the unit tests. I've seen plenty of devs confident of their change because it didn't break any unit tests, only to discover that it broke something they forgot to test.
I find it helps when I consider the tests to be part of the code. If I need to change existing functionality, of course I'll need to change the tests so they test the new requirement(s). If I'm adding new functionality I add new tests to test the new requirement, but I shouldn't break any of the old tests when I do that. If I'm changing code with no requirements changes (as in, pure refactoring, not "tidying up as part of new feature development"), all the existing tests need to pass unchanged...
I always see unit tests as a snapshot of dynamic behavior by 'recording' the tested logic. I make sure I'm aware of any logic changes since that change could eventually break the desired result.
Whenever I'm vetting input/output values for a given set of parameters, I move from white box (unit tests) to grey box testing.
When I'm done with grey/white box tests, I do make sure integration works as expected.
Why all the hassle of moving through 'the onion'? I wanna make sure to detect misbehaving/unexpected logic as quickly as possible. Hunting down a malfunction detected while running integration tests takes way more time than catching it at the onion's innermost layer (unit tests).
I prefer to speak about "confidence", but I agree with this point quite a bit. If I am making changes and am not certain whether my changes will cause breakages, I'll manually test things and codify those manual tests as integration/unit tests so that I never need to write them again. Then in the future I can modify code in the same neighborhood with confidence that any breakages I'd cause would be caught by my tests. Add in a willingness to liberally add regression tests for any errors that do make it through, and I think this approach can really decrease the labour required to make changes, though it does front-load more cost.
It seems like a lot of the logic behind "tests shouldn't break due to a refactor" presumes that tests only test the publicly facing endpoints into the code. The tests that do the best job of reassuring me are tests against code in utility functions and the like. It's hardly a refactor if none of that changes.
The level of experience of the people writing (and maintaining) the code is also a factor I think. As other commenters have said it's all about risk reduction. I definitely agree that you can get a lot of value writing integration tests. At the same time I'm slightly concerned that if I'd read this article when I was first starting out as a developer I'd have thought unit tests were a side-note or a chore, rather than the building blocks for an application.
Doesn't a lot of refactoring involve change in responsibility between classes? At least I find that a lot in my code. In those cases, the interface of those classes might change (as the responsibility shifts elsewhere), which will of course cause test changes.
> I’ve heard managers and teams mandating 100% code coverage for applications. That’s a really bad idea
I hope many managers and programmers out there don't take this the wrong way. I've been an engineer on a project that was attempting to get 100% code coverage on a piece of software I was writing. I heard constant remarks during this period similar to "You don't need 100% code coverage, it doesn't do anything!" The engineers I was working with had only read articles like this and didn't stop to think about what the article was trying to say. From my experience there is no safe rule of thumb for how many tests should be implemented for a project. There should be just enough to feel safe (as hathawsh has said). If you're recommending to engineers on your team to stop implementing tests when they say "I'm at 60% coverage and it'll take ~2 days to get to 100%", I'd really hope you take the time to understand why they want 100% coverage.
The software I was working on when my coworkers started telling me to stop writing tests was code designed to trigger alarms when patients' vital signs met certain criteria set by doctors. I am very thankful that I did hit 100% coverage, because between 60% and 100% there were many small edge cases that, had they caused a death, I wouldn't have been able to sleep well. Had they said I was explicitly disallowed from working on the tests, I would have come in on a weekend (or taken PTO) and implemented them then. It's our ethical responsibility to know when and where paranoia is worth the marginal time penalty.
> ...trigger alarms when patients' vital signs met certain criteria set by doctors
Most of us aren't working on life or death code like this. My React app doesn't need 100% code coverage but you put it well when you said "There should be just enough to feel safe"
> between 60% and 100% there were many small edge cases that, had they caused a death, I wouldn't have been able to sleep well.
But you're not saying the right words here. 100% doesn't mean you've covered every edge case (although I suppose 60% necessarily means you _haven't_). I can hit 100% without actually asserting anything.
I think it's harmful to talk about 100% coverage without also considering mutation testing (automated or manual), boundary analysis, and the like.
It's our ethical responsibility to know when and where paranoia is worth the marginal time penalty.
Exactly. You are in the tiny minority writing literally life-and-death code, and for which a lot of other common advice given regarding general software development likely does not apply either (like "move fast and break things".) Also, for your type of application I would probably want 100% state space coverage too.
Not to take away from your point, just to remind people:
100% coverage still does not mean 100% of all cases tested - it just means that every line has been run with SOME data and every conditional statement was run once in both directions.
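A tiny Python illustration of that point (the `clamp` function is invented for the example): the test below earns 100% line and branch coverage, because every branch runs in both directions, while asserting nothing at all.

```python
def clamp(x, lo, hi):
    """Constrain x to the range [lo, hi]."""
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x

def test_clamp_covers_everything():
    # All three paths through clamp() execute, so a coverage tool
    # reports 100% line and branch coverage...
    clamp(-5, 0, 10)
    clamp(15, 0, 10)
    clamp(5, 0, 10)
    # ...but there are no assertions, so a broken clamp() would still
    # "pass", and boundary inputs like clamp(0, 0, 10) are never tried.

test_clamp_covers_everything()
```

The coverage number is identical whether or not the test can ever fail.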
Seems like a very specific concern in a very specific line of business. The vast majority of engineers write CRUD apps for business clients that mostly care about whether it completely explodes or not.
It also depends on the service. Some services are critical infrastructure and should have their edge cases tested. Some services provide non essential functions and can get away with less.
I agree that there is no magic number, but 100% coverage is clearly overkill for the average engineering team. In fact, broadly demanding any code coverage percentage is probably over simplifying the issue. It just shouldn't be 0 :)
Really reliable software does full MC/DC testing, but even that doesn't catch all mistakes. SQLite for example does this but there are still bugs discovered occasionally. It also leads to insane amounts of test code compared to the actual application code. That is much too expensive to maintain for most projects.
I'd never thought about this before, but you can definitely flip this around to a pathological case of doing the opposite of striving for 100% coverage.
The problem with 100% coverage that I think of is that the last 30% is often boilerplate code (generated getters/setters in Java and that kind of thing).
But what if an engineer started with the boilerplate and only then progressed to tests that are actually important? You might get to 30% without testing anything useful at all. Then you might test half of the important stuff and hit an arbitrary metric of 70%.
If someone were ruled by the sonarqube score, they might even do this.
Yeah, thinking about the why matters more than the %.
So many problems with "100% coverage". Even if you measure it as branch coverage, it only indicates how much of the code that is already written has been covered. It doesn't tell you how much code is missing, e.g. guards against malicious input, null checks, or handling of other out-of-range arguments. This reason alone should be obvious enough for anyone to understand that the metric is completely bogus.
As others mentioned it also doesn't say anything about state space coverage or input argument space coverage or all combinatorial paths through your code. Only that some line/branch has been hit at least once.
100% code coverage is a breadth-first metric, and chasing it generally sacrifices depth-first testing, especially of the "core loop": the 20% of the code where 80% of the time is spent.
The decidability/halting problem hints that perfect testing is an impossibility for systems of any complexity. It's not just a matter of too-large big-O; it's flat out not possible on a Turing machine of any power.
That doesn't mean give up all testing, but there is a LOT of dogma in testing philosophy.
Excellent comment, thanks for sharing. One ought to really think about why tests are needed and what value they add, and not just simply follow some random (thoughtful) article on the internet.
I'm a huge fan of tests that ensure the app doesn't break in a significant way in production. How many tests that is depends on the situation. For your case I would agree that 100% test coverage is correct.
It might be obvious to some, but for reference the title (and associated tweet) is a reference to Michael Pollan's oft-quoted advice for eating healthy: "Eat food. Not too much. Mostly plants."
I think one mental model for tests is that they're simply another kind of automation.
The fundamental question of automation is: will I repeat this task enough such that the amortized saving outweighs the cost of writing a script to do it?
Whenever you're thinking of adding a test X, quickly consider how often you and your team are likely to need to manually test X if you don't. Also factor in the cost of writing test X (though that's tricky because sometimes you need to build test architecture Y, which lowers the costs of writing many tests...).
If it's a piece of code that's buggy, changing frequently, brittle, or important, then you're likely to need to validate its behavior again and again. It's probably worth writing a unit test for it.
If it's an end user experience that involves a lot of services maintained by different teams and tends to fall apart often, it's probably worth writing an integration test instead of having to keep manually testing every time the app goes down.
If it's an API with lots of important users using it for all sorts of weird edge cases and you don't want to have to manually repro each of the existing weird edge cases any time you add a new one, it's probably worth writing some black box tests for it.
But if it's code that's simple, boring, stable, short-lived, or unimportant, your time may be better spent elsewhere.
Another model is that your tests represent the codified understanding of what the system is. This can be very helpful if you have a lot of team churn. How do you make sure a new person doesn't break something when "something" isn't well-defined somewhere? Tests are a great way to pin that down. If this is a priority, then it becomes more important to write tests for things where the test actually very rarely fails. It's more executable system specification than it is automation.
I so agree with this. I've been fighting this exact battle at work lately. People on my team have decided to take testing seriously, which is fantastic, but many team members' understanding of what that means is still at the "watch unit-test coverage numbers go up" stage. So let me be very clear where I stand.
* 100% unit-test coverage is a garbage goal. *
I don't hate unit tests. They can have enormous value shaking out edge cases in a self-contained piece of code - usually a "leaf" in the module dependency graph - and making it future-proof. Love it. However, unit tests don't tell you anything about whether the module's behavior in combination with others leads to a correct result. It's possible to chain together a bunch of unit tests, each with 100% coverage, and still have the combination fail spectacularly. In operations, a lot of bugs live in the interstices between modules.
Even worse, 100% line coverage means a lot less than people seem to think. Let's say one line to compute an expression will compute the wrong value for some inputs. You might have a zillion unit tests that cause that line to be counted as covered, but still be missing the test that reveals the error case. I see this particularly with ASSERT or CHECK types of macros, which get counted toward 100% coverage even though no test ever exercises the failure case.
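A minimal Python sketch of this "covered but untested" trap (the `average` function is invented for illustration): every test executes the one interesting line, coverage reports 100%, and the one input that actually breaks the line is never tried.

```python
def average(values):
    # This line is counted as "covered" by any call at all, yet it
    # raises ZeroDivisionError for an empty list.
    return sum(values) / len(values)

def test_average_happy_paths():
    assert average([2, 4]) == 3
    assert average([10]) == 10
    # Missing: average([]) -- the error case. Coverage is already 100%,
    # so the metric gives no hint that this test is absent.

test_average_happy_paths()
```

The same effect shows up with ASSERT/CHECK macros: the macro line is "covered" even though no test ever drives it down the failure path.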
Striving too hard for 100% unit test coverage often means a lot of work - many of our unit tests have 3x more tricky mock code than actual test code - to get a thoroughly artificial and thus useless result. The cost:benefit ratio is abominable. By all means write thorough unit tests for those modules that have a good ratio, but in general I agree with the admonition to focus more effort on functional/integration tests. In my 30+ years of professional programming, that has always been a way to find more bugs with less effort.
"You might have a zillion unit tests that cause that line to be counted as covered, but still be missing the test that reveals the error case."
Ironically, it's actually easiest to get 100% coverage in worse code, because the more entangled and coupled your code is, the more likely your tests are to incidentally execute branches that aren't really under test.
> Even worse, 100% line coverage means a lot less than people seem to think. Let's say one line to compute an expression will compute the wrong value for some inputs. You might have a zillion unit tests that cause that line to be counted as covered, but still be missing the test that reveals the error case. I see this particularly with ASSERT or CHECK types of macros, which get counted toward 100% coverage even though no test ever exercises the failure case.
I'd rather have 30% test coverage where the lines under test are the complicated ones (not things like factories) with the testing of those lines hitting all the complex edge cases than 100% test coverage that confirms that all your source code is encoded in UTF-8.
Coverage-driven testing is (almost) always (mostly) evil. Coverage as a metric is great. Maybe you're not going for 100%, but it does tell you something about your test set(s), providing a decent measure of "doneness". However, all of this depends on good test cases: test cases that test something. That might sound obvious, but it's fairly easy to achieve 100% coverage while testing nothing. By the way, the reason I said "almost" and "mostly" before is that you can find bugs while attempting to improve coverage, provided you take a step back, forget about achieving coverage, and instead write good tests that happen to get you the coverage. There's a lot of temptation there, and it's not a good overall strategy, but you'll find stuff.
If you think 100% code coverage with unit tests is bad you should see what happens when it's done with integration tests. I'm refactoring some tests now that were written with code coverage in mind, apparently with bonuses tied to the coverage stat. I think I've seen every testing anti-pattern possible in just this one group of ~25 tests.
There's the developers not understanding the difference between unit and integration tests. Both are fine, but integration tests aren't a good tool to hit corner cases.
Many of the tests don't actually test what they pretend to. A few weeks ago I broke production code that had a test specifically for the case I broke, but the test didn't catch it because its input was wrong. The coverage was there all the same.
Most of the tests give no indication of what they're actually testing, or give a misleading one. You have to divine it yourself from the input data, but most of that is copy/pasted, so much of it isn't actually relevant to the tests (I suspect it was included to pad the overall test coverage metric).
The results of the code defined the tests. They literally ran the code, copied the file output to the "expected" directory, and used that in future comparisons. If the files don't match it opens a diff viewer, but a lot of things like ordering aren't deterministic, so the diff gives you no indication of where things went wrong.
Many tests succeed, but for the wrong reason: they exercise failure cases but don't actually check that the code failed for the right reason.
Some tests "helpers" are actually replicating production code, and the tests are mostly verifying that the helpers work.
Finally, due to some recent changes the tests don't even test production code paths. We can't delete them because that would reduce code coverage, but porting them to actually test the new code will take time they aren't sure they want to invest.
Before you battle too hard, let me introduce you to another way of thinking. It may not be to your liking, but I hope you'll find it interesting nonetheless. I'll try to keep it as short as I can. Usually when I type this same post, it takes me a while, but I'm getting better at it.
Imagine that the word "test" is a misnomer, when talking about unit tests. Often people think about testing as a way of checking whether or not the code works properly. This is great for what is known as "acceptance testing". However, as you no doubt agree, it's not so great with "unit testing".
For some reason, people hang on hard to the words "unit" and "test" and come to the conclusion that you should take a piece of code (usually a class), isolate it and show that the class does what it is supposed to. This is a completely reasonable supposition, however in practice it doesn't work that well (I will skip the discussion, because I think you're already in agreement with me on that front).
Instead, imagine that "unit" refers to any piece of code (at any level) that has an interface. Next imagine that "test" means that we will simply document what it does. We don't necessarily worry ourselves about whether it is correct or not (though we wish it to be correct). We just write code that asserts, "When I do X, the result is Y".
At the macro level, we still need to see if the code works. We do this either with automated acceptance tests, or manual testing. Both are fine. When the code works to our level of satisfaction, you can imagine that the "unit tests" (that are only documenting what the code at various levels is doing) are also correct. It is possible that there is some incorrect code that isn't used (which we should delete), or that there are some software errors that cancel each other out (which will be rare). However, once the code is working on a macro scale, in general, it is also working on a micro scale.
Let's say we change the code now. The acceptance tests may fail, but some of the "unit tests" will almost certainly fail (assuming we have full "unit test" coverage). If they don't there is a problem because "unit tests" are describing what the code is doing (the behaviour) and if we change the behaviour, the tests should fail.
For some types of unit testing styles (heavily mocked), often the unit tests will not fail when we change the behaviour. This means the tests, as a long-lasting artefact, are not particularly useful. They might have been useful for helping you write the code initially, but if a test doesn't fail when you change the behaviour, it has lost its utility. Let's make a rule: if the test doesn't fail when the behaviour changes, it's a "bad" test. We need to remove it or replace it with a test that does fail.
The other problem you often run into is that when you change one line of code, 200 tests fail. This means that you spend more time fixing the tests than you gained from being informed that the test failed. Most of the time you know you are changing the behaviour, and so you want to have very little overhead in updating the tests. Let's make another rule: Unit tests must be specific. When you change specific behaviour only a few (on the order of 1) tests should fail.
This last one is really tricky because it means that you have to think hard about the way you write your code. Let's say you have a large function with many branch points in it. If you give it some input, then there are many possible outputs. You write a lot of unit tests. If you then change how one of the branch points are handled, a whole class of tests will fail. This is bad for our rule.
The result of this is that you need to refactor that code so that your functions have a minimum number of branch points (ideally 0 or 1). Additionally, if you split apart that function so that it is now several functions, you have to make each of the functions available to your test suite. This exposes rather than hides these interfaces.
The end result is that you decouple the operation of your code. When you hear about TDD being "Test Driven Design", this is what it means. This is especially true for things like global variables (or near global instance variables in large classes). You can't get away with it because if your functions depend on a lot of global state, you end up having tests that depend on that (near) global state. When you change the operation surrounding that state, a whole whack of tests fail.
Committing to writing high coverage unit tests which also have high specificity forces you to write decoupled code that doesn't rely on hidden state. And because it doesn't depend on hidden state, you have to be able to explicitly set up the state in your tests, which force you to write code where the dependencies on the objects are clear and uncomplicated.
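As a rough Python sketch of that idea (the names and the 25% discount are invented): the first function leans on hidden module state, so its tests need shared setup and can break in batches when that state's handling changes; the second takes the state explicitly, keeping each test specific to one behaviour.

```python
_discount = 0.25  # hidden module-level state

def price_with_hidden_state(net):
    # Every test of this function must arrange (and remember to reset)
    # _discount; a change to how that state is managed can fail a whole
    # batch of tests at once.
    return net * (1 - _discount)

def price(net, discount):
    # The state is an explicit argument: each test sets up exactly the
    # inputs it cares about, and only tests of *this* behaviour fail
    # when this behaviour changes.
    return net * (1 - discount)

def test_price_is_specific():
    assert price(100.0, 0.25) == 75.0
    assert price(100.0, 0.0) == 100.0

test_price_is_specific()
```

Making the dependency explicit is exactly the "test driven design" pressure described above: the test forces the clean signature.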
You mentioned code coverage. I'm going to say that I almost never check code coverage when I'm doing TDD. That's because if you are writing tests that cover all the behaviour, you will have 100% code coverage and 100% branch coverage. However, as you correctly point out, the opposite is not the case. The test for the coverage of your code is not a code coverage tool, it's changing the behaviour of the code and noting that the tests fail.
Most people are familiar with the idea of "Test First" and often equate that with "Test Driven". "Test First" is a great way to learn "Test Driven", but it is not the only way to go. When you have full test coverage, you can easily modify the code and observe how the tests fail. The tests and the production code are two sides of the same coin. When you change one, you must change the other. It's like double entry accounting. By modifying the production code and seeing how the tests fail, you gain information on what this code is related to. You no longer need to keep it in your head!
When I have a well-tested piece of code and somebody asks me, "How hard is it to do X", I just sketch up some code that roughly does X and take a look to see where the tests fail. This tells me roughly what I'll need to do to accomplish X.
I see I've failed (once again) to keep this post small. Let me leave you with just one more idea. You will recall that earlier I mentioned that in order to have "full coverage" of unit tests with specificity, you need to factor your code into very small pieces and also expose all of the interfaces. You then have a series of "tests" that show you the inputs for those functions with the corresponding outputs. The inputs represent the initial state of the program and the outputs represent the resultant state. It's a bit like being in the middle of a debugging session and saving that state. When you run the tests, it's like bringing that debugging session back to life. The expectations are simply watch points in the debugger.
When I'm debugging a program with a good suite of unit tests, I never use a debugger. It is dramatically faster to set up the scenario in the tests and see what happens. Often I don't have to do that. I often already have tests that show me the scenario I'm interested in. For example, "Is it possible for this function to return null -- no. OK, my problem isn't here".
Richard Stallman once said that the secret to fixing bugs quickly is to only debug the code that is broken. "Unit tests" allow you to reason about your code. If you have so called unit tests that are unreadable, then you are giving up at least 50% of the value of the test. When I have problems, I spend more time looking at the tests than the production code -- because it helps me reason about the production code more easily.
I will leave you with one (probably not so small) caveat. Good "unit testing" and good TDD are not for everyone. I talked about ruthlessly decoupling code, simplifying functions to contain single branch points, exposing state, exposing interfaces (privacy is a code smell). There are people for whom this is terrible. They like highly coupled code (because it often comes with high cohesion). They like code that depends on global state (because explicitly handling state means having to think hard about how you pass data around). They like large functions with lots of branch points (because it's easier to understand the code as a whole when you have the context together, i.e. cohesion). Good unit tests and TDD work against that. If you want to write code like the above, I don't think unit tests will work for you.
I personally like this style of programming and I think it is dramatically more productive than many other styles of programming. Not everybody is going to agree. I hope it gives you some insight as to why some people find unit testing and TDD to be very productive, though.
The current version of Unit Testing came out of the Chrysler C3 project, which was written using VisualWorks Smalltalk. (Eventually sparking Extreme Programming and SUnit, which was the ancestor of JUnit in Java land.) Here's the thing about Unit Testing in an environment like that. The best way to code and refactor code would also automatically refactor all of the Unit Tests. In an environment like that, Unit Tests are pretty nimble. There are no long waits for compile times. The entire test suite can run at the press of a button from a widget integrated in your standard development environment. Though, from what I read, the entire Unit Test suite would take an entire 10 minutes to run. However, you could easily just run the tests for the classes you were working on at the time, and reserve the whole suite for checkin time.
So what happens when you move the practice away from this particular kind of Smalltalk environment? Refactorings in most languages are slower without the Refactoring Browser, and often your Unit Tests effectively double the amount of work involved. The velocity of change slows down. Unit Tests might be less nimble to run. A long compile time might be involved. Given those changes, it makes perfect sense that a larger granularity of tests and fewer tests might be more convenient.
I believe that contrary to the conventional wisdom one should write tests from top down. First integration tests, then move down to testing individual functions if necessary. Not the other way around.
On my current project, for the first time in my long career, I have 100% code coverage. How did I achieve it? By ignoring best practices on what constitutes a unit test. My "unit" tests talk to databases, read and write files, etc. I'll take 100% coverage over unit-test purity any day of the week.
The number of times I've said "oh I bet this works" just to be wrong when I wrote the tests is countless. For me it's more about having small well-defined interfaces with very strong tests
Integration tests really are the best bang for the buck. So many times people think they tested their code by mocking something like the database layer and are surprised when the application in production breaks. Everything can be tested in isolation and work perfectly, but often it's how the components work together that determines if the system works at all.
Integration test what you can. Unit test core pieces of logic. Avoid mocks.
In general, automated tests are mostly about saving time: the earlier you find a bug, the cheaper it is to fix. When your testers find it, they need to create and file the ticket and assign it to a developer; the developer needs to reproduce it, fix it, and assign it back; and the tester needs to retest it. It's much more expensive still if it happens in production! Integration/unit tests can catch a lot of those. There are diminishing returns, so 100% coverage is not needed, and integration tests are more effective at catching bugs, so I agree with the article's idea. Use unit tests for real units - algorithms, calculations, etc. - and don't just test your mocking framework. With an additional layer of end-to-end tests and manual testing, the system should be able to achieve pretty good quality without an unreasonable amount of time spent on it.
This is pretty similar to my approach. In the context of developing an API I use integration tests more than unit tests. It is straightforward to spin up an application server against a test database with some test data. Run some example requests against it and verify the results.
I frequently have Test classes which contain a full scenario related to a specific resource or action. For example:
* create resource
* lookup resource
* modify resource
* list all resources
* delete resource
The JSON files for the integration tests are then used as examples in the API documentation.
Unit tests are reserved mostly for verifying business logic components. There is no need to set up a complex ControllerTest with mocked-out Services or Repositories, as you don't care about the internals anyway: just the input vs output, and the integration tests cover those already.
I agree with the part that you should write tests, but I definitely disagree with the part that most of your tests should be integration tests.
As you pointed out, the testing pyramid suggests that you should write more unit tests. Why? Because if you have ever tried TDD, you know that unit tests make you write good (or at least acceptable) code. The reason for this is that testing bad code is hard. By writing mostly integration tests you lose one of the advantages of unit testing and you sidestep the bad-code check.
The other reason is that unit tests are easy to write. If you have interfaces for your units of code then mocking is also easy. I recommend stubbing though, I think that if you have to use mocks it is a code smell.
Also, the .gif with the man in pieces is a straw man. That you have to write at least one integration test to check whether the man has fallen apart is not a valid reason to write mostly integration tests! You can't test your codebase reliably with them, and they are also very costly to write, run and maintain!
The testing pyramid exists for a reason. It is a product of countless hours of research, testing and head scratching. You should introspect your own methods instead and you might arrive at the conclusion that the codebase you are working on is bad and it is hard to unit test, that’s why you have chosen to write mostly integration tests.
Maybe, but do you think aiming for 100% coverage is such a bad experience to have at all? It can be a rich exercise, and a rite of passage: you learn which tests matter, which should be automated (fuzzers, DbUnit, etc.), which can wait, and how to get to 90% with the effort of doing only 30.
Integration tests should write themselves. It depends on the project, but personally I tend to get 65% of the coverage for 5% of the effort. For example, if you want to test a bunch of commands, you can make a function like autotest('some command', 'some_fixture.txt'): it calls "some command", captures the output, writes some_fixture.txt with the captured output, and fails, complaining that it had to create some_fixture.txt. On the next run it finds some_fixture.txt, compares the output, and fails only if it differs.
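In Python, the recipe might look something like this (a sketch assuming a POSIX shell; the function name and fixture convention follow the description above, the implementation details are my guess):

```python
import os
import subprocess

def autotest(command, fixture_path):
    """Golden-file test: the first run records the output, later runs compare."""
    output = subprocess.run(
        command, shell=True, capture_output=True, text=True).stdout
    if not os.path.exists(fixture_path):
        # First run: record the output as the golden fixture, then fail,
        # so a human reviews the captured output before trusting it.
        with open(fixture_path, "w") as f:
            f.write(output)
        raise AssertionError(f"created {fixture_path}; review it and rerun")
    with open(fixture_path) as f:
        expected = f.read()
    # Later runs: fail only if the output drifts from the recorded fixture.
    assert output == expected, f"output differs from {fixture_path}"
```

So autotest('echo hello', 'hello_fixture.txt') fails once while creating the fixture, then passes on every run whose output still matches it.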
Unit tests should of course be hand written, but only to cover what matters: known bugs, or whatever you want to refactor with TDD. Of course, any line that's not covered can potentially break when upgrading versions, but that kind of break is likely to be revealed by the ~90% coverage I think you can get with minimal effort by applying this recipe. Then you can afford zero-day updates when an underlying library publishes a new release candidate.
100% coverage doesn't prove the code is fully tested, just like having unit/integration/acceptance/smoke and other types of tests doesn't prove the application works. But that doesn't mean we stop adding tests to our suite, so why stop adding coverage?
I totally agree with "just stop mocking so much stuff". Most of the time we can use the real implementation, and by doing so we'll also increase coverage.
100% unit test code coverage is just Uncle Bob BS, and it is infeasible for the most part.
And while it will give some warranties about correctness, it will not fundamentally guarantee that the application does what it is supposed to do.
Integration tests are certainly better.
I can only see the headline being "great wisdom" to people who have accepted Uncle Bob and TDD crap for too long without questioning it. Because it is obvious.
Why is it obvious? Because that's how things were done most of the time.
When you had 640k of RAM and a C compiler, writing unit tests was not impossible, but it was pretty hard. Testing that your app reacted to inputs and acted correctly was doable and easily automatable. And what wasn't automatable would be tested manually.
Now here come the "holy highnesses" of testing gurus, gaslighting developers and saying that code with no tests (which tests? automated? unit? the definition is purposefully vague) doesn't work, or that the only blessed code is the kind produced through TDD onanism? No thank you.
I agree with the writer that you don't need 100% test coverage, but you still need to check your coverage: run the coverage tool and go through your code to see if there are any gaps in your testing.
That said, I don't agree with "mostly integration", because it doesn't take the ROI of a test into account, only the outcome. E2E tests are the best at catching bugs, but they are harder to write, harder to debug, and harder to maintain. The same goes for integration tests compared to unit tests: harder to debug, maintain, and run. Developers forget that their time is money, and if they spend time on a test just because it makes them feel safe, the project will cost more.
The general rule I use is simple: when you test, test behavior, not code, and do your testing at the lowest level possible.
The thing that changed my approach to testing, at least in OO languages, was being shown that use cases/tasks/"things the system can do" should be first-class objects in the model. At that point you have a single layer that exposes everything of real value in the application, giving you a really simple test surface. Underneath that surface you can refactor to your heart's delight, without worrying about maintaining lots of pointless unit tests for trivial behaviours on each object; all that matters is maintaining the end-to-end functionality. So yes, I agree integration tests are the key, and you can architect your application to make this easier. Not over-testing your underlying model is just good old-fashioned information hiding. Testing the UI on top I leave as a matter of taste.
If a group of reputable programmers got together and published a book about programming which had a single page which only contained this line in a large font, it would be the most useful and valuable programming book ever written.
Even at 100% code coverage it is possible to have missed many code paths that involve non-local control flow. The most common issue here is exceptions, in languages that support them. In most operating systems there are also various interrupts a program can encounter (e.g., a lovely SEGFAULT in many *nix-type operating systems) and other OS- and hardware-level issues (like the OOM killer in Linux, or an NMI or ECC memory error in embedded systems).
So 100% code coverage may not even be enough for some applications, as was discussed in another top-level thread (medical devices), or for other hard-to-fix (spacecraft) or life-threatening situations.
[+] [-] hathawsh|7 years ago|reply
[+] [-] efdee|7 years ago|reply
[+] [-] bigiain|7 years ago|reply
[+] [-] donjoe|7 years ago|reply
Whenever I'm vetting input/output values for a given set of parameters, I move from white-box (unit) tests to grey-box testing.
When I'm done with grey/white-box tests, I make sure integration works as expected.
Why all the hassle of moving through 'the onion'? I want to detect misbehaving/unexpected logic as quickly as possible. Tracking down a malfunction detected by integration tests takes far more time than catching it at the onion's innermost layer (unit tests).
[+] [-] munk-a|7 years ago|reply
[+] [-] treve|7 years ago|reply
[+] [-] delecti|7 years ago|reply
[+] [-] duncanfwalker|7 years ago|reply
[+] [-] delusional|7 years ago|reply
[+] [-] gnclmorais|7 years ago|reply
[+] [-] gravypod|7 years ago|reply
I hope many managers and programmers out there don't take this the wrong way. I've been an engineer on a project that was attempting to get 100% code coverage on a piece of software I was writing. I heard constant remarks during this period like "You don't need 100% code coverage, it doesn't do anything!" The engineers I was working with had only read articles like this and didn't stop to think about what the article was trying to say. In my experience there is no safe rule of thumb for how many tests a project should have. There should be just enough to feel safe (as hathawsh has said). If you're recommending that engineers on your team stop implementing tests when they say "I'm at 60% coverage and it'll take ~2 days to get to 100%", I'd really hope you take the time to understand why they want 100% coverage.
The software I was working on when my coworkers started telling me to stop writing tests was designed to trigger alarms when patients' vital signs met certain criteria set by doctors. I am very thankful that I did hit 100% coverage, because between 60% and 100% there were many small edge cases that, had they caused a death, would have kept me from sleeping well. Had I been explicitly disallowed from working on the tests, I would have come in on a weekend (or taken PTO) and implemented them then. It's our ethical responsibility to know when and where paranoia is worth the marginal time penalty.
[+] [-] bulkan|7 years ago|reply
Most of us aren't working on life or death code like this. My React app doesn't need 100% code coverage but you put it well when you said "There should be just enough to feel safe"
[+] [-] Cpoll|7 years ago|reply
But you're not quite saying the right thing here. 100% doesn't mean you've covered every edge case (although I suppose 60% necessarily means you _haven't_). I can hit 100% without actually asserting anything.
I think it's harmful to talk about 100% without also considering mutation testing, boundary analysis, manual mutation testing, and so on.
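For instance, hitting 100% on a function without asserting anything - apply_discount is a made-up example with a deliberate bug:

```python
def apply_discount(price, rate):
    # Deliberate bug: returns the discount amount, not the discounted price.
    return price * rate

def test_apply_discount():
    # Executes every line of apply_discount, so a coverage tool reports
    # 100% for it - yet nothing is asserted, and the bug goes unnoticed.
    apply_discount(100.0, 0.2)

test_apply_discount()
```

A boundary check (a 0% discount should keep the price unchanged) would expose the bug immediately, and mutation testing automates exactly that kind of probing: it would flip `price * rate` into variants and notice that no test fails.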
[+] [-] userbinator|7 years ago|reply
Exactly. You are in the tiny minority writing literally life-and-death code, and for which a lot of other common advice given regarding general software development likely does not apply either (like "move fast and break things".) Also, for your type of application I would probably want 100% state space coverage too.
[+] [-] chriswarbo|7 years ago|reply
Keep adding tests as long as they tell us something new, give us more confidence, document some regression, etc.
Keep adding tests if they're exposing edge cases (e.g. MAX_INT), even if those code paths are already covered.
Keep adding tests if they're asserting useful things about the results, even if those code paths are already covered.
Stop adding tests when you're only trying to make the coverage number increase.
[+] [-] danpalmer|7 years ago|reply
- 0-80% was adding useful tests.
- 80-98% was mostly useless.
- 98-100% found some really interesting bugs and forced refactoring that made testing easier, and was definitely worth it.
[+] [-] konschubert|7 years ago|reply
100% coverage still does not mean 100% of all cases are tested. It just means that every line has been run with SOME data and every conditional statement was exercised in both directions.
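For example (is_leap_year is made up, with a deliberate bug):

```python
def is_leap_year(year):
    # Deliberately buggy: ignores the 100/400 century exceptions.
    return year % 4 == 0

# Both branch directions run, so line and branch coverage are 100%:
assert is_leap_year(2024) is True
assert is_leap_year(2023) is False
# ...yet an untested case is still wrong: 1900 was not a leap year,
# but is_leap_year(1900) returns True.
```

Every line ran with SOME data, but the input that matters was never tried.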
[+] [-] jorblumesea|7 years ago|reply
It also depends on the service. Some services are critical infrastructure and should have their edge cases tested. Some services provide non essential functions and can get away with less.
I agree that there is no magic number, but 100% coverage is clearly overkill for the average engineering team. In fact, broadly demanding any particular code coverage percentage is probably oversimplifying the issue. It just shouldn't be 0 :)
[+] [-] adrianN|7 years ago|reply
[+] [-] _ea1k|7 years ago|reply
The problem I see with 100% coverage is that the last 30% is often boilerplate code (generated getters/setters in Java and that kind of thing).
But what if an engineer started with the boilerplate and only then progressed to the tests that actually matter? You might get to 30% without testing anything useful at all. Then you might test half of the important stuff and hit an arbitrary metric of 70%.
If someone were ruled by the SonarQube score, they might even do this.
Yeah, thinking about the why matters more than the %.
[+] [-] Too|7 years ago|reply
As others mentioned, it also says nothing about state-space coverage, input-argument-space coverage, or all the combinatorial paths through your code - only that some line/branch has been hit at least once.
[+] [-] AtlasBarfed|7 years ago|reply
The decidability/halting problem hints that perfect testing is an impossibility for systems of any complexity - not merely too big in big-O terms, but flat out not possible on a Turing machine of any power.
That doesn't mean give up all testing, but there is a LOT of dogma in testing philosophy.
[+] [-] natashas|7 years ago|reply
[+] [-] mmcnl|7 years ago|reply
[+] [-] rorykoehler|7 years ago|reply
[+] [-] calebegg|7 years ago|reply
https://www.nytimes.com/2007/01/28/magazine/28nutritionism.t...
[+] [-] munificent|7 years ago|reply
The fundamental question of automation is: will I repeat this task enough such that the amortized saving outweighs the cost of writing a script to do it?
Whenever you're thinking of adding a test X, quickly consider how often you and your team are likely to need to manually test X if you don't. Also factor in the cost of writing test X (though that's tricky because sometimes you need to build test architecture Y, which lowers the costs of writing many tests...).
If it's a piece of code that's buggy, changing frequently, brittle, or important, then you're likely to need to validate its behavior again and again. It's probably worth writing a unit test for it.
If it's an end user experience that involves a lot of services maintained by different teams and tends to fall apart often, it's probably worth writing an integration test instead of having to keep manually testing every time the app goes down.
If it's an API with lots of important users using it for all sorts of weird edge cases and you don't want to have to manually repro each of the existing weird edge cases any time you add a new one, it's probably worth writing some black box tests for it.
But if it's code that's simple, boring, stable, short-lived, or unimportant, your time may be better spent elsewhere.
Another model is that your tests represent the codified understanding of what the system is. This can be very helpful if you have a lot of team churn. How do you make sure a new person doesn't break something when "something" isn't well-defined somewhere? Tests are a great way to pin that down. If this is a priority, then it becomes more important to write tests for things where the test actually very rarely fails. It's more executable system specification than it is automation.
[+] [-] notacoward|7 years ago|reply
* 100% unit-test coverage is a garbage goal. *
I don't hate unit tests. They can have enormous value shaking out edge cases in a self-contained piece of code - usually a "leaf" in the module dependency graph - and making it future-proof. Love it. However, unit tests don't tell you anything about whether the module's behavior in combination with others leads to a correct result. It's possible to chain together a bunch of units each with 100% test coverage and still have the combination fail spectacularly. In operations, a lot of bugs live in the interstices between modules.
Even worse, 100% line coverage means a lot less than people seem to think. Let's say one line to compute an expression will compute the wrong value for some inputs. You might have a zillion unit tests that cause that line to be counted as covered, but still be missing the test that reveals the error case. I see this particularly with ASSERT or CHECK types of macros, which get counted toward 100% coverage even though no test ever exercises the failure case.
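A Python analogue of that ASSERT case (normalize is a hypothetical function): the assert line counts as covered whenever it passes, even though no test ever drives its failure path.

```python
def normalize(weights):
    total = sum(weights)
    # Counted as covered by every passing test below, although the
    # failure path (total == 0) is never exercised - just like an
    # ASSERT/CHECK macro counted toward 100% coverage.
    assert total != 0, "weights must not sum to zero"
    return [w / total for w in weights]

# Plenty of tests mark every line of normalize as covered...
assert normalize([1, 1]) == [0.5, 0.5]
assert normalize([3, 1]) == [0.75, 0.25]
# ...but none reveal what normalize([0, 0]) or normalize([1, -1]) does.
```

The coverage report is identical whether or not anyone ever wrote the zero-sum test.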
Striving too hard for 100% unit test coverage often means a lot of work - many of our unit tests have 3x more tricky mock code than actual test code - to get a thoroughly artificial and thus useless result. The cost:benefit ratio is abominable. By all means write thorough unit tests for those modules that have a good ratio, but in general I agree with the admonition to focus more effort on functional/integration tests. In my 30+ years of professional programming, that has always been the way to find more bugs with less effort.
[+] [-] jtmarmon|7 years ago|reply
Ironically, it's actually easiest to get 100% coverage in worse code, because the more entangled and coupled your code is, the more likely you are to incidentally execute branches that aren't really under test.
[+] [-] munk-a|7 years ago|reply
> Even worse, 100% line coverage means a lot less than people seem to think. Let's say one line to compute an expression will compute the wrong value for some inputs. You might have a zillion unit tests that cause that line to be counted as covered, but still be missing the test that reveals the error case. I see this particularly with ASSERT or CHECK types of macros, which get counted toward 100% coverage even though no test ever exercises the failure case.
I'd rather have 30% test coverage where the lines under test are the complicated ones (not things like factories) with the testing of those lines hitting all the complex edge cases than 100% test coverage that confirms that all your source code is encoded in UTF-8.
[+] [-] P_I_Staker|7 years ago|reply
[+] [-] flukus|7 years ago|reply
There's the developers not understanding the difference between unit and integration tests. Both are fine, but integration tests aren't a good tool to hit corner cases.
Many of the tests don't actually test what they pretend to. A few weeks ago I broke production code that had a test specifically for the case I broke, but the test didn't catch it because its input was wrong. The coverage was there, though.
Most of the tests give no indication of what they're actually testing, or a misleading one; you have to divine it yourself from the input data, but most of that is copy/pasted, so much of it isn't actually relevant to the tests (I suspect it was included to pad the overall coverage metric).
The results of the code defined the test. They literally ran the code, copied the file output to the "expected" directory and use that in future comparisons. If the files don't match it will open a diff viewer, but a lot of things like order aren't deterministic so the diff gives you no indication of where things went wrong.
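For what it's worth, a sketch of one way to make that kind of golden-file comparison deterministic (the helper is mine, not from that codebase) is to canonicalize both sides before diffing:

```python
def canonicalize(text):
    # Sort the record lines so that nondeterministic ordering doesn't
    # turn the diff against the "expected" file into useless noise.
    return "\n".join(sorted(text.splitlines()))

actual = "b=2\na=1\n"   # output arrived in a different order this run
golden = "a=1\nb=2\n"   # the recorded "expected" file
assert canonicalize(actual) == canonicalize(golden)
```

This only helps when line order genuinely doesn't matter; if it does, the output itself should be made deterministic instead.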
Many tests succeed for the wrong reason: they exercise failure cases but don't actually check that the code failed for the right reason.
Some tests "helpers" are actually replicating production code, and the tests are mostly verifying that the helpers work.
Finally, due to some recent changes, the tests don't even exercise production code paths anymore. We can't delete them because it would reduce code coverage, but porting them to actually test the new code will take time nobody is sure they want to invest.
/end rant
[+] [-] mikekchar|7 years ago|reply
Imagine that the word "test" is a misnomer when talking about unit tests. People often think of testing as a way of checking whether the code works properly. That is great for what is known as "acceptance testing". However, as you no doubt agree, it's not so great for "unit testing".
For some reason, people hang on hard to the words "unit" and "test" and come to the conclusion that you should take a piece of code (usually a class), isolate it, and show that the class does what it is supposed to. This is a completely reasonable supposition, but in practice it doesn't work that well (I will skip the discussion, because I think you're already in agreement with me on that front).
Instead, imagine that "unit" refers to any piece of code (at any level) that has an interface. Next imagine that "test" means that we will simply document what it does. We don't necessarily worry ourselves about whether it is correct or not (though we wish it to be correct). We just write code that asserts, "When I do X, the result is Y".
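A tiny sketch of that framing (slugify is a made-up unit):

```python
def slugify(title):
    # Any piece of code with an interface counts as a "unit".
    return "-".join(title.lower().split())

def test_slugify_behaviour():
    # Not a verdict on correctness - just a record of observed behaviour:
    # "When I do X, the result is Y."
    assert slugify("Hello World") == "hello-world"
    assert slugify("  Spaced   Out ") == "spaced-out"

test_slugify_behaviour()
```

Whether that behaviour is *correct* is settled elsewhere, by acceptance or manual testing; the unit test just pins down what the code currently does.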
At the macro level, we still need to see if the code works. We do this either with automated acceptance tests, or manual testing. Both are fine. When the code works to our level of satisfaction, you can imagine that the "unit tests" (that are only documenting what the code at various levels is doing) are also correct. It is possible that there is some incorrect code that isn't used (which we should delete), or that there are some software errors that cancel each other out (which will be rare). However, once the code is working on a macro scale, in general, it is also working on a micro scale.
Let's say we change the code now. The acceptance tests may fail, but some of the "unit tests" will almost certainly fail (assuming we have full "unit test" coverage). If they don't there is a problem because "unit tests" are describing what the code is doing (the behaviour) and if we change the behaviour, the tests should fail.
For some unit testing styles (heavily mocked), the unit tests often will not fail when we change the behaviour. This means the tests, as a long-lasting artefact, are not particularly useful. A test might have been useful for helping you write the code initially, but if it doesn't fail when you change the behaviour, it has lost its utility. Let's make a rule: if the test doesn't fail when the behaviour changes, it's a "bad" test. We need to remove it or replace it with a test that does fail.
The other problem you often run into is that when you change one line of code, 200 tests fail. This means that you spend more time fixing the tests than you gained from being informed that the test failed. Most of the time you know you are changing the behaviour, and so you want to have very little overhead in updating the tests. Let's make another rule: Unit tests must be specific. When you change specific behaviour only a few (on the order of 1) tests should fail.
This last one is really tricky because it means you have to think hard about the way you write your code. Say you have a large function with many branch points. Given some input, there are many possible outputs, so you write a lot of unit tests. If you then change how one of the branch points is handled, a whole class of tests will fail. This is bad for our rule.
The result is that you need to refactor that code so your functions have a minimum number of branch points (ideally 0 or 1). Additionally, if you split the function apart so that it is now several functions, you have to make each of them available to your test suite. This exposes rather than hides these interfaces.
The end result is that you decouple the operation of your code. When you hear about TDD being "Test Driven Design", this is what it means. This is especially true for things like global variables (or near global instance variables in large classes). You can't get away with it because if your functions depend on a lot of global state, you end up having tests that depend on that (near) global state. When you change the operation surrounding that state, a whole whack of tests fail.
Committing to writing high coverage unit tests which also have high specificity forces you to write decoupled code that doesn't rely on hidden state. And because it doesn't depend on hidden state, you have to be able to explicitly set up the state in your tests, which force you to write code where the dependencies on the objects are clear and uncomplicated.
You mentioned code coverage. I'm going to say that I almost never check code coverage when I'm doing TDD. That's because if you are writing tests that cover all the behaviour, you will have 100% line coverage and 100% branch coverage. However, as you correctly point out, the opposite is not the case. The real test of your coverage is not a code coverage tool; it's changing the behaviour of the code and noting that the tests fail.
Most people are familiar with the idea of "Test First" and often equate it with "Test Driven". "Test First" is a great way to learn "Test Driven", but it is not the only way to go. When you have full test coverage, you can easily modify the code and observe how the tests fail. The tests and the production code are two sides of the same coin; when you change one, you must change the other. It's like double-entry accounting. By modifying the production code and seeing how the tests fail, you gain information on what this code is related to. You no longer need to keep it in your head!
When I have a well tested piece of code and somebody asks me, "How hard is it to do X?", I just sketch up some code that grossly does X and take a look at where the tests fail. This tells me roughly what I'll need to do to accomplish X.
I see I've failed (once again) to keep this post small. Let me leave you with just one more idea. You will recall that earlier I mentioned that in order to have "full coverage" of unit tests with specificity, you need to factor your code into very small pieces and also expose all of the interfaces. You then have a series of "tests" that show you the inputs for those functions with the corresponding outputs. The inputs represent the initial state of the program and the outputs represent the resultant state. It's a bit like being in the middle of a debugging session and saving that state. When you run the tests, it's like bringing that debugging session back to life. The expectations are simply watch points in the debugger.
When I'm debugging a program with a good suite of unit tests, I never use a debugger. It is dramatically faster to set up the scenario in the tests and see what happens. Often I don't have to do that. I often already have tests that show me the scenario I'm interested in. For example, "Is it possible for this function to return null -- no. OK, my problem isn't here".
Richard Stallman once said that the secret to fixing bugs quickly is to only debug the code that is broken. "Unit tests" allow you to reason about your code. If you have so called unit tests that are unreadable, then you are giving up at least 50% of the value of the test. When I have problems, I spend more time looking at the tests than the production code -- because it helps me reason about the production code more easily.
I will leave you with one (probably not so small caveat). Good "unit testing" and "good TDD" is not for everyone. I talked about ruthlessly decoupling code, simplifying functions to contain single branch points, exposing state, exposing interfaces (privacy is a code smell). There are people for which this is terrible. They like highly coupled code (because it often comes with high cohesion). They like code that depends on global state (because explicitly handling state means having to think hard about how you pass data around). They like large functions with lots of branch points (because it's easier to understand the code as a whole when you have the context together -- i.e. cohesion). Good unit tests and TDD work against that. If you want to write code like the above, I don't think unit tests will work for you.
I personally like this style of programming and I think it is dramatically more productive than many other styles. Not everybody is going to agree. I hope it gives you some insight as to why some people find unit testing and TDD to be very productive, though.
[+] [-] arkh|7 years ago|reply
[+] [-] zestyping|7 years ago|reply
[+] [-] stcredzero|7 years ago|reply
So what happens when you move the practice away from this particular kind of Smalltalk environment? Refactorings in most languages are slower without the Refactoring Browser, and your unit tests often effectively double the amount of work involved. The velocity of change slows down. Unit tests might be less nimble to run; a long compile time might be involved. Given those changes, it makes perfect sense that a larger granularity of tests, and fewer of them, might be more convenient.
[+] [-] perfunctory|7 years ago|reply
On my current project, for the first time in my long career, I have 100% code coverage. How did I achieve it? By ignoring best practices on what constitutes a unit test. My "unit" tests talk to databases, read and write files, etc. I'll take 100% coverage over unit test purity any day of the week.
[+] [-] jb3689|7 years ago|reply
Moreover, tests that don't break are useless
[+] [-] trixie_|7 years ago|reply
Integration test what you can. Unit test core pieces of logic. Avoid mocks.
[+] [-] natashas|7 years ago|reply
[+] [-] vbsteven|7 years ago|reply
[+] [-] edem|7 years ago|reply
[+] [-] 1337shadow|7 years ago|reply
[+] [-] kords|7 years ago|reply
[+] [-] raverbashing|7 years ago|reply
[+] [-] mpweiher|7 years ago|reply
https://blog.thecodewhisperer.com/permalink/integrated-tests...
I found that argument more convincing, and it also aligns better with my experience.
The point about 100% coverage not being a good goal is pretty solid.
[+] [-] neo2006|7 years ago|reply
[+] [-] thom|7 years ago|reply
[+] [-] jondubois|7 years ago|reply
[+] [-] SomeHacker44|7 years ago|reply