
We need to talk about testing

126 points| glenjamin | 4 years ago |dannorth.net | reply

72 comments

[+] dfabulich|4 years ago|reply
"For every complex problem there is an answer that is clear, simple, and wrong." -- H. L. Mencken

The saddest thing about articles like these is that they are so full of wisdom, but they can never seem to compete with the "hot new methodology."

Like it or not, the "agile" movement had a huge impact on the way people do their jobs, in part because it had a name, it had advocates, and it had principles that you could fit on a PowerPoint slide.

But good testing (like good design) just can't be encapsulated in a handful of principles you can teach to a newbie. Good testing requires wisdom.

I can write a lot of words about what I've learned about testing over the years, but it probably won't do a good job of conveying that wisdom to others. The best testers I've ever worked with have a strong gut instinct, developed by experience, for where the bugs will be found, and how to effectively spend our limited time-budget for tests.

Is it worth adding more end-to-end UI tests? What if they run slowly? (How slowly?) Is it better to add more unit tests? What if adding unit tests requires aggressively mocking out dependencies? Which parts of the product require more testing, and which ones have been adequately tested?

These questions don't have quick, easy answers, but the quick, easy answers keep winning mindshare, and we're all impoverished as a result.

[+] beebeepka|4 years ago|reply
Well said. But wisdom goes hand in hand with dedication. Let me paraphrase someone wiser than me:

"Dedication brings wisdom; lack of dedication leaves ignorance. Know what leads you forward and what holds you back and choose the path that leads to wisdom."

[+] User23|4 years ago|reply
> Good testing requires wisdom.

I don’t disagree, but I’d be willing to settle for diligence and discipline.

[+] lmm|4 years ago|reply
Writing an article claiming that "it's complex" and "it depends" is the easiest thing in the world; even when these claims are false, few will dare call them out as false.

In my experience good testing actually doesn't require wisdom, or at least there's a lot more value in quick easy principles than there is in wisdom (which is very hard to assess whether it's actually achieving anything).

We have plenty of articles that pontificate and cover everything in shades of grey; actually committing to a stance and giving concrete, actionable advice (that may be wrong for occasional edge cases) is far more valuable.

(To answer your questions: have half an hour's worth of tests; start with purely end-to-end UI tests and convert cases to more unit-like tests as necessary to keep that half-hour runtime. Don't ever mock; stub if you must, but only after you've tried improving the code so that you don't have to. You already know which parts require more testing and which have been adequately tested; give yourself permission to trust your instincts on that one.)
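To make the mock/stub distinction concrete, here's a minimal hand-written stub sketch in Java (the `Clock`, `FixedClock`, and `Greeter` names are invented for illustration): a stub is just a trivial fake implementation of a dependency interface, with none of the interaction-verification machinery a mocking framework brings.

```java
// Hypothetical example of a stub: a hand-written fake implementation of a
// dependency interface, as opposed to a mocking framework that records and
// verifies interactions.
interface Clock {
    long nowMillis();
}

// The stub: trivially controllable, no framework needed.
final class FixedClock implements Clock {
    private final long fixedMillis;

    FixedClock(long fixedMillis) {
        this.fixedMillis = fixedMillis;
    }

    @Override
    public long nowMillis() {
        return fixedMillis;
    }
}

final class Greeter {
    // Time-dependent logic under test; the stub pins the time.
    static String greeting(Clock clock) {
        long hourOfDay = (clock.nowMillis() / 3_600_000L) % 24;
        return hourOfDay < 12 ? "good morning" : "good afternoon";
    }
}
```

The point is that the stub only controls the state the test needs; it never asserts on how it was called.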

[+] keithnz|4 years ago|reply
I like this joke from a while ago about testing "QA Engineer walks into a bar and he orders a beer. Orders 0 beers. Orders 99999999999 beers. Orders a lizard. Orders -1 beers. Orders a ueicbksjdhd. First real customer walks in and asks where the bathroom is. The bar bursts into flames, killing everyone."

Which leads into being very clear about what you want to achieve with testing (correctness, robustness, fitness for purpose, etc.) and how much effort you want to put into each area. Often I have seen people put a lot of effort into testing things that really don't have much payback compared to what they could be testing instead. So be careful of the opportunity costs of your testing efforts.

[+] ohazi|4 years ago|reply
> Often I have seen people put a lot of effort into testing things that really don't have much payback compared to what they could be testing instead.

90% of the "unit tests" that I've observed in the wild are checking for things that a modern type system would easily prevent you from doing.

The "unit testing is a magic bullet" cults seem to form in environments that use weakly typed or highly dynamic languages like JavaScript, which let you pass anything to anything and only blow up when you execute one particular branch at runtime.

A good reason to use Typescript, Rust, Python's optional type hints, etc. is that they point out these problems for you as you're writing your code, so you don't have to unravel your mess three days later as you're cranking out 10 pages of boilerplate unit tests that only cover two functions and aren't even close to being exhaustive.

Use better languages, stop wasting your time testing for typos and brain farts, and focus on testing higher-level aspects of your design that your language and tools can't possibly know about.
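As a sketch of what that looks like in a typed language (the `UserId` wrapper is invented, not from any framework): once the invariant lives in the type, the "what if the id is negative?" unit test has nothing left to check, because invalid values can't be constructed in the first place.

```java
// Hypothetical sketch: encode the invariant in a type instead of unit-testing it.
final class UserId {
    private final long value;

    private UserId(long value) {
        this.value = value;
    }

    // The only way to obtain a UserId is through this validating factory,
    // so downstream code never needs a "what if the id is negative?" test.
    static UserId of(long value) {
        if (value <= 0) {
            throw new IllegalArgumentException("user id must be positive: " + value);
        }
        return new UserId(value);
    }

    long value() {
        return value;
    }
}
```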

[+] yourenotsmart|4 years ago|reply
Unit testing has become a bit like a diploma. Something society requires to acknowledge your quality, but that is mostly pro-forma and very poorly correlated with domain skills or intelligence.

Those are systems that have firmly crossed from "simulation" territory to "simulacra" territory.

People who think about the actual value of specific tests like you are few and far between. Most feel constrained by peer pressure and the need to do what they perceive is correct by definition.

[+] DelightOne|4 years ago|reply
> Often I have seen people put a lot of effort into testing things that really don't have much payback compared to what they could be testing instead.

Do you have examples of this? What kind of tests do you use, and to test what? I've seen people testing _literally everything_, and some testing only the happy path, failure cases, and critical units like user input, protocol assumptions, and algorithms.

[+] dkersten|4 years ago|reply
I've always felt the best approach for (automated) testing is:

Unit test style tests should test the specification. That is, you test that good input creates the results that the specification states it should and you test that the known failure cases that the specification says should be handled are, indeed, handled as the specification states. This means that most of the tests are generally happy path tests with some testing the boundaries and known bad cases. The goal is not to find bugs, but to prove in code that the implementation does in fact meet the requirements as set out in the specification.
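A toy sketch of a specification-driven unit test (the discount rule here is invented purely for illustration): the test asserts exactly what the spec clause states, including the boundary, rather than hunting for unknown bugs.

```java
// Invented "specification": orders of 100.0 or more get a 10% discount;
// smaller orders are unchanged; negative totals are rejected.
final class Pricing {
    static double discounted(double total) {
        if (total < 0) {
            throw new IllegalArgumentException("negative total");
        }
        return total >= 100.0 ? total * 0.9 : total;
    }
}
```

The tests for it would be mostly happy path, plus the boundary and the known bad case, exactly mirroring the spec.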

Regression tests. Any time a bug is found, add a failing test that reproduces the bug. Fix the bug and the test should pass. This both proves that the bug is actually fixed and prevents the bug from creeping back in again later. Again, the goal is not to find bugs; it's to prevent recurrence, and it is completely reactive.
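A sketch of that regression pattern (the bug and the `Names.initial` helper are invented): the test is written from the bug report, fails against the unfixed code, and passes once fixed, so the bug can't quietly return.

```java
// Invented regression scenario: initial("") used to throw
// StringIndexOutOfBoundsException. The guard below is the fix; the
// regression test reproduces the original report.
final class Names {
    static String initial(String name) {
        if (name == null || name.isEmpty()) {
            return ""; // the fix: previously the empty case was unguarded
        }
        return name.substring(0, 1).toUpperCase();
    }
}
```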

Finally, property-based generative testing. Define properties and invariants of your code and schemas of your data, then generate random test cases (essentially fuzzing). Run these on a regular basis (overnight) and make sure that the properties always hold for good input data, and that error states are handled correctly for bad input data. You can also apply this to the overall system, by simulating network failures between docker containers [1]. The goal of this is to find bugs, since it will test things you won't have thought of, but since it generates random test cases, you aren't guaranteed that it will find anything. It's also notoriously hard to write these tests and come up with good properties. I don't often do these tests, only when the payoff vs effort seems worth it. E.g. for a low-impact system it's not worth the effort, but for something that's critical, it may be.
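Here's a minimal hand-rolled version of the idea, assuming no framework (real QuickCheck-style tools add input shrinking and smarter generators on top of this): generate random inputs and check that the stated properties of, say, sorting always hold.

```java
import java.util.Arrays;
import java.util.Random;

// Hand-rolled property check: for random input arrays, sorting must
// preserve length and produce a non-decreasing result.
final class SortProperties {
    static boolean holdsForRandomInputs(int trials, long seed) {
        Random rng = new Random(seed);
        for (int t = 0; t < trials; t++) {
            // Random length (0..49) and random contents each trial.
            int[] input = rng.ints(rng.nextInt(50)).toArray();
            int[] sorted = input.clone();
            Arrays.sort(sorted);
            if (sorted.length != input.length) {
                return false; // property 1: length preserved
            }
            for (int i = 1; i < sorted.length; i++) {
                if (sorted[i - 1] > sorted[i]) {
                    return false; // property 2: non-decreasing order
                }
            }
        }
        return true;
    }
}
```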

For human QA, I think it makes most sense to test the workflows that users actually perform. Make sure everything works as expected, make sure the workflow isn't obtuse or awkward, make sure everything visually looks ok and isn't annoyingly slow. Test common mistakes due to user error. Stuff like that. I don't think we can expect this to be thorough, and it's unrealistic to think that it will find many bugs; just that it will make sure that most users' experiences will be as designed.

So, test for what you expect (to prove that it's what you expect), and test known bugs to prevent regression and to prove that they're fixed. Then, only if your software is critical enough to warrant the effort, use property-based testing as necessary to try to weed out actual bugs. Most software can skip that, though.

[1] For example https://github.com/agladkowski/docker-network-failure-simula... or https://bemowski.github.io/docker-iptables/ I've personally successfully used https://github.com/IG-Group/Havoc to test a fault tolerant distributed system I worked on a few years ago, using Clojure's generative testing to create test cases and failure scenarios and Havoc to handle injecting of network errors.

[+] incadenza|4 years ago|reply
I don't mean to be too critical, but this article is part of a larger trend. I can nod along with all of its central claims and walk away with exactly zero actionable advice or practical path toward integrating this into my team's workflow.
[+] deurruti|4 years ago|reply
I would agree with you, and say that while there is no set advice or plan to follow (that would make things all too easy), there are a couple of things here that we could all apply to our teams.

> Tracking automated test coverage (unit, integration, UI) is a performative task that doesn't provide hard evidence to increase confidence with stakeholders. Instead we should shift automated testing from functional testing to compliance, accessibility, and security testing, and track coverage there.

> Shift testing to the left. This has been a major problem in the organizations I've worked at, and we should continue to keep a close eye on it. Establish processes to get QA as early as possible into architecture reviews, design reviews, and other early processes that tend to be only dev, product, and design focused.

> Continue to build up our embedded QA unit to be a source of insight for multiple stakeholders and to provide domain knowledge for our products. As QA we should always be asking two questions: are we building the correct product, and are we building it correctly?

[+] ipnon|4 years ago|reply
This speaks to the artfulness of testing, and of the discipline of programming in general. There is no science of testing. Testing well is a skill learned over decades, as in the case of the author.

Testing is a fraught activity, too often leaving stakeholders without confidence and leaving programmers feeling like they are just going through the motions. Yet there is clearly some value to testing; who prefers untested code to tested code?

Our lack of definitive answers regarding how to best test should not discourage us from testing. We should instead appreciate the inexactness of good testing, and seek to develop a fine sensitivity for how to test our software well.

[+] jeffreygoesto|4 years ago|reply
What would you expect as actionable advice? The software world is so diverse, how could a reasonably sized article cover all those needs?

The best I can think of would be: regression tests are ok and mean "don't you ever do _that_ again". But they are not sufficient to catch the funny ways your customer will use the software. For that you need people striking a balance, using it in new but realistic enough ways.

[+] bluGill|4 years ago|reply
Anytime someone creates actionable testing advice I automate it. However, that still isn't enough to ensure quality, so I need humans to find the bugs that the actionable advice didn't.
[+] mLuby|4 years ago|reply
Can't stand this headline format "We need to talk about X" because it implies the author speaks for the group rather than to the group, and to my ear it always carries a condescending tone.

See also "We're all [person] right now."

[+] kbelder|4 years ago|reply
Yeah, I haven't read the article, because that headline format is one of those that I've decided to never reward with a click.
[+] Sevii|4 years ago|reply
My initial thought on this was that the negative changes the author describes in the post were driven by continuous delivery pipelines. But agile delivery practices have a similar effect of increasing the frequency of delivering software.

If you have a system to deliver software updates in one hour, you will use it to ship 'just in time' updates. Structurally it is hard to argue against fixing the customer pain when you can easily deploy a fix.

But the long term effect of having CD is that testers get pushed out of the system.

If we deliver code everyday we can no longer afford to spend 3 days testing each release. If we can't spend 3 days testing for defects in the entire product, we end up only testing a small amount of the actual product on each release.

If we deliver code multiple times per day we probably cannot afford to do 'any' manual testing.

CI/CD encourages spending as little time as possible on testing. If deploying code takes 10 minutes, why let testing make it a 3 day process?

The long run effect is that the quality of software decreases in general. Bugs are continuously deployed into the system and you never reach a state of doneness.

[+] shkkmo|4 years ago|reply
The way I've run CD is that we had continuous deployment to a staging environment, where final testing and stakeholder review happen, then periodic deployments to production.

This works best when you do a good job of getting the bigger/riskier changes in earlier and do a good job scheduling other changes around anything that is time sensitive.

[+] pjmlp|4 years ago|reply
I have had CI/CD pipelines, in offshored projects, where the number of issues that would creep in was unbearable, despite unit tests and whatever best practices.

So, for the teams to have anything at the end of the CI/CD pipeline that could be either demoed or tested, multiple integration stages of green builds were introduced.

A full multi-stage CI/CD integration pipeline from a dev's machine, into what became known as the diamant build, would take about a day, assuming all in-between stages were green on first build.

[+] adevx|4 years ago|reply
I think this article has some really good points; especially "The purpose of testing is to increase confidence for stakeholders through evidence" resonates with me.

I'm personally a huge proponent of end-to-end testing. You can think you've covered everything in isolation, but the user is presented with bugs nonetheless. I have near 100% E2E test coverage. Yes, it takes time to run; yes, third-party integrations can result in failed tests (which probably signal an issue with you or your third party anyway). I believe a full E2E test suite requires less maintenance and has broader coverage than unit tests.

For instance, I run TestCafe E2E tests that test all possible user interactions, from signup and paying for a subscription to account termination. I go as far as reading expected emails through a web email client and verifying the content of the received email using TestCafe. I test cache invalidation and a host of automated member-expiration, warning, and authentication/security scenarios. All this should be done anyway, so why not do it using E2E tests? When I did a rewrite, switching from Vue.js/SSR to React on Next.js, I had only minimal code changes to my E2E tests. And by minimal I mean a couple of ID/class DOM references to account for different components in Vue vs React.

Not only do you test code paths, but also all your components in the stack and the interaction between them, which for me are Node.js, Nginx, PostgreSQL, Redis, Amazon SES, etc. If something fails, you will have to do some digging, more so than a failed Unit Test. But more often than not you know what has recently changed and so where to look for possible issues. Your E2E tool makes a screenshot of the browser fail-state often highlighting the issue at hand.

Of course these types of tests work best on websites, and not so well on, say, low-level GPU driver code. They may also become too cumbersome and slow on highly complex sites. The biggest drawback is probably the time it takes for a full test run, which in my case can take up to 20 minutes in a headless browser.

[+] omeze|4 years ago|reply
I 100% agree with the central thesis that tests are about evidence-based correctness.

I think keeping this in mind is important as a product and codebase scale, since a lot of testing approaches shift in importance over time. The article's sidebar about TDD is a micro-example of this, but there are examples on the order of months/quarters/years that I've come across as well.

One example is how you likely only want end-to-end and integration tests early in a development lifecycle, rather than excessive subsystem tests and mocks. This is because your end-to-end product experience is likely to change less than the subsystems that implement specific features (e.g. user auth). Over time, as your subsystems solidify, you'll likely want to reprioritize and keep end-to-end tests lower in proportion, since they take longer to execute. In other words, your test suite should be proportional to the expected requirement changes over time.

Another more out-there example is the disabling of tests over time as test coverage starts to overlap. A lot of tests in a large codebase exercise the same code paths, and this is usually not a net benefit (it sometimes can be, e.g. if one test runs really fast vs another more comprehensive one that's slow). So having a way to evaluate whether a test is actionable or useful is something that emerges organically as large codebases begin to change.

[+] rhdunn|4 years ago|reply
I also personally like to have hierarchies of tests, where higher-level tests don't need to cover specific details, like edge cases, already covered by the lower-level tests. Take implementing a language (lexer, parser, static analysis, etc.) for an IDE or compiler.

For the lexer, I have tests covering the different EBNF fixed-string tokens (keywords and symbols) in a given grammar, along with the more complex tokens for numbers, etc. For the more complex tokens I have lexer tests covering the different valid number forms, identifier characters, etc. This leads to overlap in things like keywords, but not an overlap between implementations of different specifications (e.g. CSS modules).

For the parser tests, because I have the lexical coverage, I can just use a representative example of a number, etc. That allows the parser tests to focus on the different paths (optional parts, loops, etc.) of the EBNF and things like error recovery when different symbols are missing. With these, I validate the AST is correct, including the AST class that implements that node in the tree.

For the AST tests, because I have the lexer and parser coverage, I can focus on the data model and how the information in the parse tree is exposed to the AST data model.

For the higher-level tests (resolving variables to their declarations, etc.) I can focus on the cases relevant to that level instead of having to also test combinations of the lower-level tests.
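A toy sketch of that layering (the lexer and its token format are invented): exhaustive token-form coverage belongs at the lexer level, so a parser-level test could get away with a single representative number literal.

```java
import java.util.ArrayList;
import java.util.List;

// Deliberately minimal, hypothetical lexer: numbers, symbols, whitespace.
final class TinyLexer {
    static List<String> tokens(String src) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isDigit(c)) {
                // Number forms are tested exhaustively HERE, once,
                // so higher layers can reuse one representative literal.
                int start = i;
                while (i < src.length()
                        && (Character.isDigit(src.charAt(i)) || src.charAt(i) == '.')) {
                    i++;
                }
                out.add("NUMBER:" + src.substring(start, i));
            } else {
                out.add("SYMBOL:" + c);
                i++;
            }
        }
        return out;
    }
}
```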

[+] wallacoloo|4 years ago|reply
> Another more out-there example is the disabling of tests over time as test coverage starts to overlap. A lot of tests in a large codebase exercise the same code paths, and this is usually not a net benefit (it sometimes can be, e.g. if one test runs really fast vs another more comprehensive one that's slow). So having a way to evaluate whether a test is actionable or useful is something that emerges organically as large codebases begin to change.

If the cost of maintaining these tests exceeds the value they provide, don’t just disable them: delete them. But if the maintenance isn’t an issue, and it’s just the compute resources that are an issue, keep them enabled and schedule your tests more effectively! For example, at my company we have tests which are run the moment you want to commit, and will block your commit if they fail, and then we have a separate service which continually runs tests against tip 24/7. In order to keep low friction in the dev process, we only run a subset of tests as part of the commit process, and assume that a few failures will make it into tip but will be caught within a couple days (we ship every two weeks so that timing’s been pretty safe).

[+] geofft|4 years ago|reply
The idea that the tests of TDD are not the same thing as the tests of test coverage is a great insight that I don't think I'd quite realized before. In my experience, that seems in fact accurate - writing a short tool that uses the code you want to write is a good way to break down the problem and also make sure your solution actually works, but it's different from having confidence in the system.

On the other hand, I'm a little confused at the idea of non-automated tests that provide value. If you're interested in security/compliance/accessibility/etc., those seem like general software quality things, but they don't seem like "tests." To me, a "test" is a thing that either passes or fails, by analogy to tests in education, tests in law, etc. It's definitely valuable to review code for security issues or ensure that you're taking accessibility into account, but unless you're asking a question that can produce a yes or no, then you're not increasing stakeholder confidence through evidence, per the article's formulation (which I agree with). If you think there's value in manual testing because a human will provide you with "insights and feedback," great, but if you do hallway "testing" and everyone successfully completes the task but provides no feedback, is that a failure? If you can't find any security bugs from reading the code, does that mean there are none?

If you want unstructured information from humans as part of inputs to your design, great (and that's a great reason to shift feedback left), but I think that's a different category from testing.

You can't automate everything but you can automate a whole lot. Static analysis tools, better languages, modeling tools, etc. can find or prevent security problems more reliably than a human. Linters can check for accessibility issues (missing alt tags, missing ARIA attributes, etc.), and you can write integration tests that try to drive your entire product using only accessibility APIs to make sure your workflows are covered. Automated tests can look at emitted logs and metrics and make sure they're emitting what you expect and not emitting what you don't. And so forth.

[+] cushychicken|4 years ago|reply
> The purpose of testing is to increase confidence for stakeholders through evidence

This is a good, succinct description of why you should write tests.

Though, as a solo developer who's considered sinking some time into writing tests for Report Card Writer, I've chosen not to. No amount of testing will help me make sales, which is my primary problem at the moment.

[+] scottmcdot|4 years ago|reply
Does anyone have experience in testing "data science workflows"? (I made up that term because I can't think of anything better to call it.) For example, a programmer starts off with some business requirements (written by a business analyst) that might cover rules on which customers should receive marketing newsletters. By SQL-wrangling a few different databases using some pretty complicated logic, the programmer arrives at a list of customers that they feel satisfies the business analyst's requirements. Would a tester test that the programmer's code "works" by "unit testing", or should a tester have the knowledge and skills to build their own "independent version" of what the programmer has built and compare results?
[+] nerdponx|4 years ago|reply
I don't have a real "answer", but in my experience, being able to arrive at the same solution two different ways definitely helps increase confidence.

It also helps to have somebody with domain experience who is able to validate that the results makes sense. This isn't sensible in a "software development" workflow, but for one-off data analysis and data set generation tasks it's worth doing for complicated queries.

[+] haolez|4 years ago|reply
I've been leaning more towards extensive tracing and automatic error-based rollbacks than writing lots of tests. It seems to make more sense with the tools available today.
[+] nlstitch|4 years ago|reply
Agreed! Rollbacks are easily explainable, show commitment to safeguarding the user experience and make your business users way more happy.
[+] rualca|4 years ago|reply
> It seems to make more sense with the tools available today.

Aren't unit test frameworks ubiquitous, with low effort and low turnaround time, able to run completely on a developer's desktop? I mean, why would an overly complex system like tracing replace simple standalone tests?

[+] he0001|4 years ago|reply
I've been working with teams with testers and with teams completely without them. My experience is that there are always bugs, and about the same amount of them; it seems like it never matters how much effort you put in. I think it has to be about ownership, and who is going to have to wake up in the morning to fix things.

One thing that I believe is also left out here is that test-driven code looks different from code that isn't. All TDD code is by definition testable, but other code isn't always. (Mind you, I say code here and not programs.)

I also feel like there's a missing point in the article, which is how easy something is to test. If it's not easy to test something, or to set up a test for it, testing will probably never happen. I think you should write systems to be easily testable, since then people will do actual testing. And if you don't continuously verify that the system remains testable, you will end up with a system that is eventually harder to test. Automated tests will not happen for something that isn't easy to test; automated tests are kind of a way to acknowledge that "well, at least this works".

[+] schwartzworld|4 years ago|reply
> If it’s not easy to test something or setup a test for it it will probably never happen

Absolutely right. And it's never easier to set up a test than when the code is fresh in your mind.

I'm also of the opinion that bug fixes, and especially regression fixes, should always have tests included. If you've tested your code in the first place, adding an extra test case is really easy. It's much worse when you realize that the author of the code (even if it's past-you) couldn't figure out how to test it, and now that's your problem.

[+] ChrisMarshallNY|4 years ago|reply
I'm a huge believer in testing, both unit and "monkey".

A lot of my stuff has both. If it is heavily UI-centric, or involves device interfaces and/or communications, I tend to use test harnesses.

Unit testing is good. Combined with code coverage tools, it can be quite useful.

But having unit tests is not any kind of quality assurance. It means that the "low-hanging fruit" has been picked.

Test harnesses and "monkey testing" require a lot of discipline, but are often well worth it.

I write about the approach I take, here: https://littlegreenviper.com/miscellany/testing-harness-vs-u...

[+] rualca|4 years ago|reply
Unit tests were never considered a silver bullet. The only thing they do is automatically run specific sanity checks on small subcomponents with everything else faked out.

At best, they are a convenient way to ensure that if an invariant suddenly varies unexpectedly and in an unplanned way... A red flag is thrown into play.

[+] andreimackenzie|4 years ago|reply
I'm happy to read encouragement to consider the wider group of stakeholders in the scope of testing. In many organizations, incentives are out of alignment to find issues beyond the obvious user-facing correctness ones, so "testing theater" wins, because it allows teams to declare victory more quickly. Software as a discipline would benefit from more testing attention to security, accessibility, usability, etc.
[+] svilen_dobrev|4 years ago|reply
i don't see the Maintainability bad-thing: software does and has to change. So testing is there also to allow one of the main stakeholders, the software-makers themselves, to operate with some confidence.. which, i agree, may not be part of the shipped software per se, but of the process of software making
[+] js8|4 years ago|reply
I don't think you can automate all the testing, since almost by definition, you always test AGAINST something. That something can be a specification, or a different implementation, or just a clock, but it has to come outside the subject (program) being tested.

A different take is that we want tests to have two qualities - correctness (the test cases should match the desired program behavior) and comprehensiveness (the test cases should cover as much of behavior as possible). The only way these are both 100% attained is if the source of your test cases is effectively another implementation of the same specifications, and you compare the behavior of the two.

The comparison itself can be automated, but the creation of the other implementation cannot. So, logically, if you want to automate it anyway, you have to compromise one of the two qualities: either be less correct (less strict when comparing program output) or less comprehensive (verify fewer possible inputs).

So it seems to me that the "automate all the tests" folks want to have a cake and eat it too. They want to have an automated comprehensive and correct test suite, without the effort of writing an approximation of another implementation (used to compare).

In the past (I love how the blog post says "I am not advocating to returning to the Dark Ages of software delivery", as if that was necessarily terrible), you had a QA team to do exactly that - approximate another implementation of the same program directly from the specs. The better the approximation, the better the verification. But the cost is duplication of effort, in some sense. If you try to remove the duplication (i.e. for example by having the same person create both at the same time), you're likely to compromise one of the two qualities, without realizing it.

Let me state the above yet differently. The "let's automate testing" approach is based on the assumption that a human tester is running the same tests over and over. But that's not the case: manual testing is actually different each time, so what you invisibly lose by automation is comprehensiveness.

In fact, the QA's job in the past was to have another person (other than the developer) trying to make sense of the specification (and presumably approximate the implementation with the test design), and to check whether the specification was translated correctly by comparing their understanding to the developers'. While the comparison itself can be automated, the second look is important to discover which parts of the specification are actually weak and can be understood differently. I don't think testing a program is solely about its intrinsic properties; it's about checking the correctness of the translation from specification to executable code.
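A small sketch of what that duplication buys (the power-of-two "specification" is invented for the example): the comparison between a clever implementation and an independently written naive one can be automated, but producing that second implementation is exactly the manual effort described above.

```java
import java.util.Random;

// Differential testing sketch: a fast bit-trick implementation checked
// against a naive reference derived independently from the same spec
// ("is n a power of two?").
final class Differential {
    static boolean fastIsPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    static boolean naiveIsPowerOfTwo(int n) {
        if (n < 1) return false;
        while (n % 2 == 0) n /= 2;
        return n == 1;
    }

    // The automatable part: compare the two implementations on random inputs.
    static boolean implementationsAgree(int trials, long seed) {
        Random rng = new Random(seed);
        for (int t = 0; t < trials; t++) {
            int n = rng.nextInt();
            if (fastIsPowerOfTwo(n) != naiveIsPowerOfTwo(n)) {
                return false;
            }
        }
        return true;
    }
}
```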

[+] postalrat|4 years ago|reply
I think if we had an automated way to check two implementations against each other, it would beat most every other form of testing that exists.

Then testing would come down to simply write everything twice and hope you got it right at least once.

[+] bedobi|4 years ago|reply
I'm irrationally passionate about testing. Shameless plug: (rendered version at https://gist.github.com/androidfred/501d276c7dc26a5db09e893b...)

# Test your tests

## Summary: test interfaces, not implementations

Tests that pass when something is actually broken, and fail when things actually do work, are worse than useless: they are positively harmful.

## The requirement

Let's say we've been tasked with returning `400` when `GET` `/users/<userId>` is called with a negative `userId`.

## The test

The requirement can be turned into a test that hits the endpoint with a negative `userId` and checks that a `400` is returned:

```java
@Test
public void getUser_InvalidUserId_400() {
    expect().statusCode(400).when().get("/users/-1");
}
```

## Implementation

A ubiquitous style of implementation may look something like this: (ignore whether you like the style of implementation or not, it's just an example)

```java
public class UserResource {

    @Inject
    private UserService userService;
    
    @GET
    @Path("/users/{userId}")
    public Response getUser(@PathParam("userId") final Long userId) {
        try {
            return Response.ok(userService.findById(userId)).build();
        } catch (final UserException e) {
            return Response.status(e.getErrorCode()).build();
        }
    }
}
```

```java
public class UserService {

    @Inject
    private UserDao userDao;

    public User findById(final Long userId) throws UserException {
        if ((userId == null) || (userId <= 0)) {
            throw new UserException("Invalid arguments", 400);
        }
        return userDao.findById(userId);
    }
}

```

## Another test

Everyone knows a test hitting the endpoint is not enough; more tests are required. A ubiquitous style of additional test may look something like this:

```java
public class UserResourceTest {

    @InjectMocks
    private UserResource userResource = new UserResource();

    @Mock
    private UserService userService;

    @Test
    public void getUser_InvalidParams_400() throws UserException {
        doThrow(new UserException(INVALID_ARGUMENTS, 400)).when(userService).findById(-1L);
        assertThat(userResource.getUser(-1L).getStatus(), is(equalTo(400)));
    }
}
```

## Test the tests

### Break something

Let's say the negative `userId` check is removed from the `UserService`. (by mistake or whatever)

The test that hits the endpoint will fail, because it will no longer get a `400` when `GET` `/users/<userId>` is called with a negative `userId`. This is exactly what we want out of a test!

The additional test however will pretend the negative `userId` check is still in the service and pass. *This is literally the exact opposite of what we want out of a test!*

### Refactor something

Or, instead, let's say the `UserResource` is refactored to use a `NewBetterUserService`. (the `NewBetterUserService` still throws an exception on negative `userId`)

The test that hits the endpoint will pass, because it will still get a `400` when `GET` `/users/<userId>` is called with a negative `userId`. This is exactly what we want out of a test!

The additional test however will fail because it expects the `UserResource` to call the (old) `UserService`. *This is literally the exact opposite of what we want out of a test!*

### But... The `UserServiceTest` would fail

Maybe there's a `UserServiceTest` that would fail if the negative `userId` check is removed from the `UserService`:

```java
public class UserServiceTest {

    @Rule
    public ExpectedException expectedException = ExpectedException.none();

    private UserService userService = new UserService();

    @Test
    public void findById_InvalidParams_400() throws UserException {
        // Expectations must be set before the call that throws
        expectedException.expect(UserException.class);
        expectedException.expectMessage("Invalid arguments");
        // Assumes UserException exposes getErrorCode()
        expectedException.expect(hasProperty("errorCode", is(equalTo(400))));

        userService.findById(-1L);
    }
}
```

It's irrelevant. The `UserResourceTest` is still a bad test, because it fails on refactoring. And even if there is a `UserServiceTest` that fails when the negative `userId` check is removed from the `UserService`, the `UserResourceTest` doesn't fail when there is no such `UserServiceTest`, or when there is one but it's incorrectly implemented, etc. It just pretends the negative `userId` check is in the `UserService` and passes.

If you were absolutely adamant about testing the implementation (which you shouldn't be, because implementation tests fail on refactoring, making them bad tests), the "correct" way of testing that would be:

```java
public class UserResourceTest {

    private UserResource userResource = new UserResource(new UserService()); //not a mock

    @Test
    public void getUser_InvalidParams_400() throws UserException {
        assertThat(userResource.getUser(-1L).getStatus(), is(equalTo(400)));
    }
}
```

Because now, if the negative `userId` check is removed from the `UserService` the test will fail. But since it also fails on refactoring, it's still a bad test.

### But... Mocks are useful

Yes, mocks are useful, and they do have a place. E.g., instead of connecting to a real db, the DAO methods could be mocked, and REST calls to other services could be mocked too. They're external dependencies, not part of the core of the system under test. The case made here isn't that mocks are never useful or always bad; it's that mock testing is often used excessively for the internals of the system under test, which doesn't make sense.

Also, I'd still argue that, by default, there should be a strong preference for actually spinning up a real in-memory db (e.g. [https://github.com/vorburger/MariaDB4j](https://github.com/v... lets you do this) rather than mocking DAO calls, and for actually WireMocking calls to other services rather than mocking the REST clients. Then more of the system under test is being tested, with less effort.

I.e., unlike with regular mock testing, it's verified not just that the app calls a DAO method, but that the DAO method actually connects to the db, runs the expected query, and returns the expected result based on the primed content in the db; and that the app actually hits the configured external service URL with the expected request and handles the primed response based on the WireMock stub, etc.

While such tests take a bit longer to run, you write far fewer, higher-quality, less brittle tests, because you're not having to write and maintain mock tests between every layer of your app's internals to check every single call on the way through the stack. You can freely refactor all the internals you want without having to change the tests at all: they will keep passing as long as everything works, and they will fail if something doesn't. (Which, again, is exactly what you want.)
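To make the stub-a-real-server idea concrete without pulling in WireMock, here's a hedged sketch using only the JDK's built-in `com.sun.net.httpserver.HttpServer` standing in for the external service; the `/users/-1` route and the primed `400` response are illustrative assumptions, not part of any real API:

```java
import com.sun.net.httpserver.HttpServer;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;

// Hypothetical sketch: instead of mocking the HTTP client class, spin up a
// tiny real HTTP server that plays the external service, so the test
// exercises the app's actual request/response handling end to end.
public class StubServerExample {

    public static void main(String[] args) throws Exception {
        // Bind to an ephemeral port so the test never collides with anything
        HttpServer stub = HttpServer.create(new InetSocketAddress(0), 0);
        stub.createContext("/users/-1", exchange -> {
            exchange.sendResponseHeaders(400, -1); // primed response: 400, no body
            exchange.close();
        });
        stub.start();
        int port = stub.getAddress().getPort();

        // The "system under test" is just a plain HTTP call here for brevity;
        // in a real test it would be your app's client code.
        URL url = new URL("http://localhost:" + port + "/users/-1");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        int status = conn.getResponseCode(); // 400
        stub.stop(0);

        if (status != 400) {
            throw new AssertionError("expected 400, got " + status);
        }
        System.out.println("stubbed service returned " + status);
    }
}
```

The same shape scales up: WireMock gives you richer request matching and response priming, but the principle is identical, a real socket and real (de)serialization are exercised instead of a mocked method call.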

## Terminology

The test that hits the endpoint is commonly referred to as an "integration test", and the other test is commonly referred to as a "unit test".

Kent Beck (the originator of test-driven development) defines a unit test as "a test that runs in isolation from other tests". This is very different from the definition of a unit test as "a test that tests a class/method in isolation from other classes/methods". *The test that hits the endpoint satisfies the original definition.*

But it doesn't really matter whether you call a given test an "integration" test or a "unit" test. The point is that a good test fails when something breaks and keeps passing when something is refactored or improved. If it does the opposite, it's not a good test.

## More on the topic

* http://googletesting.blogspot.com.au/2013/08/testing-on-toil...

* http://codebetter.com/iancooper/2011/10/06/avoid-testing-imp...

## Misc

There are other problems with the example, such as:

* Primitive Obsession: using a `Long` to represent `userId` and procedural inline checks in the service to catch negative `userId`s. Instead, create a class `UserId` that encapsulates and enforces its own invariants.

* Hidden Dependencies: `UserResource` has a hidden dependency on `UserService` and `UserService` has a hidden dependency on `UserDao`. Dependency Injection frameworks like Spring encourage such hidden dependencies, and the excessive use of mocks in unit tests.

* The service shouldn't know about HTTP error codes.

These are simplifications in order to keep the example short - the scope of this article is testing.
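A minimal sketch of the `UserId` idea from the Primitive Obsession bullet above; the class name and invariant are illustrative assumptions:

```java
// Hypothetical value type: the invariant lives in one place, so a negative
// id can never reach the service layer at all.
public final class UserId {

    private final long value;

    public UserId(long value) {
        if (value <= 0) {
            throw new IllegalArgumentException(
                    "userId must be positive, got " + value);
        }
        this.value = value;
    }

    public long value() {
        return value;
    }
}
```

With this, `UserService.findById(UserId userId)` needs no inline check, and the resource layer can translate the `IllegalArgumentException` into a `400` at the boundary.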

[+] omeze|4 years ago|reply
Highly recommend publishing this as a blog post and sharing it as a post on HN so it isn't lost to the ether - hard to consume this as a comment, but there's good nuggets in there
[+] geofft|4 years ago|reply
This seems a little bit disconnected from the article but it would be a good submission on its own I think!