item 38330989

The big TDD misunderstanding (2022)

125 points| WolfOliver | 2 years ago |linkedrecords.com | reply

184 comments

[+] imiric|2 years ago|reply
I also wasn't aware that "unit" referred to an isolated test, not to the SUT. I usually distinguish tests by their relative level, since "unit" can be arbitrary and bring up endless discussions about what it actually means. So low-level tests are those that test a single method or class, and integration and E2E tests confirm the functionality at a higher level.

I disagree with the premise that "unit", or low-level tests, are not useful because they test the implementation. These are the tests that check every single branch in the code, every possible happy and sad path, use invalid inputs, etc. The reason they're so useful is because they should a) run very quickly, and b) not require any external state or setup, i.e. the traditional "unit". This does lead to a lot of work maintaining them whenever the implementation changes, but this is a necessary chore because of the value they provide. If I'm only relying on high-level integration and E2E tests, because there are far fewer of them and they are slower and more expensive to run, I might miss a low-level bug that is only manifested under very specific conditions.
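A minimal sketch of the kind of test described here, with an invented function: fast, no external state, and one case per branch including the sad path.

```python
# Hypothetical example: a small pure function and "unit" tests in the
# traditional sense -- fast, no external state, one case per branch.

def shipping_cost(weight_kg: float, express: bool) -> float:
    """Toy pricing rule used only for illustration."""
    if weight_kg <= 0:
        raise ValueError("weight must be positive")
    base = 5.0 if weight_kg <= 1.0 else 5.0 + 2.0 * (weight_kg - 1.0)
    return base * 2 if express else base

def test_rejects_invalid_weight():
    try:
        shipping_cost(0, express=False)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError")

def test_small_parcel():
    assert shipping_cost(0.5, express=False) == 5.0

def test_heavy_parcel():
    assert shipping_cost(3.0, express=False) == 9.0

def test_express_doubles():
    assert shipping_cost(0.5, express=True) == 10.0
```

Each test exercises exactly one branch, so a low-level bug in any path fails exactly one named test.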

This is why I still think that the traditional test pyramid is the best model to follow. Every new school of thought since then is a reaction towards the chore of maintaining "unit" tests. Yet I think we can all agree that projects like SQLite are much better for having very high testing standards[1]. I'm not saying that every project needs to do the same, but we can certainly follow their lead and aspire to that goal.

[1]: https://www.sqlite.org/testing.html

[+] vmenge|2 years ago|reply
I've never had issues with integration tests running with real databases -- they never felt slow or incurred any significant amount of time for me.

I also don't think unit tests bring as much value as integration tests. In fact, a lot of the time unit tests are IMO useless, or just make your code harder to change. The further you go towards testing implementation, the worse it gets IMO, unless I really, really care that something is done in a very peculiar way, which is not very often.

My opinion is of course biased by my past experiences, but this has worked well for me so far with both monoliths and microservices, from e-shops and real estate marketplaces to IoT.

[+] magicalhippo|2 years ago|reply
I think it depends on what exactly the code does.

We have some custom rounding routines (to ensure consistent results). That's the kind of stuff you want to have lots and lots of unit tests for, testing all the paths, edge cases and so on.
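A sketch of what "lots and lots of unit tests" for a rounding routine can look like in practice: a table of edge cases replayed against the function. The routine here (`round_half_up`, half away from zero) is invented for illustration.

```python
# Table-driven tests for a hypothetical "custom rounding routine".
# The cases table concentrates the tricky edge cases in one place.
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(value: str, places: int) -> Decimal:
    """Round half away from zero to a fixed number of decimal places."""
    q = Decimal(1).scaleb(-places)  # e.g. places=2 -> Decimal("0.01")
    return Decimal(value).quantize(q, rounding=ROUND_HALF_UP)

CASES = [
    ("2.5",   0, "3"),     # the classic half-way case
    ("-2.5",  0, "-3"),    # ties go away from zero for negatives too
    ("2.675", 2, "2.68"),  # a value naive float arithmetic gets wrong
    ("0.0",   2, "0.00"),  # zero keeps the requested precision
]

for value, places, expected in CASES:
    got = round_half_up(value, places)
    assert got == Decimal(expected), (value, places, str(got))
```

Adding a new edge case is one line in the table, which keeps the maintenance cost of this style of test low.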

We also have a complex price calculation module, which depends on lots of tables stored in the DB as well as some fixed logic to do its job. Sure, we could test all the individual pieces of code, but, like Lego pieces, it's how you put them together that matters, so IMO integration testing is more useful.

So we do a mix. We have low-level unit testing for low-level library style code, and focus more on integration testing for higher-level modules and business logic.

[+] j1elo|2 years ago|reply
I believe the recent-ish reactions against the chore of maintaining the lowest-level unit tests have come about because, with years and experience, we might be going through an industry tendency where we collectively learn that those chores are not worth it.

100% code coverage is a red herring.

If you're in essence testing things that are part of the private implementation, only through indirect second effects of the public surface... then I'd say you went too far.

What you want is to know that the system functions as it should. "I might miss a low-level bug that is only manifested under very specific conditions" means to me that there's a whole-system condition that can occur and should thus be covered by the higher-level tests.

Not that lower-level unit tests are not useful, but I'd say only for intricate and isolated pieces of code that are difficult to verify. Otherwise, most software is a changing entity because we tend not to know what we actually want out of it, so its lower-level details tend to evolve a lot over time, and we shouldn't have two implementations of it (first the code itself, second a myriad of tiny tests tightly coupled to the former).

[+] vidarh|2 years ago|reply
To me, unit tests' primary value is in libraries or components where you want confidence before you build on top of them.

You can sidestep them in favour of higher level tests when the only place they're being used is in one single component you control.

But once you start wanting to reuse a piece of code with confidence across components, unit tests become more and more important. The same goes as more people get involved.

Often the natural time to fill in lacking unit tests is as an alternative to ad hoc debugging.

[+] DavidWoof|2 years ago|reply
> I also wasn't aware that "unit" referred to an isolated test

It never did. "Unit test" in programming has always had the meaning it does now: it's a test of a unit of code.

But "unit test" was originally used in electronics, and the meaning in electronics was a bit closer to what the author suggests. The author is being a bit fanciful (aka lying) by excluding this context and pretending that we all don't really understand what Kent Beck et al. were talking about.

[+] drewcoo|2 years ago|reply
> I also wasn't aware that "unit" referred to an isolated test, not to the SUT.

I'm with you. That claim is unsubstantiated. It seems to trace to the belief that the first unit tests were the xUnit family, which began with SUnit for Smalltalk. But Kent Beck made it pretty clear that SUnit "units" were classes.

https://web.archive.org/web/20150315073817/http://www.xprogr...

There were unit tests before that. SUnit took its name from common parlance, not vice versa. It was a strange naming convention, given that the unit testing framework could be used to test anything and not just units. Much like the slightly older Test Anything Protocol (TAP) could.

> [on unit tests] This does lead to a lot of work maintaining them whenever the implementation changes, but this is a necessary chore because of the value they provide.

I disagree. Unit tests can still be behavioral. Then they change whenever the behavior changes. They should still work with a mere implementation change.
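The point about behavioral unit tests can be made concrete with a made-up example: the same test pins down *what* the unit does, so swapping the implementation does not break it.

```python
# Illustration (names invented): a behavioral test specifies observable
# behavior, so a refactor from v1 to v2 leaves it green.

def unique_sorted_v1(items):
    return sorted(set(items))

def unique_sorted_v2(items):  # a "mere implementation change"
    seen, out = set(), []
    for x in sorted(items):
        if x not in seen:
            seen.add(x)
            out.append(x)
    return out

def check_behavior(impl):
    assert impl([3, 1, 2, 3]) == [1, 2, 3]
    assert impl([]) == []
    assert impl([5]) == [5]

# The identical behavioral test passes against either implementation:
check_behavior(unique_sorted_v1)
check_behavior(unique_sorted_v2)
```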

> This is why I still think that the traditional test pyramid is the best model to follow.

I'll disagree a little with that, too. I think a newer test pyramid that uses contract testing to verify integrations is better. The notion of contract tests is much newer than the pyramids and, properly applied, can speed up feedback by orders of magnitude while also cutting debugging time and maintenance by orders of magnitude.

On that front, I love what Pact is doing and would like to see more competition in the area. Hottest thing in testing since Cypress/Playwright...

https://pact.io

[+] janosdebugs|2 years ago|reply
Genuine question: can somebody please explain why there needs to be a distinction between "true" unit tests and tests that work on several layers at once, as long as said tests are runnable in under a minute on a consumer-grade laptop without any prior setup apart from a standard language + container env setup?

Over the years I've had several discussions to that effect and I truly, genuinely don't understand. I have test cases that test a connector to, say, Minio, so I spin up a Minio container dynamically for each test case. I need to test an algorithm, so I isolate its dependencies and test it.

Shouldn't the point be that the thing is tested with the best tool available for the job that ensures robustness in the face of change rather than riding on semantics?
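The "fresh dependency per test" shape described here can be sketched without Docker at all; below, an in-process HTTP stub stands in for the Minio container (all names are invented), but the pattern is the same: each test gets a brand-new dependency with no shared state, torn down afterwards.

```python
# Lightweight stand-in for "spin up the real dependency per test":
# a throwaway HTTP stub is started for each test and shut down after.
import http.server
import threading
import urllib.request
from contextlib import contextmanager

class _StubStore(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep test output quiet
        pass

@contextmanager
def fresh_store():
    # Port 0 lets the OS pick a free port, so tests can run in parallel.
    server = http.server.HTTPServer(("127.0.0.1", 0), _StubStore)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        yield f"http://127.0.0.1:{server.server_port}"
    finally:
        server.shutdown()

def test_connector_reads_object():
    with fresh_store() as url:
        with urllib.request.urlopen(url + "/bucket/key") as resp:
            assert resp.read() == b"ok"

test_connector_reads_object()
```

With a real container runtime available, `fresh_store` would launch an actual Minio image instead; the test itself would not change.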

[+] emmelaich|2 years ago|reply
Wow, that's interesting, because I never even considered a unit test to be anything other than a test of a small unit.

Is it not right there in the name?

[+] melvinroest|2 years ago|reply
Having high testing standards practically means to me (having worked for a few SaaS companies): change code somewhere else, and see where it fails elsewhere. Though I see failing tests as guidelines, as nothing is 100% tested. If you don't see them as guidelines but as absolutes, then you'll get those back as bugs via Zendesk.
[+] WolfOliver|2 years ago|reply
It makes sense to write a test for a class when the class/method does complex calculations. Today this is less the case than it was when the test pyramid was introduced.
[+] PH95VuimJjqBqy|2 years ago|reply
> I also wasn't aware that "unit" referred to an isolated test, not to the SUT.

What the hell are they teaching people nowadays?

But then I do a quick google search and see this as the top result

https://stackoverflow.com/questions/652292/what-is-unit-test...

> Unit testing simply verifies that individual units of code (mostly functions) work as expected.

Well no fucking wonder.

This is like the whole JRPG thing where the younger generation misunderstood but in their hubris claim it's the older generation that coined the term that doesn't understand.

It's the blind leading the blind and that's probably the most apt description of our industry right now.

[+] dmos62|2 years ago|reply
This resonates. I learned the hard way that you want your main tests to integrate all layers of your system: if the system is an HTTP API, the principal tests should be about using that API. All other tests are secondary and optional: can be used if they seem useful during implementation or maintenance, but should never be relied upon to test correctness. Sometimes you have to compromise, because testing the full stack is too expensive, but that's the only reason to compromise.
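The "principal tests drive the API itself" idea can be sketched with a toy WSGI app (everything here is invented for illustration): the test speaks in terms of method, path, status, and body, never in terms of the internals behind them.

```python
# Minimal sketch: the main test exercises the same interface a real
# HTTP client would use, with no knowledge of the implementation.
import json

def app(environ, start_response):
    """Toy HTTP API: GET /health -> {"status": "up"}."""
    if environ["PATH_INFO"] == "/health" and environ["REQUEST_METHOD"] == "GET":
        body = json.dumps({"status": "up"}).encode()
        start_response("200 OK", [("Content-Type", "application/json")])
        return [body]
    start_response("404 Not Found", [])
    return [b""]

def call(method, path):
    """Drive the app the way a client would, capturing the response."""
    captured = {}
    def start_response(status, headers):
        captured["status"], captured["headers"] = status, headers
    body = b"".join(app({"REQUEST_METHOD": method, "PATH_INFO": path},
                        start_response))
    return captured["status"], body

status, body = call("GET", "/health")
assert status == "200 OK"
assert json.loads(body) == {"status": "up"}
```

Because the assertions only mention the HTTP contract, any refactoring behind that contract leaves the principal tests untouched.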

This is largely because if you try to test parts of your system separately, you have to perfectly simulate how they integrate with the other parts, otherwise you'll get the worst-case scenario: false test passes. That's too hard to do in practice.

I suspect that heavy formalization of the parts' interfaces would go a long way here, but I've not yet seen that done.

[+] emadb|2 years ago|reply
The big TDD misunderstanding is that most people consider TDD a testing practice. The article doesn’t talk about TDD, it gives the reader some tips on how to write tests. That’s not TDD.
[+] MoreQARespect|2 years ago|reply
I'm fully aware of the idea that TDD is a "design practice" but I find it to be completely wrongheaded.

The principle that tests which couple to low-level code give you feedback about tightly coupled code is true, but only because low-level/unit tests couple too tightly to your code, i.e. because they too are bad code!

Have you ever refactored working code into working code and had a slew of tests fail anyway? That's the child of test driven design.

High-level/integration TDD doesn't give "feedback" on your design, it just tells you if your code matches the spec. This is actually more useful. It then lets you refactor bad code with a safety harness and gives failures that actually mean failure, not "changed code".

I keep wishing for the idea of test-driven design to die. Writing tests which break on working code is an inordinately uneconomical way to detect design issues compared to developing an eye for it and fixing it under a test harness that has no opinion on your design.

So, yes this - high level test driven development - is TDD and moreover it's got a better cost/benefit trade off than test driven design.

[+] deneas|2 years ago|reply
I mean I think it's fair to assume that TEST-Driven-Development has something to do with testing. That being said, Kent Beck recently (https://tidyfirst.substack.com/p/tdd-outcomes) raised a point saying TDD doesn't have to be just an X technique, which I wholeheartedly agree with.
[+] shimst3r|2 years ago|reply
Instead of Test-Driven Design, it should’ve been called Design-By-Testing.
[+] marcosdumay|2 years ago|reply
Well, it's exactly as much about testing as it focuses on writing and running tests.

Which means it's absolutely, entirely about them.

People can claim it's about requirements all they want. The entire thing runs around the tests, and there's absolutely no consideration to the requirements except on the part where you map them into tests. If you try to create a requirements framework, you'll notice that there is much more to them than testing if they are met.

[+] danmaz74|2 years ago|reply
As I remember the discourse about TDD, originally it was described as a testing practice, and later people started proposing to change the last D from "development" to "design".
[+] skrebbel|2 years ago|reply
Yeah it’s kind of unfortunate because they make a very good argument about defining a thing better, and in the title use a wrong definition of an adjacent term.
[+] WolfOliver|2 years ago|reply
Maybe the term TDD in the title could be replaced with "unit testing". But unit testing is a major part of TDD.
[+] michalc|2 years ago|reply
> Now, you change a little thing in your code base, and the only thing the testing suite tells you is that you will be busy the rest of the day rewriting false positive test cases.

If there is anything that makes me cry, it’s hearing “it’s done, now I need to fix the tests”

[+] tetha|2 years ago|reply
It's something we changed when we switched our configuration management. The old config management had very, very meticulous tests of everything. This resulted in great "code" coverage, but whenever you changed a default value, at least six tests would fail. Now we'd much rather test much more coarsely. If the config management can take 3 VMs and set up a RabbitMQ cluster that clusters and accepts messages, how wrong can it be?

And this has also bled into my development and strengthened my support of bug-driven testing. For a lot of pretty simple business logic, do a few high level e2e tests for the important behaviors. And then when it breaks, add more tests for those parts.

But note, this may be different for very fiddly parts of the code base - complex algorithms, math-heavy and such. But that's when you'd rather start table based testing and such. At a past gamedev job, we had several issues with some complex cost balancing math, so I eventually setup a test that allows the game balancing team to supply CSV files with expected results. That cleared up these issues within 2 days or so.
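The CSV-of-expected-results setup described here can be sketched in a few lines (the formula and all names are invented): the domain team supplies rows of inputs and expected outputs, and one test replays them against the calculation.

```python
# Sketch of CSV-driven table testing: non-programmers maintain the
# expected-results file; one loop checks every row.
import csv
import io

def upgrade_cost(level: int, discount: float) -> float:
    """Toy balancing formula standing in for the real cost math."""
    return round(100 * level * (1 - discount), 2)

# In practice this would be a file checked in by the balancing team.
CSV_FROM_BALANCING_TEAM = """level,discount,expected
1,0.0,100.0
2,0.1,180.0
3,0.5,150.0
"""

for row in csv.DictReader(io.StringIO(CSV_FROM_BALANCING_TEAM)):
    got = upgrade_cost(int(row["level"]), float(row["discount"]))
    assert got == float(row["expected"]), row
```

The test code never changes when the balancing rules do; only the data file grows.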

[+] scaramanga|2 years ago|reply
If changing the implementation but not the behaviour breaks a test, I just delete the test.
[+] WolfOliver|2 years ago|reply
Agree, this is usually a sign the team writes tests for the sake of writing tests.
[+] int0x80|2 years ago|reply
Sometimes, you have to make a complex feature or fix. You can first make a prototype of the code or proof of concept that barely works. Then you can see the gap that remains to make the change production ready and the implications of your change. That involves fixing regressions in the test suite caused by your changes.
[+] tbrownaw|2 years ago|reply
> Tip #1: Write the tests from outside in.

> Tip #2: Do not isolate code when you test it.

> Tip #3: Never change your code without having a red test.

> Tip #4: TDD says the process of writing tests first will/should drive the design of your software. I never understood this. Maybe this works for other people but it does not work for me. It is software architecture 101 — Non-functional requirements (NFR) define your architecture. NFR usually do not play a role when writing unit tests.

The one time I ever did "proper" red/green cycle TDD, it worked because I was writing a client library for an existing wire protocol, and knew in advance exactly what it needed to do and how it needed to do it.

Item 2 is right, but this also means that #1 is wrong. And knowing what order #2 requires means knowing how the code is designed (#4).

[+] sheepshear|2 years ago|reply
The tips are not contradictory if you follow the advice to start at a higher level.

Let's say you had to invent that wire protocol. You would write a test for a client that doesn't care which wire protocol is used.

[+] randomdata|2 years ago|reply
TDD was later given the name Behavior Driven Development (before that name was usurped by the likes of Cucumber and Gherkin) in an attempt to avoid this confusion. TDD advocates that you test that the client library does what its public interface claims it does – its behavior, not how it is implemented under the hood. The wire protocol is almost irrelevant. The tests should hold true even when the wire protocol is replaced with another protocol.
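A small invented example of this: the client's behavioral tests go through an injected transport, so they hold whether the real transport speaks HTTP, gRPC, or anything else.

```python
# The test exercises the client's public behavior; the wire protocol
# is hidden behind a transport object and can be swapped freely.

class KeyValueClient:
    def __init__(self, transport):
        self._transport = transport  # anything with send(command) -> reply

    def put(self, key, value):
        self._transport.send(("PUT", key, value))

    def get(self, key):
        return self._transport.send(("GET", key))

class InMemoryTransport:
    """Protocol-agnostic stand-in for the wire."""
    def __init__(self):
        self._data = {}

    def send(self, command):
        op, *args = command
        if op == "PUT":
            self._data[args[0]] = args[1]
        elif op == "GET":
            return self._data.get(args[0])

client = KeyValueClient(InMemoryTransport())
client.put("a", 1)
assert client.get("a") == 1        # behavior, not bytes on the wire
assert client.get("missing") is None
```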
[+] almostnormal|2 years ago|reply
Part of the problem is caused by all sides using the same terms but with a different meaning.

> You just don’t know if your system works as a whole, even though each line is tested.

... even though each line has been executed.

One test per line is strongly supported by tools calculating coverage and calling that "tested".

A test for one specific line is rarely possible. It may be missing some required behavior that hasn't been challenged by any test, or it may be inconsistent with other parts of the code.

A good start would be to stop calling something just executed "tested".

[+] osigurdson|2 years ago|reply
My view on unit testing is if there are no dependencies, there is no real reason not to write tests for all behaviours. While you may have a wonderful integration testing suite, it is still great to know that building blocks work as intended.

The problems arise with dependencies, as now you need to decide whether to mock them or use concrete implementations. The concrete implementation might be hard to set up, slow to run in a test, or both. Using a mock, on the other hand, is essentially an alternate implementation. So now your code has the real implementation + one implementation per test (in the limit), which is plainly absurd.

My current thinking (after writing a lot of mocks) is to try to shape code so that more of it can be tested without hard to setup dependencies. When this can't be done, think hard about the right approach. Try to put yourself in the shoes of a future maintainer. For example, instead of creating a bespoke mock for just your particular test, consider creating a common test utility that mocks a commonly used dependency in accordance with common testing patterns. This is just one example. Annoyingly, a lot of creativity is required once dependencies of this nature are involved which is why it is great to shape code to avoid it where possible.
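One way to read the "common test utility" advice, with hypothetical names: maintain a single shared fake of a commonly used dependency, rather than a bespoke mock per test.

```python
# A shared fake (FakeClock) reused across tests, instead of one ad hoc
# mock per test case. The code under test depends only on .now().

class FakeClock:
    """Common test utility standing in for a real time source."""
    def __init__(self, start=0.0):
        self._now = start

    def now(self):
        return self._now

    def advance(self, seconds):
        self._now += seconds

class SessionTracker:
    """Code under test: needs only something with a .now() method."""
    def __init__(self, clock, timeout=30.0):
        self._clock, self._timeout = clock, timeout
        self._last_seen = clock.now()

    def touch(self):
        self._last_seen = self._clock.now()

    def expired(self):
        return self._clock.now() - self._last_seen > self._timeout

clock = FakeClock()
session = SessionTracker(clock, timeout=30.0)
clock.advance(31)
assert session.expired()
session.touch()
assert not session.expired()
```

Because every test uses the same fake, its behavior stays consistent and there is exactly one alternate implementation to maintain, not one per test.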

[+] philippta|2 years ago|reply
In my experience a lot of engineers are stuck thinking in MVC terms and fail to write modular code. As a result most business logic is part of a request/response flow. This makes it infeasible to even attempt to write tests first, thus leaving integration or e2e tests as the only remaining options.
[+] laurencerowe|2 years ago|reply
I’m not a TDD purist but I’ve found that so long as the request/response flow is a JSON API or similar (as opposed to old-style forms and HTML rendering), writing integration tests first is quite easy, provided your test fixtures are fairly fast.
[+] trwrto|2 years ago|reply
Where should the business logic rather be? My tests are typically calling APIs to test the business logic. Trying to improve myself here.
[+] gardenhedge|2 years ago|reply
> Never change your code without having a red test

I'll never understand why people insist on this. If you want to write your tests first, that is fine. No one is going to stop you. But why must you insist everyone does it this way?

[+] gombosg|2 years ago|reply
I think that unit tests are super valuable because when used properly, they serve as micro-specifications for each component involved.

These would be super hard to backfill later, because usually only the developer who implements them knows everything about the units (services, methods, classes etc.) in question.

With a strongly typed language, a suite of fast unit tests can already be at feature parity with a much slower integration test, because even with dependencies mocked out, they essentially test the whole call chain.

They can offer even more, because unit tests are supposed to test edge cases, all error cases, wrong/malformed/null inputs etc. By using integration tests only, as the call chain increases on the inside, it would take an exponentially higher amount of integration tests to cover all cases. (E.g. if a call chain contains 3 services, with 3 outcomes each, theoretically it could take up to 27 integration test cases to cover them all.)

Also, ballooning unit test sizes or resorting to unit testing private methods give the developer feedback that the service is probably not "single responsibility" enough, providing incentive to split and refactor it. This leads to a more maintainable service architecture, that integration tests don't help with.

(Of course, let's not forget that this kind of unit testing is probably only reasonable on the backend. On the frontend, component tests from a functional/user perspective probably bring better results - hence the popularity of frameworks like Storybook and Testing Library. I consider these as integration rather than unit tests.)

[+] acidburnNSA|2 years ago|reply
Was 'unit' originally intended to be a test you could run in isolation? I don't think so. I'm not an expert in testing history, but this Dec 2000 Software Quality Assurance guide from the Nuclear Regulatory Commission defines Unit Testing as:

> Unit Testing - It is defined as testing of a unit of software such as a subroutine that can be compiled or assembled. The unit is relatively small; e.g., on the order of 100 lines. A separate driver is designed and implemented in order to test the unit in the range of its applicability.

NUREG-1737 https://www.nrc.gov/docs/ML0101/ML010170081.pdf

Going back, this 1993 nuclear guidance has similar language:

> A unit of software is an element of the software design that can be compiled or assembled and is relatively small (e.g., 100 lines of high-order language code). Require that each software unit be separately tested.

NUREG/BR-0167 https://www.nrc.gov/docs/ML0127/ML012750471.pdf

[+] sebtron|2 years ago|reply
When I first learnt about unit tests / TDD, I was confused because everyone assumed you were doing OOP. What am I supposed to do with my C code? I can just test a function, right? Or do I have to forcefully turn my program into some OO-style architecture?

But then I realized it does not matter; there is only one important thing about unit tests: that they exist. All the rest is implementation detail.

Mocking or not, isolated "unit" or full workflow, it does not matter. All I care about is that I can press a button (or type "make test" or whatever) and my tests run and I know if I broke something.

Sure, your tests need to be maintainable, you should not need to rewrite them when you make internal changes, and so on. You'll learn as you go. Just write them and make them easy to run.

[+] okl|2 years ago|reply
For C code you can use link-time substitution and a mock generator like CMock (http://www.throwtheswitch.org/cmock).

Link-time substitution means that you swap out certain objects with others when you build your test binaries.

For example, let's say your production software binary consists of a main function and objects A, B and C. For a unit test you could use a different main (the test), object B and a mock for object C - leaving out A.

[+] danielovichdk|2 years ago|reply
Read https://www.manning.com/books/unit-testing - it's the best book on the subject and presents the matter with good evidence.

"Tip #4: TDD says the process of writing tests first will/should drive the design of your software. "

Yes, and if that does not happen during TDD I would argue you are not doing TDD. Sure, you always have some sort of boundaries, but design up front is a poor choice when you are trying to iterate towards the best possible solution.

[+] JonChesterfield|2 years ago|reply
This article is internally inconsistent. It leads with considering "unit" to be "the whole system" being bad, and then tip #1 is to test from the outside in, at whole system granularity. On the other hand, it does point out that "design for test" is a nonsense, so that meets my priors.

By far the worst part of TDD was the proposed resolution to the tension with encapsulation. The parts one wants to unit test are the small, isolated parts, aka "the implementation", which are also the parts one generally wants an abstraction boundary over. Two schools of thought on that:

- one is to test through the API, which means a lot of tests trying to thread the needle to hit parts of the implementation. The tests will be robust to changes in the implementation, but the grey box coverage approach won't be, and you'll have a lot of tests

- two is to change the API to expose the internals, market that as "good testable design" and then test the new API, much of which is only used from test code in the immediate future. Talk about how one doesn't test the implementation and don't mention the moving of goal posts

Related to that is enthusiasm for putting test code somewhere separate to production code so it gets hit by the usual language isolation constraints that come from cross-module boundaries.

Both of those are insane nonsense. Don't mess up your API to make testing it easier, the API was literally the point of what you're building. Write the tests in the same module as the implementation and most of the API challenge evaporates. E.g. in C++, write the tests in an anonymous namespace in the source file. Have more tests that go through the interface from outside if you like, but don't only have those, as you need way more to establish whether the implementation is still working. Much like having end to end tests helps but having only end to end tests is not helpful.

I like test driven development. It's pretty hard to persuade colleagues to do it so multi-developer stuff is all end to end tested. Everything I write for myself has unit tests that look a lot like the cases I checked in the repl while thinking about the problem. It's an automated recheck-prior-reasoning system, wouldn't want to be without that.

[+] WolfOliver|2 years ago|reply
> It leads with considering "unit" to be "the whole system" being bad

I do not understand this statement. Could you point out which part of the article you mean?

[+] seanwilson|2 years ago|reply
Which companies or large projects use TDD at the moment? There's always such intense discussion about what it is and its benefits, yet I don't see anyone actually doing TDD.
[+] JackMorgan|2 years ago|reply
I've been in several multi-million line codebases that all were built with TDD. It's possible.

The default way of organizing code with DI makes unit tests extremely expensive to write and maintain. Mocks should be banned if you want to add unit tests or practice TDD. Instead the tested code should be pure. Pure code is easy to test, even if it's calling a dozen helper functions.
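The "pure code is easy to test" claim can be shown with an invented example: the decision logic takes plain values and returns plain values, so its test needs no mocks at all.

```python
# A pure pricing rule: no DI container, no mocks, nothing to set up.

def apply_discount(order_total: float, is_member: bool) -> float:
    """Pure rule (invented): members get 10% off orders over 100."""
    if is_member and order_total > 100:
        return round(order_total * 0.9, 2)
    return order_total

# The impure shell (database reads, HTTP handlers) stays thin and is
# tested elsewhere; the logic itself is trivially checkable:
assert apply_discount(200.0, is_member=True) == 180.0
assert apply_discount(200.0, is_member=False) == 200.0
assert apply_discount(50.0, is_member=True) == 50.0
```

The shell then just fetches the inputs and persists the output, keeping the mock-heavy surface as small as possible.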

[+] hmeh|2 years ago|reply
Dozen-ish person team, 3 year project so far, billion dollar revenue company (not a software company), >500k LOC, TDD since the beginning. Have been doing TDD for 18 years or so. Still getting better at it.
[+] projektfu|2 years ago|reply
I think it's both attention-getting and distracting to start with a definition of unit testing that hardly anybody uses. Now I'm not interested in the article because I have to see what your sources are and whether you're gaslighting me.

The reason people use the term unit test to mean the size of the system under test is because that's what it's generally meant. Before OO, it would mean module. Now it means class. The original approach would be to have smaller, testable functions that made up the functionality of the module and test them individually. Decoupling was done so that you didn't need to mock the database or the filesystem, just the logic that you're writing.

Some people disagree with unit testing and focus on functional testing. For example, the programming style developed by Harlan Mills at IBM was to specify the units very carefully using formal methods and write to the specification. Then, black-box testing was used to gain confidence in the system as a whole.

I feel that a refactor shouldn't break unit tests, at least not if the tools are smart enough. If you rename a method or class, its uses should have been renamed in the unit tests. If you push a method down or up in a hierarchy, a failing test tells you that the test is assuming the wrong class. But most cases of failing tests should be places where you made a mistake.

However, I agree that functional tests are the hurdle you should have crossed before shipping code. Use unit testing to get 100ms results as you work, functional tests to verify that everything is working correctly. Write them so that you could confidently push to production whenever they're green.

[+] mannykannot|2 years ago|reply
The article highlights this claim:

"Now, you change a little thing in your code base, and the only thing the testing suite tells you is that you will be busy the rest of the day rewriting false positive test cases."

Whenever this is the case, it would seem at least one of the following is true:

1) There are many ways the 'little change' could break the system.

2) Many of the existing tests are testing for accidental properties which are not relevant to the correct functioning of the system.

If only the second proposition describes the situation, then, in my experience, it is usually a consequence of tests written to help get the implementation correct being retained in the test suite. That is not necessarily a bad thing: with slight modification, they might save time in writing tests that are useful in getting the new implementation correct.

I should make it clear that I don't think this observation invalidates any of the points the author is making; in fact, I think it supports them.

[+] andsmedeiros|2 years ago|reply
TDD can be valuable but sometimes hindering. I often find myself with an incomplete idea of what I want and, thus, no clear API to start testing. Writing a quick prototype -- sometimes on godbolt or replit -- and then writing tests and production code actually yields better productivity for me.

I usually test all of the public API of something and only it. Exported functions, classes, constants and whatever should be tested and properly documented. If writing tests for the public surface is not enough, most likely the underlying code is poorly written, probably lacking proper abstractions to expose the adequate state associated with a determined behaviour (e.g.: a class that does too much).

[+] skohan|2 years ago|reply
I think it also heavily depends on the language you are working with. For instance, unit tests are much more important in a dynamically typed language than in a statically typed one, since the compiler is able to catch far fewer issues.