jiehong | 2 years ago:
While I agree with the results of this article, I've found that convincing other developers to write tests with properties isn't easy: coming up with good properties is not always trivial.
Here is an informal testing maturity ladder in increasing order:
- code can only be tested with an integration test
- code is tested by comparing the stdout with an earlier version (hopefully it’s deterministic!)
- code is retrofitted with tests in mind
- code is written to be testable first, maybe with lots of mocks
- code is testable, and pure functions are tested with unit tests, leaving the rest to integration tests. Fewer mocks, some stubs.
- property based tests, assuming the unit tests are fast in the first place
- fuzzing
- mutation based testing
And that's not even touching on formal specs, performance testing, or anything else.
naasking | 2 years ago:
The problem with your hierarchy is that there's no empirical evidence supporting it. Small unit tests have not been empirically shown to have benefits over integration tests, and test-driven development has failed to show a benefit over tests written after the fact. The only thing that seems to matter is that tests are written, and the more tests, the better the chances of finding a bug. That's it. So your list is actually:
* integration and unit tests: since these are manually written, they scale poorly, but they are simple.
* property tests: since these are semi-automatic, they scale better but are a bit more complicated to set up.
* fuzzing: almost fully automatic, although I don't differentiate this much from property-based testing.
* mutation based testing
stouset | 2 years ago:
> code is written to be testable first, maybe with lots of mocks
If you mean what I think you mean, this is the bottom rung of the ladder. Code that is only testable with lots of mocks is in practice worse than code with no tests.
Tests should do two things: catch undiscovered bugs and enable refactoring. Tests mocked to high heaven do one thing: confirm that the code is written the way it's currently written. That is diametrically opposed to both of those goals. Worst of all, the code can't be changed without breaking and rewriting the tests.
Mocks are okay for modeling owners of external state. Even better are dummy/fake implementations that look and behave like the real thing, but with highly simplified logic.
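The distinction above can be sketched in Python. Everything here is invented for illustration: the fake store has real, if highly simplified, behavior, so tests written against it survive refactoring of the code under test in a way that call-sequence mocks do not.

```python
# Hypothetical example: a fake that behaves like a real key-value user store,
# with simplified in-memory logic, instead of a mock pinned to call sequences.
class InMemoryUserStore:
    """Fake implementation: same contract as a real database-backed store."""
    def __init__(self):
        self._users = {}

    def save(self, user_id, name):
        self._users[user_id] = name

    def load(self, user_id):
        return self._users.get(user_id)

def rename_user(store, user_id, new_name):
    # Code under test: written against the interface, not the implementation.
    if store.load(user_id) is None:
        raise KeyError(user_id)
    store.save(user_id, new_name)

# The test exercises real behavior rather than asserting on how it was called:
store = InMemoryUserStore()
store.save(1, "alice")
rename_user(store, 1, "bob")
assert store.load(1) == "bob"
```

Because the assertion is about observable state, `rename_user` can be rewritten internally without touching the test.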
blowski | 2 years ago:
I really like this list, and it's a great idea to explain testing this way.
Perhaps there is also a level -1, when the tests actually make things worse. I see this when tests are extremely brittle, flaky, don't test the most complex or valuable bits of code, are very slow to run, or unmaintained with a list of "tests we know fail but haven't fixed".
pydry | 2 years ago:
>- code can only be tested with an integration test
Some code only makes sense to test with integration tests. Testing is no more effective in a code base where somebody has fattened up the SLOC with dependency inversion just so they can write unit tests that assert x = x.
>- code is tested by comparing the stdout with an earlier version (hopefully it’s deterministic!)
Making the code deterministic or adapting the tests to accommodate that it isn't should be next on the ladder, not hoping that it is.
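One common way up that rung is to inject the source of nondeterminism rather than hope it away. A minimal Python sketch, with all names invented for illustration:

```python
import random

# Hypothetical example: making nondeterministic code deterministic by
# injecting its random generator instead of reaching for the global one.
def shuffle_deck(rng):
    deck = list(range(52))
    rng.shuffle(deck)
    return deck

# With an injected, seeded generator the output is reproducible, so a
# golden-output comparison against an earlier version becomes stable.
assert shuffle_deck(random.Random(42)) == shuffle_deck(random.Random(42))
```

The same trick applies to clocks, UUIDs, and network responses: pass them in as parameters and the stdout-comparison style of testing stops being a gamble.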
jcgrillo | 2 years ago:
> coming up with good properties is not always trivial
This is difficult, but one technique that might make it easier for real-world applications, beyond simple invariants, is to build a simple model of the system under test and, in the PBT, check that your system's behavior matches the model's [1].
[1] https://dl.acm.org/doi/10.1145/3477132.3483540
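A rough stdlib-only sketch of that model-based approach (a real setup would use a PBT library such as Hypothesis, which also shrinks failing operation sequences; the `BoundedQueue` here is a made-up system under test):

```python
import random

class BoundedQueue:
    """Hypothetical system under test: a fixed-capacity FIFO queue."""
    def __init__(self, capacity):
        self.capacity = capacity
        self._items = []

    def put(self, x):
        if len(self._items) < self.capacity:
            self._items.append(x)  # silently drops when full

    def get(self):
        return self._items.pop(0) if self._items else None

rng = random.Random(0)
for _ in range(200):                      # 200 random operation sequences
    queue, model = BoundedQueue(5), []    # the model is just a plain list
    for _ in range(rng.randint(1, 30)):
        if rng.random() < 0.5:
            x = rng.randint(0, 9)
            queue.put(x)
            if len(model) < 5:            # model mirrors the capacity rule
                model.append(x)
        else:
            # Property: observable behavior matches the model's behavior.
            expected = model.pop(0) if model else None
            assert queue.get() == expected
```

The model stays trivially simple, so a mismatch almost always points at a bug in the real system rather than in the test.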
boxed | 2 years ago:
I don't understand why PBT is above mutation testing. It seems more like a popularity contest than a matter of engineering tradeoffs or usefulness.
eru | 2 years ago:
> While agreeing with the results of this article, I’ve found that convincing other developers of writing test with properties isn’t very easy: coming up with good properties is not always trivial.
Yes, but there are some easy targets:
Your example based tests often have some values that are supposed not to matter. You can replace those with 'arbitrary' values from your property based testing library.
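A minimal sketch of that substitution in plain Python; the function and values are invented, and a PBT library would supply the 'arbitrary' strings and shrink any failure:

```python
import random

# Hypothetical example: the customer's name shouldn't matter to the
# discount logic, so instead of hard-coding "alice" we draw arbitrary names.
def discounted_total(name, total):
    # The name is carried along for receipts but must not affect the price.
    return {"name": name, "total": round(total * 0.9, 2)}

rng = random.Random(1)
for _ in range(100):
    # An 'arbitrary' value standing in for a PBT library's string strategy.
    name = "".join(chr(rng.randint(32, 126)) for _ in range(rng.randint(0, 12)))
    assert discounted_total(name, 100.0)["total"] == 90.0
```

The old example-based test asserted the same thing for one fixed name; the property version states explicitly that the name is a don't-care.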
Another easy test that's surprisingly powerful: just chuck 'arbitrary' input at your functions, and check that they don't crash. (Or at least, only throw the expected errors.) You can refine what 'arbitrary' means.
The implied property you are testing is that the system doesn't crash. With a lot of asserts etc in your code, that's surprisingly powerful.
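A stdlib-only sketch of that "doesn't crash, or only throws the expected errors" property; the parser is hypothetical, and a real PBT library or fuzzer would generate far nastier inputs:

```python
import random

def parse_port(text):
    """Hypothetical parser: returns an int in 1..65535 or raises ValueError."""
    value = int(text)              # may raise ValueError on junk input
    if not 1 <= value <= 65535:
        raise ValueError("port out of range")
    return value

rng = random.Random(7)
for _ in range(500):
    junk = "".join(chr(rng.randint(32, 126)) for _ in range(rng.randint(0, 8)))
    try:
        port = parse_port(junk)
        assert 1 <= port <= 65535  # success implies a valid port
    except ValueError:
        pass                       # the only expected error; any other exception fails the test
```

Any `TypeError`, `IndexError`, or stray assert tripped inside the function surfaces immediately, which is what makes this trivial property surprisingly effective.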
IshKebab | 2 years ago:
If you're interested in property-based testing, I highly recommend "How to Specify It!" by John Hughes, which is available both as a talk and as a paper.
pfdietz | 2 years ago:
Can't read the PDF right now, but I'm a big fan of property-based testing.
sirwhinesalot | 2 years ago:
One thing I find that people struggle with is coming up with "good properties" to test.
That's the wrong way to think about it. The properties you want to test are the function's contract, and checking that contract is the goal.
You can be as specific with the contract as you want; the more specific it is, the more bugs you'll find.
Property-based tests are just one way to check the contract, as are hand-written unit tests. You could use a static analyzer or a model checker as well, they're all different approaches to do the same thing.
EDIT: by contract I mean the guarantees the function makes about its output. A contract for a sorting function could be as simple as the length of the output matching the length of the input. That's one property. Another is that every element in the output is also in the input. You can go all the way and say that for every element at index i, the element at index i+1 (if any) is at least as large.
But you don't need a perfect contract to start with nor to end with. You can add more guarantees/properties as you wish. The more specific, the better (but also slower) the tests.
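Those three sorting properties can be written down directly. A small stdlib sketch (a PBT library would normally drive the generation and shrink counterexamples):

```python
import random
from collections import Counter

# The sorting contract above as an executable check.
def check_sort_contract(xs):
    out = sorted(xs)
    assert len(out) == len(xs)          # property 1: length preserved
    assert Counter(out) == Counter(xs)  # property 2: same elements, same counts
    assert all(out[i] <= out[i + 1]     # property 3: nondecreasing order
               for i in range(len(out) - 1))

rng = random.Random(3)
for _ in range(200):
    check_sort_contract([rng.randint(-50, 50) for _ in range(rng.randint(0, 40))])
```

Note how each property added makes the contract stricter: length alone admits a function that returns zeros, length plus multiset equality admits the identity function, and only all three together pin down sorting.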
eru | 2 years ago:
I found that writing new property-based tests was a skill I picked up relatively quickly. But learning how to retrofit existing tests was a whole 'nother skill that I had to learn almost independently afterwards.
crabbone | 2 years ago:
Over time, trying to adopt or even reinvent PBT, I discovered that there isn't a good language to formally describe software at the level where it becomes interesting to test it. TLA+ gets somewhat close... but it's both too difficult to write and hard to adapt to, e.g., interactive systems.
I don't mean to say that PBT is a bad idea. I actually think it's very good. I wish there were a way to make it really useful, though. As the paper mentions, PBT in their experience is used for "component testing", which is just another name for "unit testing", where automation of this kind isn't all that important. Integration and E2E testing matter a lot more, but there isn't really a good way to approach them right now.
troupo | 2 years ago:
Integration tests must be much higher on the list.
Also, nothing is stopping you from running your prop tests in integration tests, too.
mdiep | 2 years ago:
https://research.chalmers.se/publication/517894/file/517894_...
https://www.youtube.com/watch?v=G0NUOst-53U
He (John Hughes) gives fantastic guidance about how to write PBTs.