
Experiment: Unit testing isn't enough; You need static types, too

189 points| fpgeek | 13 years ago |evanfarrer.blogspot.ca | reply

267 comments

[+] davesims|13 years ago|reply
The sample size of this study is too small to support the conclusions drawn. The counterfactuals to the claims of dynamic type advocates were already true and provable, so even if the sample of codebases studied were statistically significant (not to mention vetted for quality of unit tests as well as coverage), the conclusions are nevertheless trivial.

In addition, the hidden assumption is that all static and dynamic typing are created equal, i.e., since Haskell is statically typed and Haskell appears to have caught Python bugs that unit tests did not, therefore Java will catch bugs in a Ruby codebase, C++ will catch bugs in a JavaScript codebase, etc. Of course this assumption is gratuitous. Haskell in particular has a specific sort of type checking that is far different from Java's or C++'s, for instance.

Further, not all dynamic systems are created equal. Ruby, for instance, can I think be shown to require fewer lines of code than Java to achieve similar functionality. Fewer lines of code should in principle mean fewer opportunities for defects. Dynamic languages with metaprogramming features like Ruby's or Smalltalk's should in principle be able to eliminate more code duplication than an environment like C++. This aspect of dynamic languages should be taken into account, again with a statistically significant sample size, and weighed against bugs caught by static typing.

The study is interesting as a preliminary investigation, but the conclusions should have been much more modest, proportionate to both the sample size, as a percentage of production codebases, and the extremely important idiosyncratic nature of Haskell vs. other statically typed environments. Something like: "The study has shown Haskell's type system will catch some bugs not caught in an otherwise well-covered Python codebase. These bugs could in theory have been caught by unit tests, therefore it is recommended that when using a dynamic language, more care must be taken to cover these types of bugs."

That would have been a more appropriate and modest conclusion, consistent with the data, than the sweeping generalization "You need Static Typing."

[+] efarrer|13 years ago|reply
I (the author) appreciate the feedback. I believe that many of your criticisms are addressed in the actual paper. First of all I completely agree that my sample size is too small for a conclusive proof. I mention in the paper that I hope that others will try and replicate this experiment on other pieces of software. I do think it's appropriate when conducting an experiment to publish a conclusion, not that the experiment will constitute proof (or an established scientific theory), but as a conclusion to the study that others can try to confirm or refute.

I also mention in the paper that it would be beneficial to conduct this experiment using different type systems for the reasons that you stated above.

The argument against static typing that I was testing didn't mention any particular type system nor any particular dynamically typed language; it was a general argument that stated that unit testing obviated static typing. Because the argument was so general and absolute, I felt that any static type system that could be shown to expose bugs that were not caught by unit testing would be enough to refute the argument. I was not trying to prove that any type system would catch bugs not found by any unit tested software. The paper also points out that I'm trying to see whether unit testing obviates static typing in practice. In theory you could implement a poor man's type checker as unit tests, but my experiment focused on whether unit testing obviates static typing in practice.
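As a rough illustration of the "poor man's type checker as unit tests" idea, here is a minimal Python sketch. The function `parse_port` and its test are hypothetical and are not taken from the paper or from the codebases it studied.

```python
def parse_port(text):
    """Hypothetical helper: parse a port number from user input."""
    if text.isdigit():
        return int(text)
    return text  # Bug: the fallback branch returns the raw str, not an int.


def test_parse_port_returns_int():
    """A hand-written type check: assert the result type for sample inputs."""
    for sample in ["80", "8080", "65535"]:
        assert isinstance(parse_port(sample), int)
```

The test passes because every sample is numeric, so the branch that returns a string is never exercised. The check is only as thorough as the inputs someone remembered to list. If the function were annotated as returning `int`, a static checker such as mypy would be expected to flag the fallback branch without any sample inputs at all.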

Finally I believe that my conclusion in the paper was at least a bit more modest than that of the blog post. The lack of apparent modesty in the blog post was caused more by a lack of ability on my part to accurately summarize than an inflated sense of accomplishment and self importance.

[+] ScottBurson|13 years ago|reply
I see both sides of this argument. The OP -- at least in this blog post; I haven't read the paper -- spends most of his time talking about how he's demonstrated the insufficiency of unit testing. For the purpose of that argument, it really doesn't matter that he used Haskell as opposed to some other type checker.

It's only in the last two sentences of his "Conclusion" section that he turns the argument around, and here is where he oversteps:

> While unit testing does catch many errors it is difficult to construct unit tests that will detect the kinds of defects that would be programatically detected by static typing. The application of static type checking to many programs written in dynamically typed programming languages would catch many defects that were not detected with unit testing[...]

Clearly, this is overbroad. For starters, he should have used "could" in place of "would". And it wouldn't have been a bad time to remind the reader that Haskell's type system differs from those of other statically typed languages with which the reader may be more familiar.

I don't quite agree, though, that the conclusion is "trivial". Maybe I'm just out of touch, but I wasn't aware of a good test of how true the dynamic argument was in practice, as opposed to theory -- particularly claim #2.

[+] davesims|13 years ago|reply
I think I should clarify what I meant by "the conclusions are nevertheless trivial." Let's look at the key statement in the conclusion of the study:

"Based on these results, the conclusion can be reached that while unit testing can detect some type errors, in practice it is an inadequate replacement for static type checking."

As I've already pointed out, this seems to me an ambitious and over-reaching conclusion, given the scope of the study.

But, equally important, it is simply an example of something that was already provable. It should be axiomatic that in principle automatically-generated validation like that provided by static typing should in theory be able to catch type errors not caught manually in a dynamic context, either for reasons of human oversight or human error.

In other words, it seems to me that all that has been done here, is to provide a few concrete examples of what was already true and uncontroversial: auto-generated coverage of specific types of validations can be more comprehensive than some human beings will be in some environments and contexts. It has not shown that the perceived benefits of dynamic typing with good unit tests are outweighed by this fact, nor that, statistically-speaking, errors of this type are common enough to warrant a preference of static typing over dynamic typing with unit tests in all contexts.

[+] papsosouid|13 years ago|reply
>In addition, the hidden assumption is that all static and dynamic typing are created equal, i.e., since Haskell is statically typed and Haskell appears to have caught Python bugs that unit tests did not, therefore Java will catch bugs in a Ruby codebase, C++ will catch bugs in a JavaScript codebase, etc.

That assumption isn't hidden, it is made up. By you. The question was "can static typing catch bugs that made it past a decent (and common) test suite". The answer to that can drive interest in static typing, and thus more language with useful static type systems. Just because java has a crappy type system, doesn't mean we should be content with that.

[+] wissler|13 years ago|reply
You make a good point -- I don't think any statistical study will ever be able to show that static typing is better.

I do think that a rational argument can show it, but my argument is too long to fit into the margin.

[+] cageface|13 years ago|reply
I'm convinced that dynamically typed languages are a transitional technology that will be superseded once we develop type systems that are both usefully strict but also flexible.

After over ten years working in dynamic languages I'm very happy to have a compiler on my side again.

[+] ocharles|13 years ago|reply
Amen, I've spent the last 3 years of my career working on an increasingly complicated Perl project and frankly, I've had enough. My pet projects now are all Haskell and I can't believe how fun it is.

I spend most of my time these days fixing bugs and regressions due to the sheer scale of the project, and I'm a running meme at work for saying "a type checker could have caught that!". I can't imagine going back to a dynamic language now.

[+] dochtman|13 years ago|reply
Seems to me that that's what static typing proponents have been saying for years (if not decades), and it still isn't true.

That said, I'm currently hacking on a compiler to see if I can come up with a design for a static language that's almost as pleasant to use as a dynamic one, so I'm not completely hopeless. But it certainly seems like the data point that until now no such superior type system (strict but flexible) has become widely popular should not be underestimated.

[+] gnuvince|13 years ago|reply
What dynamic language were you using and which language are you using now?
[+] akkartik|13 years ago|reply
I haven't heard anybody suggest that static languages should allow lists to contain arbitrary types. Hmm, do tuples cover all the use cases for that feature?
[+] zemo|13 years ago|reply
>type systems that are both usefully strict but also flexible.

that's basically the design criteria behind Go's type system.

[+] glenjamin|13 years ago|reply
I applaud the effort to try and dissect the problem scientifically.

If a whole program has 1 bug due to being implemented with dynamic types over static types, and that bug has gone unnoticed, then it can't be particularly important.

The other common proposition is that dynamically typed languages are faster to write in than statically typed languages. If this is true then we need to compare the saving in development time with the cost of the bugs which go undetected.

My gut feeling says that this line of analysis is never going to prove that static typing is inherently "better".

[+] cageface|13 years ago|reply
What? He specifically says that many of the bugs he discovered were exploitable. Just because you missed them doesn't mean some hacker won't.
[+] Deestan|13 years ago|reply
> The other common proposition is that dynamically typed languages are faster to write in than statically typed languages.

This part is also very tricky, as most of people's hard-earned experience with this is (almost by definition) old. In recent years, type inference has reduced the type-caused slowdown immensely. E.g. in Haskell you can (+) write large statically typed programs without specifying any types at all - they are inferred by the compiler.

(+) But please don't.

[+] johnkchow|13 years ago|reply
From my personal experience, the saving in development time is very much real.

I used to work on .NET, and you'd have to program against crazy, non-intuitive patterns in order to have your code "clean". (I hated IoC containers as well as writing all that boilerplate code for Attributes. And for Java, remember that hilarious post about Factory-Factory-Factory patterns? http://discuss.joelonsoftware.com/default.asp?joel.3.219431)

I'm not saying statically typed languages are bad. Type errors always bite me in the ass in Ruby, but it's a small price to pay (IMO) for better maintainability.

EDIT: By the way, it sounds like I'm an ignorant dynamically-typed lover. I'm not; I still yearn for that type safety net, but I'm just speaking from a pragmatic perspective.

[+] tikhonj|13 years ago|reply
An interesting thought: a good type system can actually make a language more expressive.

A perfect example is QuickCheck. QuickCheck allows you to write complicated tests very simply by relying on the type system. You just write out the invariant and the type system automatically figures out which random generators to use to run the tests.

QuickCheck has been ported to a bunch of other languages, but it's more complex and seems harder to use in dynamically typed languages as compared to Haskell.
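To make the "type system picks the generators" idea concrete, here is a toy sketch in Python that uses type hints to choose random generators. It is not QuickCheck and not any real port of it; `GENERATORS`, `find_counterexample` and both properties are hypothetical names made up for this illustration.

```python
import random
import string
from typing import get_type_hints

# Hypothetical registry: one random generator per supported argument type.
GENERATORS = {
    int: lambda rng: rng.randint(-1000, 1000),
    str: lambda rng: "".join(
        rng.choice(string.ascii_letters) for _ in range(rng.randint(0, 10))
    ),
}


def find_counterexample(prop, runs=200, seed=0):
    """Call prop with random arguments chosen from its type hints.

    Returns the first argument dict for which prop is false, or None if
    every run passed.
    """
    rng = random.Random(seed)
    hints = get_type_hints(prop)
    hints.pop("return", None)  # only the parameters need generators
    for _ in range(runs):
        args = {name: GENERATORS[tp](rng) for name, tp in hints.items()}
        if not prop(**args):
            return args
    return None


def prop_add_commutes(a: int, b: int) -> bool:
    return a + b == b + a


def prop_concat_commutes(s: str, t: str) -> bool:
    return s + t == t + s  # false for most pairs of strings
```

Here `find_counterexample(prop_add_commutes)` returns `None`, while `find_counterexample(prop_concat_commutes)` returns a failing pair of strings. The lookup works only because the annotations are written out by hand and the registry is maintained manually, which is roughly the extra effort described above for dynamically typed languages.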

A simpler example in the same vein is Haskell's read function. Essentially, read is the opposite of toString--it goes from a string to some value. The beauty is that you never need to specify what type you're parsing; it can figure out what type it needs to be thanks to the type system. So instead of having a bunch of functions like parseDouble and parseInt, you have a single read function. This also makes the library prettier by maintaining the symmetry between show and read (toString and fromString).

[+] AngryParsley|13 years ago|reply
I think most people would agree that these two circles overlap on a Venn diagram:

    (    Bugs found by unit tests (     ) Bugs found by type-checking    )
The disagreement is how much. Also, type-checking is free[1], while unit tests have to be manually written.

I'm glad someone spent a lot of time trying to answer this question, but I don't think it will affect my choice of language in any new project. I like to write code in the languages I like, and bugs be damned.

1. A common argument is that static-typed languages slow development. I'm not touching that land-mine.

[+] afrozenator|13 years ago|reply
The author really needs to be complimented on rewriting swathes of code from Python to Haskell.

At Google, in my project, we've had runtime errors in Python code due to wrongly spelled variables (although that is a different problem) and type errors, something a compiler would have caught.
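A minimal sketch of both kinds of failure, using made-up functions (`apply_discount`, `shipping_label`) rather than any real project code:

```python
def apply_discount(price, customer):
    """Hypothetical example: the typo sits on a rarely taken branch."""
    if customer.get("vip"):
        discount = 0.2
        return price * (1 - discont)  # typo: NameError only when this line runs
    return price


def shipping_label(order_id):
    """Hypothetical example: silently assumes order_id is a str."""
    return "Order #" + order_id  # TypeError if an int is passed


def test_regular_customer():
    # This test passes, and it never touches the misspelled name.
    assert apply_discount(100, {"vip": False}) == 100
```

Python resolves names and checks operand types only when a line actually executes, so both defects stay invisible until a VIP customer or an integer order ID shows up at runtime. A compiler for a statically typed language, or a linter that reports undefined names, would be expected to report the misspelling without running anything.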

Strong type checking is something that I truly like about Haskell and OCaml; I'm reasonably convinced that once my program has passed the typechecker, it is logically correct. Though debugging in Haskell is truly a different ballgame altogether (I'm a Haskell noob).

I'll stop here lest this turns into a flame war.

[+] 16s|13 years ago|reply
pylint can help with misspelled variables and type errors. I started using it recently and love it. I still love my C++ compiler though and would not trade it for anything else.
[+] pyre|13 years ago|reply

  > we've had runtime errors in Python code, due to [...]
  > type error, something a compiler would have caught
Are wrongly spelled variable names and type errors the only runtime errors that you get? If not what percentage are they?

Personally, I think that people tend to obsess about this specific type of error because the 'solution' (static typing) is something that already exists, whereas there is no easy solution to other types of flaws.

[+] kingkilr|13 years ago|reply
If you can convert a program from one language to another (which is non-trivially different) in the time it takes to complete your master's, I'm pretty sure it wasn't a very interesting program. Further, the quality of the developers is going to play a large role in how effective any tool (and make no mistake, static typing is a tool) is. This is not intended as a disparaging remark to the authors, but in the 30 seconds I spent reviewing each of these code bases, I was totally unimpressed: none of them seemed to follow PEP8, and several of their test files weren't even unit tests, they were just a random script that appeared to exercise a tiny part of the codebase. I therefore conclude that the methodology used in operating this experiment was flawed and, consequently, the conclusion cannot be taken as scientifically valid.
[+] dmethvin|13 years ago|reply
The lack of static types really comes into play not inside a library, but in the interface between the external code and a library. Note that at least a few of the bugs that were found involved invalid API inputs. From the library writer's view it's not a bug because those values are not in the domain of defined behavior. From the caller's view it's a PITA that the library doesn't yell at them when they pass it garbage. Of course, they could find the problem with unit tests but the further up the food chain you get the more scarce unit tests tend to be.
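A small sketch of that boundary problem, with a hypothetical library function `count_tags` and a stricter wrapper:

```python
def count_tags(tags):
    """Hypothetical library function: expects a list of tag strings."""
    return len({tag.lower() for tag in tags})


def count_tags_checked(tags):
    """Same function, but it rejects the most common misuse at the boundary."""
    if isinstance(tags, str):
        raise TypeError("tags must be a list of strings, not a single string")
    return count_tags(tags)
```

Calling `count_tags(["python"])` returns 1 as intended. Calling `count_tags("python")` does not fail: a string is iterable too, so the function counts its six distinct letters and returns 6. From the library's point of view the input was outside its defined domain; from the caller's point of view nothing yelled. The checked variant yells at runtime, whereas a declared parameter type would let a static checker complain before the call is ever made.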
[+] swannodette|13 years ago|reply
Static types or static analysis?

  * KLEE: Unassisted and Automatic Generation of 
    High-Coverage Tests for Complex Systems Programs  
    http://llvm.org/pubs/2008-12-OSDI-KLEE.html
  * Erlang Dialyzer 
    http://www.erlang.org/doc/man/dialyzer.html
  * Datalog based systems 
    http://www.cse.msu.edu/~cse914/Overheads/mmcgill-java-static-race-detector.pdf
If you want static analysis hard coded into your language - what feature set do you want to support? The following support different styles of programming and thus have very different static type systems.

  * Standard ML / OCaml
  * Haskell
  * Scala
  * Typed Racket
  * Qi
And there's still the question that some kinds of extremely useful programs are very difficult to write in popular languages with strong static typing. miniKanren, a flexible embedding of Prolog and Constraint Logic Programming into Lisp, comes to mind here. I've seen versions of miniKanren written in Haskell and it abandons the most powerful feature of miniKanren - it can be trivially applied back on the language it is written in!
[+] IsTom|13 years ago|reply
Dialyzer is not very good. We've been using it, but it rarely catches anything non-trivial. On the other hand, Haskell has better support for unit tests than many dynamic languages.
[+] timclark|13 years ago|reply
Interesting, but worth remembering, as Rich Hickey says, every bug has got past both your unit tests and your type checking.
[+] drharris|13 years ago|reply
It is definitely refreshing to see some actual evidence in something where arguments tend to be based on speculation, experience, or opinion. Now, we just need someone to research Emacs v. Vim, Tabs v. Spaces, etc.
[+] praptak|13 years ago|reply
> Tabs v. Spaces

Not exactly this, but there exists "Program indentation and comprehensibility" http://www.cs.umd.edu/~ben/papers/Miara1983Program.pdf which tries to tackle the "2 spaces vs 4 spaces vs 8 spaces indent". Not very good tho, since they seem to add superfluous indent levels like a separately indented "then" after "if" (Pascal), which effectively doubles the actual indent of the semantic block.

[+] crazygringo|13 years ago|reply
Don't forget about braces versus indents, and above all, semicolons or not! :)

On second thought, let's forget about them after all...

[+] timruffles|13 years ago|reply
I wish more of these open questions in coding (and everything) had people making such a good effort to research them! There is never a simple answer, or they wouldn't be open questions.

Really inspiring, painstaking piece of work.

[+] pimentel|13 years ago|reply
How can you translate from Python to a static language, when the code is written for interfaces, and not types? How will you translate a function receiving a (possibly custom) iterable, when the function doesn't care about the type, but just whether it implements a next() method?
[+] pja|13 years ago|reply
Hindley-Milner type systems to the rescue! (With a sprinkling of Haskell style type classes.)

  f :: Iterable a => a -> b
  f x = (do stuff with x...)
f is constrained to accept only types which implement the Iterable interface (however that's defined) as its first argument, and the compiler (or interpreter) will enforce that constraint.
[+] sbmassey|13 years ago|reply
Haskell type classes are sufficiently flexible to represent things like 'does this type have a next method' without having to instrument the actual type.
[+] troels|13 years ago|reply
There's a difference between the implementation type and the protocol type. Or as Java calls them - Classes and Interfaces. Unfortunately most classical OO languages conflate the two, causing much confusion.
[+] Symmetry|13 years ago|reply
Haskell's typeclasses are very close to this, but even closer is Go. You just define an interface and the functions that require it, and the compiler goes off and checks by itself whether the type you're passing in satisfies the interface - no need for you to annotate it yourself. You lose Haskell's nifty "deriving" feature, but it's even closer to Python's duck typing with compile time checks.
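For comparison, later versions of Python (3.8 and up) offer a similar structural check through `typing.Protocol`, provided an external static checker such as mypy is run over the code. A minimal sketch, with hypothetical names `HasNext`, `Countdown` and `take_two`, and using the `next()` method name from the question above rather than Python 3's `__next__`:

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class HasNext(Protocol):
    """Anything with a next() method returning an int satisfies this."""

    def next(self) -> int: ...


class Countdown:
    """Never mentions HasNext, but has a matching next() method."""

    def __init__(self, start: int) -> None:
        self.current = start

    def next(self) -> int:
        self.current -= 1
        return self.current


def take_two(source: HasNext) -> list[int]:
    """Accepts any object that structurally provides next()."""
    return [source.next(), source.next()]
```

`take_two(Countdown(3))` returns `[2, 1]`, and a static checker can accept that call without `Countdown` declaring that it implements anything. The `runtime_checkable` decorator is only there so that `isinstance(obj, HasNext)` also works at runtime; it checks that the method exists, not its signature.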
[+] dons|13 years ago|reply
> the function doesn't care about the type, but just whether it implements a next() method?

You're describing an "existential" type.

[+] JustinJ70s|13 years ago|reply
Scala's type system can handle this as it supports type-safe duck typing.
[+] eta_carinae|13 years ago|reply
The advantage of statically typed languages has a lot less to do with tests and everything to do with tools. IDEs can do very little when no type information is available, and most automatic refactorings require human supervision when performed on dynamically typed languages (read this for details: http://beust.com/weblog/2006/10/01/dynamic-language-refactor... ).
[+] thebluesky|13 years ago|reply
Tooling and performance are the two largest advantages. The disadvantage is more verbosity, but it's a trade-off.

With dynamically typed languages I find that you still need to worry about types, but you have to trace through the code to figure out what type a particular variable is (esp. if you're not the only one working on a project). In a statically typed language that information is readily available.

[+] dspeyer|13 years ago|reply
For the cases I was easily able to count, this is one bug per thousand lines of code. I realize that's a handwavy metric, but it's enough to say we're not talking about a ton of bugs.

On the other hand, successful conversion of these codebases is a very interesting result. Apparently static typing, at least static typing as sophisticated as Haskell's, actually can express most of the idioms in standard Python. This surprises me, as I've never seen a C++ program of significant complexity that didn't resort to void*s somewhere.

[+] njharman|13 years ago|reply
> "frequently cited claim by proponents of dynamically typed programming languages that static typing was not needed for detecting bugs in programs"

Who says that? There's trade offs, multitudes, in choosing paradigm / language. It's never so black and white (except for academics, and twits who like to argue more than code)

I did not read the paper, don't have time for 60 pages of pointlessness.

[+] zopa|13 years ago|reply
> "Who says that?"

http://news.ycombinator.com/item?id=4137283 comes pretty close to making that claim, and that's just on this page.

> "There's trade offs, multitudes, in choosing paradigm / language. It's never so black and white"

Sometimes, it is. Most people don't use COBOL anymore, with reason; better languages came along. Programming is still a new field in the scheme of things. It would be strange if our languages were perfectly optimized, with no room for improvement without offsetting costs.

> "twits who like to argue more than code"

Some of us like to do both :) But go program; we won't stop you...

[+] efarrer|13 years ago|reply
I agree there are generally trade-offs; that's essentially the result of my "60 pages of pointlessness" paper. If you did have time to read the paper you would notice a reference to this book http://my.safaribooksonline.com/book/software-engineering-an... that is an argument for unit testing instead of static typing. I think we need to use the scientific method in computer science and not just base our ideas on intuition, belief or absolutes like "It's never so black and white".
[+] jknupp|13 years ago|reply
The author's interpretation of the argument in favor of dynamic languages seems purposefully naive. I don't think that any proponent of dynamic languages or unit testing claimed that the mere presence of unit tests guaranteed bug free code or that it was impossible to have type related errors at run time if you have unit tests. It's a more nuanced argument asserting that the benefit to programmer productivity when using dynamic languages outweighs the cost of potential type related errors not possible in a statically typed language. Whether it has merit or not is the question I hoped this paper would answer.
[+] gouranga|13 years ago|reply
Slightly different view here.

I think the nirvana of typing is hybrid static/dynamic and it hit us with Visual Basic 6. I don't think anyone really noticed it though.

It supports traditional type checking by the compiler, runtime type inference and dynamic typing without boxing. Each case can be chosen at will. It supports all theoretical programs, supports unit testing and runtime assertions.

The least buggy software I've seen over the years was written in VB6 (by professionals, not the crap that haunts the web).

Now I'm not saying we should all switch to vb6 but some of the ideas may be worth investigating.