Leveraging the type system to avoid mistakes

[+] amzans|7 years ago|reply

Something that I enjoy very much in Scala is sealed traits + case classes.

  sealed trait Food
  case class FishTaco(size: FishSize) extends Food
  case class Burger(numPatties: Int) extends Food
  case class Spaghetti(sauce: PastaSauce) extends Food

It’s like Enums + Data Classes on steroids. You can pass around and transform your data as usual, but if you forget to handle a specific case when doing pattern matching, you’ll get a compile warning. For example, in the following case:

  input match {
    case Burger(patties) => ...
    case Spaghetti(sauce) => ...
  }

The compiler will happily yell at you something like:

  warning: match may not be exhaustive.
  It would fail on the following input: FishTaco(_)

Little things such as this one help reasoning about your program since you can rely on the types describing what is possible and what is not.

[+] Rapzid|7 years ago|reply

If you like that kinda thing you should check out F# as well. The IDE experience/speed is much better for F# IMHO, but I have a lot of respect for the concepts behind both. And of course they ride on completely separate massive ecosystems, so you can't exactly swap them out willy-nilly.

I wish I could figure out why my IDE experience with Scala and IntelliJ(IDEA) was so darn sluggish. Also hoping they jump on the language server protocol bandwagon in the future.

[+] MrBuddyCasino|7 years ago|reply

That looks similar to Rust's enums. The compiler enforces exhaustiveness in pattern matching.

  enum WebEvent {
    // An `enum` may either be `unit-like`,
    PageLoad,
    PageUnload,
    // like tuple structs,
    KeyPress(char),
    Paste(String),
    // or like structures.
    Click { x: i64, y: i64 },
  }
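
A minimal sketch of what that enforcement looks like in practice. The `describe` helper is hypothetical and only there to host the `match`:

```rust
// Same enum as in the snippet above.
enum WebEvent {
    PageLoad,
    PageUnload,
    KeyPress(char),
    Paste(String),
    Click { x: i64, y: i64 },
}

// Hypothetical helper, not part of any library.
fn describe(event: &WebEvent) -> String {
    // Deleting any arm below is a compile error (E0004,
    // "non-exhaustive patterns"), not just a warning.
    match event {
        WebEvent::PageLoad => "page loaded".to_string(),
        WebEvent::PageUnload => "page unloaded".to_string(),
        WebEvent::KeyPress(c) => format!("pressed '{}'", c),
        WebEvent::Paste(s) => format!("pasted \"{}\"", s),
        WebEvent::Click { x, y } => format!("clicked at ({}, {})", x, y),
    }
}
```

Unlike the Scala warning shown earlier, a missing arm here stops the build outright.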

[+] justinpombrio|7 years ago|reply

> It’s like Enums + Data Classes on steroids.

Also known as algebraic data types, which have been around for decades and can be found in ML/OCaml, Haskell, and Rust.

[+] dnomad|7 years ago|reply

Types can definitely help with data well-formedness but personally I'm very suspicious of 'mini-types' like Name and CartID. If the data you're wrapping has no internal structure (that can be formally verified) and you're just using the type system as an argument checker then it's not clear to me that you're really adding much value. If CartID and ProductID are both just strings what's to stop a user from doing the same as:

  CartID cartId = new CartID(productId.toString());
  addToCart(..., cartId, ...); // but look, that's really a ProductID!

I'd suggest that while you may have reduced the likelihood of certain errors you have not actually eliminated those errors.

The right approach here is to be very suspicious of any methods in your domain that just take a bunch of IDs. Either:

(a) Force the caller to resolve the IDs to actual entities before invoking the method:

  addToCart(Cart c, Product p, ...)

or (b) Represent the method parameters as a Parameters object or, preferably, an Event object.

  handle(ProductAddedEvent event);

The real problem here is not the type system but a domain that is not sufficiently protected by an Anti-Corruption Layer [1]. Unfortunately most developers are not familiar with this kind of strategic design and languages are not much help here. It would be interesting if there was a language that could flag things like weak ACLs.

[1] https://docs.microsoft.com/en-us/azure/architecture/patterns...

[+] gizmo686|7 years ago|reply

This line of code is very suspicious:

    CartID cartId = new CartID(productId.toString());

This line of code looks perfectly fine:

    addToCart(..., productId, ...);

There is tremendous benefit to making wrong code obviously wrong.

Not to mention that even if you do not have formally verifiable structure, you can almost always do sanity checks.

[+] raquo|7 years ago|reply

You can get very far by just containing unsafety. In this example, you shouldn't need to sprinkle `new CartID` calls throughout your whole application. You will only use them when deserializing values from a typeless source (e.g. AJAX response), and that part should be contained to some AjaxService.

[+] sgt101|7 years ago|reply

I think that this is a smoking gun where the user has purposefully violated the typing that's been introduced to stop errors. It's like altering tests so your code passes - basically sabotage.

[+] gcommer|7 years ago|reply

> new CartID(productID.toString());

With good data modeling and safety checks, this should fail: a product id and cart id should have some sort of distinguishing syntax (eg "P1234" vs "C1234") and so `new CartID("P1234")` will fail.

At some point, no type system can stop incorrectly implemented/specified logic. An even simpler example would be just passing some constant, valid, but not semantically correct string to the CartID constructor.

Your (a) and (b) solutions don't solve this. I could just as easily create a meaningless Cart object, or a meaningless ProductAddedEvent the same way you constructed that meaningless CartID.
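
A rough Rust sketch of the distinguishing-syntax idea. The `CartId`/`ProductId` types, the `has_shape` helper and the "C"/"P" prefix rule are assumptions taken from the "P1234" vs "C1234" example, not a real schema:

```rust
// Hypothetical ID wrappers; each one only accepts its own prefix.
#[derive(Debug, PartialEq)]
struct CartId(String);

#[derive(Debug, PartialEq)]
struct ProductId(String);

// Shared sanity check: the prefix letter followed by at least one ASCII digit.
fn has_shape(raw: &str, prefix: char) -> bool {
    match raw.strip_prefix(prefix) {
        Some(digits) => !digits.is_empty() && digits.chars().all(|c| c.is_ascii_digit()),
        None => false,
    }
}

impl CartId {
    // Fallible constructor: rejects anything that is not "C" + digits.
    fn parse(raw: &str) -> Result<CartId, String> {
        if has_shape(raw, 'C') {
            Ok(CartId(raw.to_string()))
        } else {
            Err(format!("not a cart id: {}", raw))
        }
    }
}

impl ProductId {
    // Fallible constructor: rejects anything that is not "P" + digits.
    fn parse(raw: &str) -> Result<ProductId, String> {
        if has_shape(raw, 'P') {
            Ok(ProductId(raw.to_string()))
        } else {
            Err(format!("not a product id: {}", raw))
        }
    }
}
```

With this, `CartId::parse("P1234")` returns an `Err`. It does nothing about a made-up but well-formed string such as "C0001", which is the limit described above.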

[+] pseudonom-|7 years ago|reply

In some languages, you can ensure that no `.toString()` equivalent is available. Then, your ID is truly opaque and can only be used in the prescribed ways.

[+] sbov|7 years ago|reply

> (a) Force the caller to resolve the IDs to actual entities before invoking the method:

This doesn't really fix the potential problem you pointed out. All it turns it into is this:

> Cart cart = lookupCart(productId);

With the popularity of auto increment, that above code will likely return a valid cart.

[+] 0x445442|7 years ago|reply

I came to say the same thing. I appreciated the blog post but the example was overly contrived. The addToCart method should have been passed the objects the ids were representing and should not have been responsible (even via delegation) for resolving the ids.

[+] chriswarbo|7 years ago|reply

I would say the unsafe thing here is `new CartID`, which allows you to turn an arbitrary string into a `CartID`. That shouldn't be possible outside of a small, encapsulated module somewhere, which exposes only safe usages (e.g. wrappers around database calls).

One way to do this is via a module system, like in Racket, ML, Haskell, etc. I'm not sure if Java et al. allow private constructors, but if so we could use those and provide the safe wrappers as static methods.

An alternative/complementary approach is to make `CartID` abstract, e.g. an existential type in ML or an interface in Java. This way our code is forced to be generic, working for any possible implementation of `CartID`. We specialise it to some sort of `StringCartID` implementation once, at the top-level.
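
As a sketch of the module approach in Rust (Java does allow private constructors, so the same shape works there). The `cart_ids` module, `cart_for_customer` and `describe` are hypothetical names, and the lookup is a stand-in for a real database call:

```rust
mod cart_ids {
    // The field is private, so code outside this module can neither write
    // `CartId(some_string)` nor read the raw string back out.
    #[derive(Clone, PartialEq)]
    pub struct CartId(String);

    // Hypothetical stand-in for a database lookup; a real version would
    // query storage instead of accepting any non-empty customer name.
    pub fn cart_for_customer(customer: &str) -> Option<CartId> {
        if customer.is_empty() {
            None
        } else {
            Some(CartId(format!("cart-of-{}", customer)))
        }
    }

    // One of the few operations deliberately exposed on the ID.
    pub fn describe(id: &CartId) -> String {
        format!("<{}>", id.0)
    }
}
```

Outside `cart_ids`, the only way to get a `CartId` is through `cart_for_customer`, so the equivalent of `new CartID(productId.toString())` does not compile.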

[+] iovrthoughtthis|7 years ago|reply

This. Doing actions against an id is a shortcut. Deserialise the object from the db so you can make changes and verify integrity before serialization.

Also allows you to leverage the type of object you're playing with.

No need for CartId types if you're passing around Cart types. Which you probably already have.

[+] tetha|7 years ago|reply

Another interesting and, retrospectively, obvious idea I recently read about: Have a class 'CleartextPassword' around, which overrides all string-serialization as either a sequence of *s, or a hashcode of the password. Suddenly the type system prevents security issues.
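
A small Rust sketch of that idea, assuming a fixed mask of asterisks. The `expose` method is a hypothetical escape hatch for the one place that genuinely needs the value:

```rust
use std::fmt;

// Hypothetical wrapper; the name comes from the comment above.
struct CleartextPassword(String);

impl CleartextPassword {
    // Explicit, greppable access to the secret.
    fn expose(&self) -> &str {
        &self.0
    }
}

// Both formatting traits are written by hand, so `{}` and `{:?}`
// print a mask rather than the secret itself.
impl fmt::Display for CleartextPassword {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("********")
    }
}

impl fmt::Debug for CleartextPassword {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("CleartextPassword(********)")
    }
}
```

Any log line that formats the value, directly or as a field of a struct deriving `Debug`, then shows only the mask.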

[+] Rapzid|7 years ago|reply

Scott Wlaschin, of fsharp for fun and profit, has written/presented [1] quite a bit about utilizing the type system to prevent invalid states. Different language, but similar concepts. It was an epiphany to me and a concept I have at least tried to bring back to the ALGOL side where possible.

[1] https://fsharpforfunandprofit.com/series/designing-with-type...

[+] Gibbon1|7 years ago|reply

I've thought it would be nice if languages had a 'secure' type qualifier which prevents leakage when something goes out of scope. AKA, the compiler will emit code to zero out the data in memory once it's out of scope.

Extending that, passing a secure type to an insecure function should result in a warning at least.

   log_printf( password.cleartext()); // error

   log_printf( password.safehash()); // okay
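
Rust has no such qualifier, but a `Drop` impl gets part of the way there. This is only a sketch: `Secret` and `wipe` are hypothetical names, and the `zeroize` crate is a more careful take on the same idea.

```rust
// Hypothetical holder for sensitive bytes.
struct Secret {
    bytes: Vec<u8>,
}

impl Secret {
    fn new(bytes: Vec<u8>) -> Secret {
        Secret { bytes }
    }

    // Overwrite every byte with zero. Volatile writes are used so the
    // compiler is less likely to optimise the stores away.
    fn wipe(&mut self) {
        for b in self.bytes.iter_mut() {
            unsafe { std::ptr::write_volatile(b, 0) };
        }
    }
}

impl Drop for Secret {
    // Runs automatically when the value goes out of scope.
    fn drop(&mut self) {
        self.wipe();
    }
}
```

Because `Secret` implements neither `Display` nor `Debug`, passing it straight to `println!` is a compile error, which is close to the "error" line in the snippet above.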

[+] 49bc|7 years ago|reply

Cool idea. However, it’s the DB that usually leaks the password, not the stack*

Assuming we’re talking about Scala and not C.

[+] danpalmer|7 years ago|reply

The other great motivating example I’ve seen is an Amount type parametrised by currency as well as the numerical value, so that the type system can enforce that you don’t add two amounts of different currencies, or that to add, you have to also provide a correctly typed exchange rate value.
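
One way this can be sketched in Rust is with a phantom type parameter. All names here (`Usd`, `Eur`, `Amount`, `Rate`) are illustrative, and amounts are kept in integer minor units to avoid floating point:

```rust
use std::marker::PhantomData;
use std::ops::Add;

// Marker types; they carry no data and exist only at the type level.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Usd;
#[derive(Debug, Clone, Copy, PartialEq)]
struct Eur;

// An amount in minor units (e.g. cents), tagged with a currency marker `C`.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Amount<C> {
    minor_units: i64,
    currency: PhantomData<C>,
}

impl<C> Amount<C> {
    fn new(minor_units: i64) -> Self {
        Amount { minor_units, currency: PhantomData }
    }
}

// Addition is only defined for two amounts of the same currency, so
// `Amount::<Usd>::new(1) + Amount::<Eur>::new(1)` does not compile.
impl<C> Add for Amount<C> {
    type Output = Amount<C>;
    fn add(self, other: Amount<C>) -> Amount<C> {
        Amount::new(self.minor_units + other.minor_units)
    }
}

// Exchange rate from `Src` to `Dst`, stored as the fraction num/den.
struct Rate<Src, Dst> {
    num: i64,
    den: i64,
    direction: PhantomData<(Src, Dst)>,
}

impl<Src, Dst> Rate<Src, Dst> {
    fn new(num: i64, den: i64) -> Self {
        Rate { num, den, direction: PhantomData }
    }

    // Only accepts amounts in `Src` and always yields amounts in `Dst`.
    fn convert(&self, amount: Amount<Src>) -> Amount<Dst> {
        Amount::new(amount.minor_units * self.num / self.den)
    }
}
```

A `Rate<Usd, Eur>` cannot be applied to an `Amount<Eur>`, so using the rate in the wrong direction is also caught at compile time.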

[+] lihaoyi|7 years ago|reply

I wrote a couple of blog posts that go into a lot more detail on some of these techniques, in case anyone wants to dive deeper:

- http://www.lihaoyi.com/post/StrategicScalaStylePracticalType...
- http://www.lihaoyi.com/post/StrategicScalaStyleDesigningData...

[+] bgirard|7 years ago|reply

I suggested a similar trick in Gecko when we were seeing too many bugs in the rendering/async scrolling code because of mistakes when transforming between the different unit spaces:

https://dxr.mozilla.org/mozilla-central/rev/4303d49c53931385...

Transformations between units are also typed so you won't accidentally use the wrong scale factor when transforming.

[+] grosjona|7 years ago|reply

The problem with complex type systems is that they force you to spend a lot of your development time thinking about type structures and their relationships (in a universal way) instead of thinking about actual logic (in a more localized way).

For example, when you introduce third party modules into your code, sometimes it's impossible to cleanly reconcile the type structures exposed by the module with the type structures within your own system... You may end up with essentially identical concepts being classified as completely different types/classes/interfaces... It creates an unnecessary impedance mismatch between your logic and the logic exposed by third-party modules.

This type impedance mismatch may discourage developers from using third-party modules altogether. That might explain why dynamically typed languages like JavaScript and Ruby have such rich third-party module ecosystems compared to those of statically typed languages.

Unless there is a single standardized and universally accepted type system which is fully consistent across the minds of all developers of a specific language/platform (for all possible applications), then a type system makes no sense at all; especially a complex one.

[+] hardwaresofton|7 years ago|reply

Yeah this is almost completely wrong IMHO, could you give some concrete examples of when this happened to you or someone you know?

I'm writing a relatively small haskell app, and though I've had large swaths of time where I had to think about the types I was writing exclusively, I almost always come out of the 30 minutes or so understanding my code, and what I'm trying to do so much better.

Forcing you to spend time thinking about type structures and their relationships is (dare I say) the essence of programming. Programming is in the end about transforming data for some useful end (and some side effects along the way), and no one wants to work with 0s and 1s directly.

Integrating with third party modules in haskell is no more painful than integrating with 3rd party modules in Java; in fact it's way way way simpler, given Java's infamous verbosity and love of enterprise patterns (tm).

[+] whateveracct|7 years ago|reply

Those impedance mismatches are equally present in dynamically typed code. The only thing types do is help make it obvious when there are problems with composition.

I work in Haskell with many other Haskell developers and a wide variety of Haskell mindsets. We make heavy use of a very complex type system, and so far the issues you mention have been minor and well worth it.

[+] CharlesMerriam2|7 years ago|reply

There is often an "I love {Scala|Rust|Erlang}" post showing a simple type issue and claiming that using a bunch of new syntax would somehow fix it.

<rant>

1. I want a minimum of typed syntax for safety. For example, remember Hungarian Notation? My linter barfs if I try "icCart = ipMyProduct" because of the semantic type issue.

2. I want a full set of primitives. For example, counts (c) are 0 or above, not infinite, and support arithmetic, while ids (i) are positive, not infinite, do not support arithmetic, and cannot be assigned a constant. I want optionals; I want clearer error handling; I want fully declared I/O instead of manual error handling; I want better.

3. I want tallies of how often simple types are confused before changing languages to fix. My type confusions are complicated nested collections or variant records.

</rant>

Typing is a means for correct programming, not an end in itself.

[+] GavinMcG|7 years ago|reply

This got me thinking, and I'm curious what a language would look like if basic types weren't directly usable.

In other words, maybe type checking doesn't go far enough: checking that I'm getting an int is less valuable than checking that I'm getting the right sort of data for the domain.

[+] ghayes|7 years ago|reply

This might be a good time to check out algebraic data types, esp. with a single constructor. For example, in Elm you might declare:

    type Length = Inches Int

Then any function that accepts a Length, you can only pass in a length. This effectively acts like tagged tuples verified at compile time. You can later extend this type to be:

    type Length = Inches Int | Meters Int

and any code that reads length would have to handle both cases (also checked at compile time).

I have found this to be a great way of ensuring all of your types are correct (and sometimes, a great frustration of wishing I didn't have to wrap/unwrap types so often).
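
For comparison, roughly the same thing in Rust. The `to_tenths_of_mm` conversion is a hypothetical function, included only to show a consumer that must handle every variant:

```rust
// Rust counterpart of the Elm `type Length = Inches Int | Meters Int`.
enum Length {
    Inches(i64),
    Meters(i64),
}

// Converts to tenths of a millimetre (1 inch = 254, 1 metre = 10_000).
// Adding a third variant to `Length` makes this `match` fail to compile
// until the new case is handled.
fn to_tenths_of_mm(length: &Length) -> i64 {
    match length {
        Length::Inches(n) => n * 254,
        Length::Meters(n) => n * 10_000,
    }
}
```

The wrapping/unwrapping cost mentioned above shows up here too: every caller has to write `Length::Inches(3)` rather than a bare `3`.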

[+] usrusr|7 years ago|reply

Whenever I play with that idea in my head I get hung up on the question whether one could go even further down that road and drop variable names as a concept separate from types. If you already have a Surname instead of a String and a Birthday instead of a Date, why duplicate that into surname and bday? In a way, variable names are a bit like informal domain subtypes. Could we make them formal?

I'm really not sure if that could be in any way practical at all, but I like the idea.

[+] JoshTriplett|7 years ago|reply

If you did that, in what language would you write the validation code to check if you have a value in the correct domain, or convert between domains?

[+] GavinMcG|7 years ago|reply

I think this post [0] points out how Haskell can implement this sort of thing, using DataKinds, and GeneralizedNewtypeDeriving with StandaloneDeriving.

[0] https://lexi-lambda.github.io/blog/2016/06/12/four-months-wi...

[+] tzahola|7 years ago|reply

See Boost.Units: https://www.boost.org/doc/libs/1_65_0/doc/html/boost_units.h...

It enforces correct units/dimensions at compile time, so you won’t be able to mix up length with time or area, etc.

[+] daxfohl|7 years ago|reply

unsigned int?

[+] oherrala|7 years ago|reply

I wrote about the same topic in

https://medium.com/sensorfu/using-static-typing-to-protect-a...

Type systems are really good in helping to avoid this problem!

[+] platz|7 years ago|reply

Make illegal states unrepresentable

[+] dfan|7 years ago|reply

I have a system set up in C++ to make it trivial to add new unique kinds of IDs (wrappers around ints) as well as vectors that can be indexed only with that kind of ID. So you can have a ShoeVec containing Shoes that is indexable only by ShoeIDs and a ShirtVec containing Shirts that is indexable only by ShirtIDs. It's zero-overhead in optimized builds, of course. You'd be surprised how rarely you have to convert to or from the underlying type.
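
This is not the C++ system described above, just a sketch of the same idea in Rust. `Id`, `TypedVec`, `Shoe` and `Shirt` are illustrative names:

```rust
use std::marker::PhantomData;
use std::ops::Index;

// An index that is only meaningful for collections of `T`.
struct Id<T> {
    raw: usize,
    owner: PhantomData<T>,
}

// A vector that can only be indexed by `Id<T>`, never by a bare integer
// or by an ID minted for a different element type.
struct TypedVec<T> {
    items: Vec<T>,
}

impl<T> TypedVec<T> {
    fn new() -> Self {
        TypedVec { items: Vec::new() }
    }

    // Pushing is the only way to obtain an ID in this sketch.
    fn push(&mut self, item: T) -> Id<T> {
        self.items.push(item);
        Id { raw: self.items.len() - 1, owner: PhantomData }
    }
}

impl<T> Index<Id<T>> for TypedVec<T> {
    type Output = T;
    fn index(&self, id: Id<T>) -> &T {
        &self.items[id.raw]
    }
}

struct Shoe {
    size: u8,
}

struct Shirt {
    colour: &'static str,
}
```

Given a `TypedVec<Shoe>` named `shoes` and an `Id<Shirt>` named `shirt_id`, `shoes[shirt_id]` is rejected by the compiler. One gap in this sketch: an `Id<Shoe>` from one `TypedVec<Shoe>` can still index a different `TypedVec<Shoe>`.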

[+] jdonaldson|7 years ago|reply

Fwiw the Haxe language calls this an "abstract type", and they're very useful.

https://haxe.org/manual/types-abstract.html

[+] kazinator|7 years ago|reply

Congrats, you almost have Pascal there.

[+] hzhou321|7 years ago|reply

The suggested type system adds a lot of code and complexity. For this specific case, I suggest trying a convention-based solution. Let's, e.g., establish a convention that all product ID variables are named with a `prod_` prefix and all customer ID variables with a `cust_` prefix. Then a simple static analyzer filter can serve as the unit test to catch mistakes (including passing in variable names that don't conform to the convention).

We should recognize that the enforced prefix serves the same role as the type system, but at the preprocessing stage rather than the compiler stage. It does not require the programmer to add any extra code; rather, it helps with one of the difficult problems, naming things. Unlike the type system, it is easy to adopt a strong convention (the strongest being a set of fixed names) and then relax the convention as the program evolves. In contrast, the type system often takes the opposite path, from relaxed to strict, with steeply increasing cost.

Of course, unlike the type system, the enforcement of the convention does not come automatically: the programmer has to write static analysis code to enforce it. This is fundamentally no different from writing unit tests. But the static analysis code is easy to write and less likely to have bugs (and its bugs have less severe consequences).

I use a general-purpose preprocessor, MyDef, and every line of my code goes through several filtering passes anyway, so adding a set of static checks seems trivial. But even if you don't use a preprocessor, implementing a simple static convention checker (to be enforced at repository check-in) doesn't seem difficult.
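
To make the convention check concrete, here is a deliberately tiny Rust sketch of its core comparison. It assumes something else has already extracted the argument names at a call site; `check_call` is a hypothetical name and this is not how MyDef works:

```rust
// Hypothetical convention check: the argument passed in each parameter
// slot must start with that slot's required prefix.
fn check_call(required_prefixes: &[&str], argument_names: &[&str]) -> Vec<String> {
    let mut problems = Vec::new();
    for (position, (prefix, name)) in required_prefixes.iter().zip(argument_names).enumerate() {
        if !name.starts_with(*prefix) {
            problems.push(format!(
                "argument {}: `{}` should start with `{}`",
                position, name, prefix
            ));
        }
    }
    problems
}
```

For a function declared as taking `(prod_, cust_)`, a call with `prod_id, cust_id` yields no problems, while the swapped call `cust_id, prod_id` yields two.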

[+] billfruit|7 years ago|reply

One thing that I personally would find useful is an F#-like units-of-measure feature: a kind of type metadata that lets the units of quantities be stated explicitly and thereby helps prevent unit confusion in code, i.e. passing around values in the wrong units. Degrees versus radians, for example, can often be a difficult mistake to catch.
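
Languages without built-in units of measure can approximate the degrees/radians case with wrapper types. A Rust sketch, where `Degrees`, `Radians` and `sine` are illustrative names; it is weaker than F#'s feature because it does not track dimensions through arithmetic:

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Degrees(f64);

#[derive(Debug, Clone, Copy, PartialEq)]
struct Radians(f64);

// The only sanctioned way to go from degrees to radians.
impl From<Degrees> for Radians {
    fn from(d: Degrees) -> Radians {
        Radians(d.0.to_radians())
    }
}

// Hypothetical function that only accepts radians; passing a `Degrees`
// or a bare `f64` is a compile error.
fn sine(angle: Radians) -> f64 {
    angle.0.sin()
}
```

A caller holding degrees has to write `sine(Degrees(90.0).into())`, which makes the conversion visible at the call site.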

[+] hardwaresofton|7 years ago|reply

If you like this, you're probably also going to like Haskell/OCaml/Ada/F#/Coq (ML family of languages?):

I'll leave links for Haskell since it's the only one of those I've tried and liked personally:

https://www.haskell.org/

http://learnyouahaskell.com/

IMO Haskell is the type system you wanted Java to have, with even more strictness. While it's possible to write unsafe haskell, just about everything you will read, and the language itself, encourage you to push as many worries as you can onto the type system.

If that doesn't get you excited, there's actually a saying that is mostly true in my experience writing Haskell -- "If it compiles, it works"(tm).

Also Haskell on the JVM if you want to dip your toes in:

- https://eta-lang.org/

- https://github.com/Frege/frege

[EDIT] - If you try to learn haskell you'll inevitably run into the M word which rhymes with Gonad. The best guide I've ever read is @ http://adit.io/posts/2013-04-17-functors,_applicatives,_and_...

BTW, be ye warned: when you start understanding/getting cozy with the concepts introduced by Haskell you'll wish they were everywhere, and they're not. Languages without proper union types will look dumb to you, easy function composition/points free style will be a daily desire, and monadic composition of a bunch of functions will be frustratingly absent/ugly in other languages, not doing errors-as-values will look ironically error-prone.

[EDIT 2] - More relevant to the actual post, how you would do this in Haskell is using the `type` keyword (type aliasing, e.g. `type Email = String`) or the `newtype` keyword, which creates a new type that is basically identical (what the article talks about). Here's some discussion on newtype:

- http://degoes.net/articles/newtypes-suck (this article is good, it describes the premise, promise, shortcomings, and workarounds, forgive the clickbaity title)

- https://www.reddit.com/r/haskell/comments/4jnwjg/why_use_new...
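
The alias-versus-newtype distinction is not specific to Haskell. Here is the same contrast sketched in Rust, with made-up names (`EmailAlias`, `Email`, `send_to_alias`, `send_to`):

```rust
// A type alias: `EmailAlias` and `String` are the same type to the compiler.
type EmailAlias = String;

// A newtype: a distinct type that merely wraps a `String`.
struct Email(String);

// Accepts any `String`, because the alias adds no checking.
fn send_to_alias(address: EmailAlias) -> usize {
    address.len()
}

// Accepts only an `Email`; `send_to(String::from("x"))` does not compile.
fn send_to(address: Email) -> usize {
    address.0.len()
}
```

The alias documents intent but catches nothing, while the newtype is what actually stops a stray string from being passed.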

[+] Silhouette|7 years ago|reply

Although Haskell has a relatively powerful and useful type system, you are perhaps being a little kind to it here. Haskell also has partial functions, including quite a few dangerous examples in the standard library, various mechanisms to escape the type system entirely, and 27 different ways to handle errors, 28 of which look a bit like exceptions but need handling in 29 different ways. And it still can't handle basic record types very elegantly, nor dependent types such as fixed-size arrays/matrices. Haskell's type system does have a lot going for it and it also makes a useful laboratory for experiments with type-based techniques, but as far as safety goes it has never really lived up to the hype, and for all its advanced magic, it is surprisingly lacking in support for some basic features that are widely available in other languages.

[+] YouAreGreat|7 years ago|reply

> points free style will be a daily desire

Not...really.

[+] andreygrehov|7 years ago|reply

I believe a lot of similar concepts are explained in Domain Driven Design.

[+] dclowd9901|7 years ago|reply

What really put me on to typing JavaScript was this: the ability to avoid unit tests. And frankly, what unit test would ever even capture something like this, an implementation issue (unless you refactored a lib)?

[+] clintonb|7 years ago|reply

> Don’t get me wrong I’m not saying to scrap all your test suites and try to encode all your constraints using the type system instead. Not at all I still firmly believe that tests are useful but in some cases you can leverage the type system to avoid mistakes.

You should still write unit tests. You should also write integration/functional tests that might have caught this error when you tested the code that calls the function in question. The only scenario in which I can see the argument-swapping bug not being caught is when all of the ID arguments have the same value.

[+] 49bc|7 years ago|reply

I’m seeing a little bit of a waterbed effect here. The author has replaced the basic type with one that matches it in name. You will never accidentally send CustomerId to CartID because the compiler will fail... but now we’ll need to dig through the code to find out whether customer ID should be an int, int32, String.... what the heck is it anyway?

In scala can I not just say foo(item1=item1) even if foo takes only 1 arg?

[+] rurban|7 years ago|reply

The id type parameter is not called phantom type, it's called generic type.

[+] tzahola|7 years ago|reply

Even though people love bashing Objective-C for its “weird” method syntax, it trivially circumvents these kinds of problems.

Also related: Mike Ash’s post about using single-member C structs instead of unitless scalar types: https://www.mikeash.com/pyblog/friday-qa-2013-08-02-type-saf...

[+] thankthunk|7 years ago|reply

Leverage the type system? Type systems exist to allow you to catch errors at compile time. That is what they're there to do. It's like saying leveraging the compiler to compile or leveraging the cup to drink water.
