top | item 46961759

(no title)

Maybe I'm missing something and I'm glad this idea resonates, but it feels like sometime after Java got popular and dynamic languages got a lot of mindshare, a large chunk of the collective programming community forgot why strong static type checking was invented and are now having to rediscover this.

In most strong statically typed languages, you wouldn't often pass strings and generic dictionaries around. You'd naturally gravitate towards parsing/transforming raw data into typed data structures that have guaranteed properties instead to avoid writing defensive code everywhere e.g. a Date object that would throw an exception in the constructor if the string given didn't validate as a date (Edit: Changed this from email because email validation is a can of worms as an example). So there, "parse, don't validate" is the norm and not a tip/idea that would need to gain traction.

discuss

pjerem|20 days ago

> In most strong statically typed languages, you wouldn't often pass strings and generic dictionaries around.

In 99% of the projects I worked on my professional life, anything that is coming from an human input is manipulated as a string and most of the time, it stays like this in all of the application layers (with more or less checks in the path).

On your precise exemple, I can even say that I never saw something like an "Email object".

jghn|20 days ago

I've seen a mix between stringly typed apps and strongly typed apps. The strongly typed apps had an upfront cost but were much better to work with in the long run. Define types for things like names, email address, age, and the like. Convert the strings to the appropriate type on ingest, and then inside your system only use the correct types.

FranklinJabar|20 days ago

> On your precise exemple, I can even say that I never saw something like an "Email object".

Well that's.... absolutely horrifying. Would you mind sharing what industry/stack you work with?

hathawsh|20 days ago

Python has an "email object" that you should definitely use if you're going to parse email messages in any way.

https://docs.python.org/3/library/email.message.html

I imagine other languages have similar libraries. I would say static typing in scripting languages has arrived and is here to stay. It's a huge benefit for large code bases.

tracker1|20 days ago

What's funny, is this is exactly one of the reasons I happen to like JavaScript... at its' core, the type coercion and falsy boolean rules work really well (imo) for ETL type work, where you're dealing with potentially untrusted data. How many times have you had to import a CSV with a bad record/row? It seems to happen all the time, why, because people use and manually manipulate data in spreadsheets.

In the end, it's a big part of why I tend to reach for JS/TS first (Deno) for most scripts that are even a little complex to attempt in bash.

Thaxll|20 days ago

Trying to parse email will result in bad assumptions. Better be a plain string than a bad regex.

For examples many website reject + character, which is totally valid and gmail uses that for temporary emails.

Same for adresses.

rileymichael|20 days ago

this is likely an ecosystem sort of thing. if your language gives you the tools to do so at no cost (memory/performance) then folks will naturally utilize those features and it will eventually become idiomatic code. kotlin value classes are exactly this and they are everywhere: https://kotlinlang.org/docs/inline-classes.html

mattmanser|20 days ago

Clearly never worked in any statically typed language then.

Almost every project I've worked on has had some sort of email object.

Like I can't comprehend how different our programming experiences must be.

Everything is parsed into objects at the API layer, I only deal with strings when they're supposed to be strings.

Boxxed|20 days ago

Well that's terrifying

eptcyka|20 days ago

My condolences, I urge you to recover from past trauma and not let it prohibit a happy life.

krick|20 days ago

At first I had a negative reaction to that comment and wanted to snap back something along the lines of "that's horrible" as well, but after thinking for a while, I decided that if I have anything to contribute to the discussion, I have to kinda sorta agree with you, and even defend you.

I mean, of course having a string, when you mean "email" or "date" is only slightly better than having a pointer, when you mean a string. And everyone's instinctive reaction to that should be that it's horrible. In practice though, not only did I often treat some complex business-objects and emails as strings, but (hold onto yourselves!) even dates as strings, and am ready to defend that as the correct choice.

Ultimately, it's about how much we are ready to assume about the data. I mean, that's what modelling is: making a set of assumptions about the real world and rejecting everything that doesn't fit our model. Making a neat little model is what every programmer wants. It's the "type-driven design" the OP praises. It's beautiful, and programmers must make beautiful models and write beautiful code, otherwise they are bad programmers.

Except, unfortunately, programming has nothing to do with beauty, it's about making some system that gets some data from here, displays it there and makes it possible for people and robots to act on the given data. Beautiful model is essentially only needed for us to contain the complexity of that system into something we can understand and keep working. The model doesn't truly need t be complete.

Moreover, as everyone with 5+ years of experience must known (I imagine), our models are never complete, it always turns out that assumptions we make are naïve it best. It turns out there was time before 1970, there are leap seconds, time zones, DST, which is up to minutes, not hours, and it doesn't necessarily happen on the same date every year (at least not in terms of Gregorian calendar, it may be bound to Ramadan, for example). There are so many details about the real world that you, brave young 14 (or 40) year old programmer don't know yet!

So, when you model data "correctly" and turn "2026-02-10 12:00" (or better yet, "10/02/2026 12:00") into a "correct" DateTime object, you are making a hell lot of assumptions, and some of them, I assure you, are wrong. Hopefully, it just so happens that it doesn't matter in your case, this is why such modelling works at all.

But what if it does? What if it's the datetime on a ticket that a third party provided to you, and you are providing it to a customer now? And you get sued if it ends up the wrong date because of some transformations that happened inside of your system? Well, it's best if it doesn't happen. Fortunately, no other computations in the system seem to rely on the fact it's a datetime right now, so you can just treat it as a string. Is it UTC? Event city timezone? Vendor HQ city timezone? I don't know! I don't care! That's what was on the ticket, and it's up to you, dear customer, to get it right.

So, ultimately, it's about where you are willing to put the boundary between your model and scary outer world, and, pragmatically, it's often better NOT to do any "type-driven design" unless you need to.

masklinn|20 days ago

> it feels like sometime after Java got popular [...] a large chunk of the collective programming community forgot why strong static type checking was invented and are now having to rediscover this.

I think you have a very rose-tinted view of the past: while on the academic side static types were intended for proof on the industrial side it was for efficiency. C didn't get static types in order to prove your code was correct, and it's really not great at doing that, it got static types so you could account for memory and optimise it.

Java didn't help either, when every type has to be a separate file the cost of individual types is humongous, even more so when every field then needs two methods.

> In most strong statically typed languages, you wouldn't often pass strings and generic dictionaries around.

In most strong statically typed languages you would not, but in most statically typed codebases you would. Just look at the Windows interfaces. In fact while Simonyi's original "apps hungarian" had dim echoes of static types that got completely washed out in system, which was used widely in C++, which is already a statically typed language.

guerrilla|20 days ago

> I think you have a very rose-tinted view of the past

I think they also forgot the entire Perl era.

chriswarbo|20 days ago

> You'd naturally gravitate towards parsing/transforming raw data into typed data structures that have guaranteed properties instead to avoid writing defensive code everywhere e.g. a Date object that would throw an exception in the constructor if the string given didn't validate as a date

It's tricky because `class` conflates a lot of semantically-distinct ideas.

Some people might be making `Date` objects to avoid writing defensive code everywhere (since classes are types), but...

Other people might be making `Date` objects so they can keep all their date-related code in one place (since classes are modules/namespaces, and in Java classes even correspond to files).

Other people might be making `Date` objects so they can override the implementation (since classes are jump tables).

Other people might be making `Date` objects so they can overload a method for different sorts of inputs (since classes are tags).

I think the pragmatics of where code lives, and how the execution branches, probably have a larger impact on such decisions than safety concerns. After all, the most popular way to "avoid writing defensive code everywhere" is to.... write unsafe, brittle code :-(

munificent|20 days ago

> You'd naturally gravitate towards parsing/transforming raw data into typed data structures that have guaranteed properties instead to avoid writing defensive code everywhere e.g.

There's nothing natural about this. It's not like we're born knowing good object-oriented design. It's a pattern that has to be learned, and the linked article is one of the well-known pieces that helped a lot of people understand this idea.

thom|20 days ago

My experience was that enterprise programmers burned out on things like WSDL at about the same time Rails became usable (or Django if you’re that way inclined). Rails had an excellent story for validating models which formed the basis for everything that followed, even in languages with static types - ASP.NET MVC was an attempt to win Rails programmers back without feeling too enterprisey. So you had these very convenient, very frameworky solutions that maybe looked like you were leaning on the type system but really it was all just reflection. That became the standard in every language, and nobody needed to remember “parse don’t validate” because heavy frameworks did the work. And why not? Very few error or result types in fancy typed languages are actually suited for showing multiple (internationalised) validation errors on a web page.

The bitter lesson of programming languages is that whatever clever, fast, safe, low-level features a language has, someone will come along and create a more productive framework in a much worse language.

Note, this framework - perhaps the very last one - is now ‘AI’.

noelwelsh|20 days ago

In 2 out of 3 problematic bugs I've had in the last two years or so were in statically typed languages where previous developers didn't use the type system effectively.

One bug was in a system that had an Email type but didn't actually enforce the invariants of emails. The one that caused the problem was it didn't enforce case insensitive comparisons. Trivial to fix, but it was encased in layers of stuff that made tracking it down difficult.

The other was a home grown ORM that used the same optional / maybe type to represent both "leave this column as the default" and "set this column to null". It should be obvious how this could go wrong. Easy to fix but it fucked up some production data.

Both of these are failures to apply "parse, don't validate". The form didn't enforce the invariants it had supposedly parsed the data into. The latter didn't differentiate two different parsing.

rzwitserloot|20 days ago

that's a bit of a hairy situation. You're doing it wrong. Or not really, but.. complicated.

As per [RFC 5321](https://www.rfc-editor.org/rfc/rfc5321.html):

> the local-part MUST be interpreted and assigned semantics only by the host specified in the domain part of the address.

You're not allowed to do that. The email address `foo@bar.com` is identical to `foo@BAR.com`, but not necessarily identical to `FOO@bar.com`. If we're going to talk about 'commonly applied normalisations at most email providers', where do you draw that line? Should `foo+whatever@bar.com` be considered equal to `foo@bar.com`? That souds weird, except - that is exactly how gmail works, a couple of other mail providers have taken up that particular torch, and if your aim is to uniquely identify a 'recipient', you can hardcode that `a@gmail.com` and `a+whatever@gmail.com` definitely, guaranteed, end up at the same mailbox.

In practice, yes, users _expect_ that email addresses are case insensitive. Not just users, even - various intermediate systems apply the same incorrect logic.

This gets to an intriguing aspect of hardcoding types: You lose the flex, mostly. types are still better - the alternative is that you reliably attempt to write the same logic (or at least a call to some logic) to disentangle this mess every time you do anything with a string you happen to know is an email address which is terrible but gives you the option of intentionally not doing that if you don't want to apply the usual logic.

That's no way to program, and thus actual types and the general trend that comes with it (namely: We do this right, we write that once, and there is no flexibility left). Programming is too hard to leave room for exotic cases that programmers aren't going to think about when dealing with this concept. And if you do need to deal with it, it can still be encoded in the type, but that then makes visible things that in untyped systems are invisible (if my email type only has a '.compare(boolean caseSensitive)' style method, and is not itself inherently comparable because of the case sensitivity thing, that makes it _seem_ much more complicated than plain old strings. This is a lie - emails in strings *IS* complicated. They just are. You can't make that go away. But you can hide it, and shoving all data in overly generic data types (numbers and strings) tends to do that.

bcrosby95|20 days ago

In my experience that's pretty rare. Most people pass around string phone numbers instead of a phonenumber class.

Java makes it a pain though, so most code ends up primitive obsessed. Other languages make it easier, but unless the language and company has a strong culture around this, they still usually end up primitive obsessed.

vips7L|20 days ago

    record PhoneNumber(String value) {}

Huge pain.

css_apologist|20 days ago

This is an idea that is not ON or OFF

You can get ever so gradually stricter with your types which means that the operations you perform on on a narrow type is even more solid

It is also 100% possible to do in dynamic languages, it's a cultural thing

wat10000|20 days ago

I'm not sure, maybe a little bit. My own journey started with BASIC and then C-like languages in the 80s, dabbling in other languages along the way, doing some Python, and then transitioning to more statically typed modern languages in the past 10 years or so.

C-like languages have this a little bit, in that you'll probably make a struct/class from whatever you're looking at and pass it around rather than a dictionary. But dates are probably just stored as untyped numbers with an implicit meaning, and optionals are a foreign concept (although implicit in pointers).

Now, I know that this stuff has been around for decades, but it wasn't something I'd actually use until relatively recently. I suspect that's true of a lot of other people too. It's not that we forgot why strong static type checking was invented, it's that we never really knew, or just didn't have a language we could work in that had it.

Archelaos|20 days ago

Strong static type checking is helpful when implementing the methodology described in this article, but it is besides its focus. You still need to use the most restrictive type. For example, uint, instead of int, when you want to exclude negative values; a non-empty list type, if your list should not be empty; etc.

When the type is more complex, specific contraints should be used. For a real live example: I designed a type for the occupation of a hotel booking application. The number of occupants of a room must be positiv and a child must be accompanied by at least one adult. My type Occupants has a constructor Occupants(int adults, int children) that varifies that condition on construction (and also some maximum values).

lelanthran|19 days ago

> The number of occupants of a room must be positiv and a child must be accompanied by at least one adult. My type Occupants has a constructor Occupants(int adults, int children) that varifies that condition on construction (and also some maximum values).

Or, you could do what I did when faced with a similar problem - I put in a PostgreSQL constraint.

Now, no matter which application, now or in the future, attempts to store this invalid combination, it will fail to store it.

Doing it in code is just asking for future errors when some other application inserts records into the same DB.

Business constraints should go into the database.

imtringued|20 days ago

Using uint to exclude negative values is one of the most common mistakes, because underflow wrapping is the default instead of saturation. You subtract a big number from a small number and your number suddenly becomes extremely large. This is far worse than e.g. someone having traveled a negative distance.

yakshaving_jgt|20 days ago

It's a design choice more than anything. Haskell's type safety is opt-in — the programmer has to actually choose to properly leverage the type system and design their program this way.

renox|19 days ago

I worked (a long time ago) on a C project where every int was wrapped in a struct. And a friend told me about a C++ project where every index is a uint8, uint16, and they have to manage many different type of objects leading to lots of bugs.. So it isn't really linked to the language.

jackpirate|20 days ago

> Edit: Changed this from email because email validation is a can of worms as an example

Email honestly seems much more straightforward than dates... Sweden had a Feb 30 in 1712, and there's all sorts of date ranges that never existed in most countries (e.g. the American colonies skipped September 3-13 in 1752).

legulere|19 days ago

It’s a ISO-standard to use Gregorian dates even for dates predating its invention. If you need to support anything else (I never had to in my Eurocentric work so far), you’ll need to model calendars, similar to how temporal did for JavaScript: https://tc39.es/proposal-temporal/docs/calendars.html

flqn|20 days ago

Dates are unfortunate in that you can only really parse them reliably with a TZDB.

conartist6|20 days ago

I think you're quite right that the idea of "parse don't validate" is (or can be) quite closely tied to OO-style programming.

Essentially the article says that each data type should have a single location in code where it is constructed, which is a very class-based way of thinking. If your Java class only has a constructor and getters, then you're already home free.

Also for the method to be efficient you need to be able to know where an object was constructed. Fortunately class instances already track this information.

jiehong|19 days ago

And then clojure enters: let’s keep few data structures but with tons of method.

So things stay as maps or arrays all the way through.

brooke2k|20 days ago

this is very much a nitpick, but I wouldn't call throwing an exception in the constructor a good use of static typing. sure, it's using a separate type, but the guarantees are enforced at runtime

munificent|20 days ago

I wouldn't call it a good use of static typing, but I'd call it a good use of object-oriented programming.

This is one of the really key ideas behind OOP that tends to get overlooked. A constructor's job is to produce a semantically valid instance of a class. You do the validation during construction so that the rest of the codebase can safely assume that if it can get its hands on a Foo, it's a valid Foo.

zanecodes|20 days ago

Given that the compiler can't enforce that users only enter valid data at compile time, the next best thing is enforcing that when they do enter invalid data, the program won't produce an `Email` object from it, and thus all `Email` objects and their contents can be assumed to be valid.

imtringued|20 days ago

I agree and for several reasons.

If you have onerous validation on the constructor, you will run into extremely obvious problems during testing. You just want a jungle, but you also need the ape and the banana.