This article, despite clocking in at nearly 10,000 words, appears to have been written in only two days. In fact the author has written five engineering articles in the last week, seeming to average around 5000+ words per article. What a spurt of productivity! Coincidentally, the article happens to be filled with pithy headers and constant sensationally dramatic contrasts that are out of place in long-form technical writing, but which LLMs are known to spam ad nauseam because they are effective for hooking attention in clickbait rags. Curious, that.
Author here: I have a bunch of drafts that I haven't gotten around to publishing for quite some time, and I'm on parental leave, which affords me a little more time than usual to get things published. I don't have a lot of additional ideas left in the tank.
1. You literally don't know when, or over how many days, the article was written, so I'm not sure what that "written in only 2 days!" is about unless you're counting the days from their last post. You don't seem to understand the concept of drafts.
2. "What a spurt of productivity!" = a petty personal snipe that's not in good faith at all.
3. You provide nothing in terms of what this article gets wrong or the technicalities of it. If you want to discuss flaws with the article, be specific instead of "this screams AI! clickbait rags!"
As someone who works with a LOT of AI-generated content, I'm pretty sure this is not AI-written.
It would explain why the information density is so low. I got about halfway through hoping for something clever, then started scanning, then just gave up.
10k words for "migrating code and data is a headache". Yeah. Next.
I mean, I am dealing with many of the same problems mentioned in the article, and I found it super enlightening. It was neat to read about the undecidability of the general problem of version updates, about the two-version window, and some of the solutions folks have come up with.
The challenges mentioned in the article are challenges known to every larger distributed system. They are not challenges solved or to be solved by functional programming.
But the solutions and tools functional programming provides help achieve higher verifiability, fewer side effects and more compile-time checks of code in the deployment unit. That other challenges exist is fully correct. But without a functional approach, you have those challenges plus all the others.
So while FP is not a one-size-fits-all solution to distributed systems challenges, it does help solve a subset of the challenges on a single system level.
> They are not challenges solved or to be solved by functional programming.
These challenges can be solved by the usual tools of FP, but this requires each version of the system to be explicitly aware of the data schema varieties used by all earlier versions that are still in use. Then it's a matter of interpreting earlier-versioned data correctly, and translating data to earlier schema varieties whenever they may have to be interpreted by earlier versions of the code.
(It may also be helpful to introduce minimally revised releases of the earlier code that simply add some amount of forward compatibility re: dealing with later-released schemas, while broadly keeping all other behavior the same to avoid unwanted breakage. These approaches are not too hard to implement.)
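A minimal Rust sketch of that translate-on-read / translate-back-on-write idea. Everything here is invented for illustration (the `UserV1`/`UserV2` shapes, the version tag, the name-splitting rule); the point is only that the translations are ordinary, compiler-checked total functions.

```rust
// v1 stored a single "name" field; v2 split it into first/last.
struct UserV1 { name: String }
struct UserV2 { first: String, last: String }

// The wire payload carries a version tag plus the versioned fields.
enum Wire {
    V1(UserV1),
    V2(UserV2),
}

// Interpreting earlier-versioned data: an explicit, total function.
// The compiler forces every still-supported version to be handled.
fn upgrade(w: Wire) -> UserV2 {
    match w {
        Wire::V1(u) => {
            let mut parts = u.name.splitn(2, ' ');
            UserV2 {
                first: parts.next().unwrap_or("").to_string(),
                last: parts.next().unwrap_or("").to_string(),
            }
        }
        Wire::V2(u) => u,
    }
}

// Translating back for readers still running the earlier version.
fn downgrade(u: &UserV2) -> UserV1 {
    UserV1 { name: format!("{} {}", u.first, u.last) }
}

fn main() {
    let legacy = Wire::V1(UserV1 { name: "Ada Lovelace".to_string() });
    let current = upgrade(legacy);
    println!("{} {}", current.first, current.last); // prints "Ada Lovelace"
}
```

Once a version is retired, deleting its `Wire` variant makes the compiler point at every remaining place that still mentions it.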
This. The thesis statement is so confused that I cannot tell if it's deliberate clickbait or if the author legitimately is that confused about the distinction between program-level language features and program structure on the one hand, and systems design on the other: two closely related but really distinct things.
This all sounds very clever, but in practice no, most of the problems are code problems and if you do functional programming you really can avoid getting paged at 3AM because everything broke. (Even if you don't solve the deployment problem the article talks about - you're not deploying at 3AM!). It's like those 400IQ articles about how Rust won't fix all your security problems because it can't catch semantic errors, whereas meanwhile in the real world 70% of vulnerabilities are basic memory safety issues.
And doing the things the article suggests would be mostly useless, because the problem is always the boundary between your neat little world and the outside. You can't solve the semantic drift problem by being more careful about which versions of the code you allow to coexist, because the vast majority of semantic drift problems happen when end users (or third parties you integrate with) change their mind about what one of the fields in their/your data model means. There are actually bitemporal database systems that will version the code as well as the data, that allow you to compute what you thought the value was last Thursday with the code you were running last Thursday ("bank python"). It solves some problems, but not as many as you might think.
> It's like those 400IQ articles about how Rust won't fix all your security problems because it can't catch semantic errors, whereas meanwhile in the real world 70% of vulnerabilities are basic memory safety issues.
even setting aside the memory safety stuff, having sum types and compiler-enforced error handling is also not nothing...
I went into this expecting it to be a criticism in the context of low-level systems, and was a bit surprised when it ended up being about distributed systems. The mismatch the author is describing, I think, is not really about functional programmers so much as about everyone not working directly on modern distributed or web software. For being upfront about all the ways distributed programming is different, this is actually one of the best intros I have seen for the non-distributed programmer on what distributed programming is really like.
I still want to believe that future programming languages will be capable of tackling these issues, but exploring the design space will require being cognizant of those issues in the first place.
> Static types, algebraic data types, making illegal states unrepresentable: the functional programming tradition has developed extraordinary tools for reasoning about programs
Looks like the term "functional programming" has been watered down so much that now it is as useful as OOP: not at all.
Look, what matters is pure functional programming. It's about referential transparency, which means managing effects and reasoning about code in much the same way you can in math. Static typing is very nice but orthogonal; ADTs and making illegal states unrepresentable are good things, but also orthogonal.
An awesome article (if only we removed "functional programmers"; it feels like they are unfairly called out). It hit close to home on operations and pulled up a lot of (painful) memories.
Fundamentally, all your systems become distributed systems after a certain scale, and accuracy at the local level is usually not where systemic risks come from. A distributed system of fully accurate and correct local programs can still implode or explode. Migration, compatibility and system stability are surprisingly dependent on engineer experience to this day.
I often recommend developers learn and use at least one model checking language for this purpose. They’re really good at helping us reason about global properties of systems.
This isn’t unique to Haskell or functional programming at all. I’ve seen programmers from JavaScript to C++ suffer the same fate: tied to the boundaries of a single process, they try to verify the correctness of their systems informally, by writing documents and holding long meetings, or by convincing themselves through prototyping. It’s only a matter of time before they find the error in production.
Hopefully it’s of no real consequence. The error gets fixed and it’s a week or two of extra work. But sometimes it triggers an outage that causes SLOs to overshoot and SLAs to trigger, which costs the company money. That outage could potentially have been avoided.
But it’s tough getting developers to see the benefit in model checking or any kind of formal methods. The hubris is often too great. It’s not just functional programmers! Although it surprised me when I was working with Haskell developers that so many were resistant to the idea.
(If one has experience and understands functional programming, learning TLA+ should be rather straightforward.)
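Not TLA+, but here is a toy illustration in Rust of what exhaustive state-space checking buys you over informal reasoning: enumerate every deployment state a rolling-upgrade plan can reach and check a global property over all of them. The `compatible` relation and the upgrade plans are invented for this sketch; a real model checker explores far richer state spaces.

```rust
type Version = u32;

// Assumed compatibility relation: a version can read data written by any
// version within one release of itself (a "two-version window").
fn compatible(reader: Version, writer: Version) -> bool {
    reader.abs_diff(writer) <= 1
}

// During a rolling upgrade, each consecutive pair of versions in the plan
// coexists for a while; those are the reachable deployment states.
fn deployment_states(plan: &[Version]) -> Vec<Vec<Version>> {
    let mut states: Vec<Vec<Version>> = plan.iter().map(|&v| vec![v]).collect();
    states.extend(plan.windows(2).map(|w| w.to_vec()));
    states
}

// Exhaustively check every reader/writer pair in every reachable state.
fn violations(plan: &[Version]) -> Vec<(Version, Version)> {
    let mut out = Vec::new();
    for st in deployment_states(plan) {
        for &r in &st {
            for &w in &st {
                if !compatible(r, w) {
                    out.push((r, w));
                }
            }
        }
    }
    out
}

fn main() {
    println!("{:?}", violations(&[1, 2, 3])); // one step at a time: prints "[]"
    println!("{:?}", violations(&[1, 3]));    // skipping v2: prints "[(1, 3), (3, 1)]"
}
```

The exhaustive enumeration is the whole trick: no prototyping session or design doc will reliably surface the one interleaving that breaks.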
I also took a stab at the versioning challenge back in 2019 and wrote a library with a friend in a fevered evening for type-safe migrations in Haskell [0].
In practice it’s a bit tricky to use because it doesn’t enforce encoding the schema identifier in the serialized payload. Oops.
But between that, automated typing from examples, and refinement it might be possible to develop a system that can manage to validate the serialized data against the parser and have the serialization data appear in the type. At that point you could query the whole system.
> If you have a web application with more than one server, you have a distributed system.
Even if you have a web application with only one server, and an embedded database, and no other components, you still have a distributed system. The browser is part of that system.
> Static types, algebraic data types, making illegal states unrepresentable: the functional programming tradition has developed extraordinary tools for reasoning about programs.
But none of these things are functional programming? This is more the tradition of 'expressive static types' than it is of FP.
What about Lisp, Racket, Scheme, Clojure, Erlang / Elixir...
> If you have a web application with more than one server, you have a distributed system. If you have background job workers, you have a distributed system. If you have a cron job, you have a distributed system. If you talk to Stripe, or send emails through SendGrid, or enqueue something in Redis, or write to a Postgres replica, then you are (I regret to inform you) operating a distributed system.
As soon as you have at least one client talking to your server you have a distributed system.
I feel like Erlang is a functional language that's honest about systems, not just a program. You are almost required to write distributed apps with Erlang, just because it's not weird to spin up thousands of Erlang processes, and you don't have any real way of knowing the order they're going to run so you have to build your applications around async messaging.
It makes me a little sad that Erlang/Elixir is kind of the only platform that does this though...
My experience with elixir, as a scrub who spends every day at work writing javascript, is pretty in line with that. The language forces you to work that way, and you spend half your time just architecting your supervision tree. But the language itself is so easy to write business logic in that it takes half as long as it would in another language. So it works out to the same total time investment but the return is so much higher cause your program is better and more predictable and has scaling for free.
>It makes me a little sad that Erlang/Elixir is kind of the only platform that does this though...
It's not. These are called green threads, and under the hood they're not really threads. It's literally the same thing as async/await in Python and Node.js.
It's just different perspectives on the same concept of concurrency: Go and Erlang make the calls to concurrency implicit, while in Node.js they're explicit.
What did I just read? I liked the bits explaining the basics of the difficulties of versioning data, well known points but explained in a simple way. But then it just kept going without any clear central thesis.
Was the central thesis supposed to be that it's impossible for any tools to ever represent complex migrating systems with well-defined data? Because that's not true. Or that it's not practical or cost-effective to do so? Or was there a central point?
You read a series of solid bullet points fleshed out semi-competently using AI, and given a slight human touch. It should be at most a quarter of this length.
It’s true. But static proof-based checks can happen across system boundaries; it’s just that for historical and stupid reasons we fail to apply them there.
First, put all your services in a monorepo and have it all build as one under CI. That’s a static check across the entire system. When you do a polyrepo you are opening up a mathematical hole where two repos can be mismatched.
Second, don’t use databases. Databases don’t type check and aren’t compatible with your functional code written on servers.
The first one is stupid… almost everyone just makes this mistake, and most of the time, unless you were there in the beginning to enforce a monorepo, you have to live with someone deploying a breaking change that affects a repo somewhere else. I’ve been in places where they simply didn’t understand the hole with polyrepos and they went with it despite me diagramming the issue on a whiteboard (it’s that pervasive). Polyrepos exist for people who like organizing things with names and don’t understand static safety.
And it gets worse. So many people don’t understand that this problem only gets worse over time. The more repos you have in your polyrepo setup, the more brittle everything becomes, and they don’t even understand why.
The second one is also stupid, but you had to be there at the very, very beginning to have stopped it. Databases should have been conceived from the start such that any query you run is compiled against the database and thus statically checked. This is something that can easily be done but wasn’t, and now the entire World Wide Web just has to make do with testing. Hello? Type checking in application code but not in query code? Why?
That’s like half the problems with distributed systems completely solved. And these problems exist purely because of raw stupidity. Stuff like race conditions and deadlocks is solvable too, but much harder. But those issues are obvious. What’s not obvious are the aforementioned problems that almost nobody talks about, even though they are completely solvable.
> First put all your services in a monorepo have it all build as one under CI. That’s a static check across the entire system.
There are definitely benefits to this approach. My coworkers do fall into the trap of assuming that all the services will be deployed simultaneously, which is not something you can really guarantee (especially if you have to roll one of them back at some point), so the monorepo approach gives them confidence in some breaking changes that it shouldn't (like adding a new required field).
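A toy sketch of that exact failure mode. The key/value "wire format" here is a made-up stand-in for protobuf/JSON: CI builds both sides from the same commit and passes, but at deploy time the still-running old writer never sends the newly required field.

```rust
// New schema: `email` is now required.
#[derive(Debug)]
struct UserNew { id: u64, email: String }

// What the still-deployed old version writes (it predates `email`).
fn old_writer() -> Vec<(String, String)> {
    vec![("id".to_string(), "42".to_string())]
}

// The new reader, which insists on the required field.
fn new_reader(fields: &[(String, String)]) -> Result<UserNew, String> {
    let get = |k: &str| {
        fields.iter().find(|(f, _)| f.as_str() == k).map(|(_, v)| v.clone())
    };
    let id = get("id").ok_or("missing id")?.parse::<u64>().map_err(|_| "bad id")?;
    let email = get("email").ok_or("missing required field: email")?;
    Ok(UserNew { id, email })
}

fn main() {
    // CI saw one snapshot where writer and reader agreed. Production sees
    // both versions at once, and the rollout breaks.
    match new_reader(&old_writer()) {
        Ok(u) => println!("ok: user {}", u.id),
        Err(e) => println!("decode failed: {e}"), // prints "decode failed: missing required field: email"
    }
}
```

This is why compatibility checkers treat "optional to required" as a breaking change even when every version in the repo compiles together.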
> First put all your services in a monorepo have it all build as one under CI. That’s a static check across the entire system.
That helps but is insufficient, since the set of concurrently deployed artifact versions can be different than any set of artifact versions seen by CI -- most obviously when two versions of the same artifact are actively deployed at the same time. It also appears to rule out the possibility of ever integrating with other systems (e.g., now you need to build your own Stripe-equivalent in a subdir of your monorepo).
> Second don’t use databases.
So you want to reimplement your own PostgreSQL-equivalent in another monorepo subdir too? I don't understand how opting not to use modern RDBMSes is practical. IIUC you're proposing implementing a DB using compiled queries that use the same types as the consuming application -- I can see the type-safety benefits, but this immediately restricts consumers to the same language or environment (e.g., JVM) that the DB was implemented in.
> Second don’t use databases. Databases don’t type check and aren’t compatible with your functional code written on servers.
That isn't very useful by itself. What's your suggested alternative that aligns with your advice of "don't"? How does it deal with destructive changes to data (e.g. a table drop)?
> The second one is also stupid, but you had to be there at the very, very beginning to have stopped it. Databases should have been conceived from the start such that any query you run is compiled against the database and thus statically checked. This is something that can easily be done but wasn't, and now we all just make do with testing.
This is a good point, I've never really thought about this carefully before, but that makes so much sense.
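For a flavor of the idea, here's a toy Rust sketch of "queries are compiled and statically checked". Real systems in this spirit exist (Diesel in Rust, LINQ in C#); this is only an illustration of the principle, with an invented `Order` schema.

```rust
// The schema is a type.
struct Order {
    id: u64,
    total_cents: u64,
    cancelled: bool,
}

// A "query" is an ordinary typed function over the schema. Rename or
// retype `total_cents` and this stops compiling, instead of failing at
// runtime the way a stringly-typed "SELECT total_cents FROM orders" would.
fn revenue(orders: &[Order]) -> u64 {
    orders.iter().filter(|o| !o.cancelled).map(|o| o.total_cents).sum()
}

fn main() {
    let orders = vec![
        Order { id: 1, total_cents: 1250, cancelled: false },
        Order { id: 2, total_cents: 800, cancelled: true },
        Order { id: 3, total_cents: 100, cancelled: false },
    ];
    println!("{}", revenue(&orders)); // prints "1350"
}
```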
This is really, really good as an overview of system-level issues. I write only to encourage others to read it.
Ideas that helped me situate many of the issues we struggled with at various companies:
- distinguishing programs (the code as designed) from (distributed) systems (the whole mess, including maintenance and migration, as deployed)
- versioning code (reversibly) and data (irreversibly)
- walk-through of software solutions for distributed systems, and their blind spots wrt data versioning; temporal databases
- semantic drift: schema-less change
- Limitations of type systems for managing invariants
- co-evolving code and data; running old code on old data
The real benefit of systems thinking is recognizing whether an issue is unsolvable and/or manageable (with the current approach, or whether it warrants a technology upgrade or directional shift). It's like big-O evaluation of algorithms, not for space/time but for issue tractability.
This is one of the main problems I have been banging my head on for the past decade, and many of the things mentioned, like Buf, Unison, and more, I only stumbled upon randomly on my own. It's refreshing to read an article on this subject, simply because it's so under-discussed. I also wonder to what degree these problems have been solved within the high walls of tech giants like Google and Amazon.
What? Distributed systems 101 is to write workers that execute stateless, pure functions, to manage state centrally (basically monads in yet another guise, fan-in/fan-out), and to use functional operations to merge and transform data (map-reduce).
There are a lot of reasons people don't succeed in systems thinking or in writing good distributed architectures, but functional programming is not one of them. If anything, it's the inability to think functionally, the leaking of state and the brittle dependencies established absolutely everywhere, that scuppers systems. Yes, FP has tons of techniques that aid in good design at lower levels too; that doesn't mean it hampers you at high levels. In fact, much of what this article puts forward is a false dichotomy. None of the lower-level FP principles and techniques, like strong and expressive type systems, are in conflict with the much more general principles of system design covered in the article (and they actually help you realize those principles: think version skew is bad in Haskell? Good luck figuring out wtf is even happening in a tangled JavaScript async-promises monstrosity undergoing version skew).
Also, slop, gross. At least edit some of that empty AI verbiage out of the final essay. It has the character and charm of a wet rag and adds zero information.
I really enjoyed this article—I meant to just read the first few paragraphs and skim the rest. I ended up reading every word.
The core thesis sticks with me: existing tooling that helps programmers think about and enforce correctness (linters, compilers, type systems, test suites) operate primarily (and often exclusively) on one snapshot of a project, as it exists at a particular point in time. This makes those tools, as powerful as they can be, unable to help us think about and enforce correctness across horizons that are not visible from the standpoint of a single project at a single point in time (systems distributed across time, across network, across version history).
I feel like the issue of semantic drift is a bigger topic that probably has roots beyond systems architecture/computer engineering.
Also I found the writing style very clear. It helpfully repeats the core ideas, but (IMHO, this is subjective) doesn't do so ad nauseam.
I'm interested in reading other writing from the same author.
> This makes those tools, as powerful as they can be, unable to help us think about and enforce correctness across horizons that are not visible from the standpoint of a single project at a single point in time (systems distributed across time, across network, across version history).
Eh?
I don't think there's anything preventing you from constructing guardrails, with the type system and tests enforcing correctness, that handle versioning.
I'm not buying the "unable to help us to think about" part. I'd argue that Rust's `Option`/`Result`/sum types plus exhaustive match/switch are very valuable for proper error handling. You could define your own error type and exhaustively handle each case gracefully, with the bonus that your compiler is now also able to check it.
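A small Rust sketch of that pattern, with invented error variants: a closed error type plus an exhaustive match, so adding a variant turns every unhandled case into a compile error.

```rust
// A closed error type for one operation. No catch-all arm below, so the
// compiler checks exhaustiveness for us: add a variant and every match
// that doesn't handle it fails to compile.
enum PaymentError {
    CardDeclined,
    NetworkTimeout { retry_after_secs: u64 },
    InvalidAmount(i64),
}

fn user_message(e: &PaymentError) -> String {
    match e {
        PaymentError::CardDeclined => "Your card was declined.".to_string(),
        PaymentError::NetworkTimeout { retry_after_secs } => {
            format!("Temporary network issue; retry in {retry_after_secs}s.")
        }
        PaymentError::InvalidAmount(cents) => {
            format!("{cents} is not a valid amount in cents.")
        }
    }
}

fn main() {
    println!("{}", user_message(&PaymentError::NetworkTimeout { retry_after_secs: 30 }));
    // prints "Temporary network issue; retry in 30s."
}
```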
Outstanding piece. It articulates the actually fundamental correctness problem of most software development:
> “given the actual set of deployments that exist right now, is it safe to add this one?”
This problem has been right in front of our faces for decades, but I haven't ever seen it laid out like this, or fully realised it myself. Which is surprising, given the obvious utility of types for program-level correctness.
I think the next big leap forward in software reliability will be the development/integration of both the tooling to accomplish this, and the cultural norms to use them. Automating this (as far as possible -- they do note that semantic drift is immune to automatic detection) will be as big a deal as the concept of version control for code, or CI/CD.
There's lots here that I need to dig into further.
My problem with functional programming is that hardware is imperative, which means the functional approach isn't "just a different way of thinking" but an extra layer of complexity that needs to justify its existence.
Normally the database and message queue are decoupled from the backend service. This decoupling makes management simpler and makes cross-language work easier.
But if the type system needs to cover all these components, then they start coupling again.
Coupling is not necessarily a bad thing as long as it gives a good developer UX. If the database is tightly coupled with the programming language, then it looks like an ORM, but better. And it can probably also reduce CRUD boilerplate, N+1 issues, etc.
Relatedly, there is SpacetimeDB, which makes the backend run within the database; the backend code is highly coupled with SpacetimeDB's own API.
Are there any other libraries / research / etc. that take a more functional approach to solving these sorts of problems?
For db stuff: what if we flipped the way we write schemas around? A schema is something you derive from the current state, rather than something you start by writing and then generate migrations from. So you can reason over all of it rather than a particular snapshot.
That's actually how it's done! You accept all schemas in your db and use default values, or retrofit old interfaces, to keep compatibility.
You never really write directly, of course, so maybe the author is confused as to how FP handles this. For a category this is just another morphism, be it a simple schema or a more complicated function; it's just many ins and outs plugged together.
So when you call writeXY, the caller has absolutely no need to know what actually happens. Catching and modifying old versions is just another morphism. You can even keep the layout and just accept a version and payload as input.
The value of a schema in a db like Postgres isn’t in the description of the storage format or indexing, but in the invariants the database can enforce on the data coming in. That includes everything from uniqueness and foreign key constraints to complex functions that can pull in dozens of tables to decide whether a row is valid or should be rejected. How do you derive declarative rules from a Turing-complete language built into the database?
The issue is our current databases were not designed for proper schema resolution.
The correct answer here is that a database ought to be modeled as a data type. SQL treats data type changes separately from the value transform. To say this is broken is an understatement.
The actual answer is that a schema update is a typed function from schema1 to schema2. The type/schema of the db is carried in the types of the function, but the actual data is moved via the function's computation.
Keeping multiple databases around is honestly a potentially good use of homotopy type theory extended with support for partial/one-way equivalences.
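A minimal Rust sketch of "migration as a typed function". The schemas here are invented (v1 stored prices as floating-point dollars, v2 as integer cents); the point is that the schema change and the value transform are one compiler-checked step.

```rust
// The database's schema is carried in its type parameter.
struct Db<Schema>(Vec<Schema>);

struct ProductV1 { name: String, price_dollars: f64 }
struct ProductV2 { name: String, price_cents: u64 }

// The migration's type says exactly what it does: Db<ProductV1> -> Db<ProductV2>.
// The data is moved by the function body; the schema moves with the type.
fn migrate(db: Db<ProductV1>) -> Db<ProductV2> {
    Db(db.0
        .into_iter()
        .map(|p| ProductV2 {
            name: p.name,
            price_cents: (p.price_dollars * 100.0).round() as u64,
        })
        .collect())
}

fn main() {
    let old = Db(vec![ProductV1 { name: "widget".to_string(), price_dollars: 9.99 }]);
    let new = migrate(old);
    println!("{} = {} cents", new.0[0].name, new.0[0].price_cents); // prints "widget = 999 cents"
}
```

Contrast with SQL migrations, where `ALTER TABLE` and the backfilling `UPDATE` are separate statements that nothing checks against each other.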
This is an absolutely wonderful article. I helped build a system very similar to the one quoted below:
> A deploy pipeline that queries your orchestrator for running image tags, checks your migration history against the schema registry, diffs your GraphQL schema against collected client operations, and runs Buf’s compatibility checks: this is buildable today, with off-the-shelf parts.
that was largely successful at eliminating interservice compatibility errors, but it felt like we were scrambling around in a dark and dusty corner of software engineering that not many people really cared about. Good to see that there's a body of academic work, which I was not entirely aware of, looking into this.
The article successfully identifies that, yes, there are ways, with sufficient effort, to statically verify that a schema change is (mostly) safe or unsafe before deployment. But even with that determination, a lot of IDLs still make it very hard to evolve types! The compatibility checker will successfully identify that strengthening a type from `optional` to `required` isn't backwards compatible: _now what_. There isn't a safe pathway for schema evolution, and you need to reach for something like Cambria (https://www.inkandswitch.com/cambria/, cited in the article) to handle the evolution.
I've only seen one IDL try to do this properly by baking evolution semantics directly into its type model: typical (https://github.com/stepchowfun/typical) with its `asymmetric` type label, which makes a field required in the constructor but not the deserialiser. I would like to see these "asymmetric" types find their way into IDLs like Protocol Buffers such that their compatibility checkers, like Buf's `buf breaking` (https://buf.build/docs/reference/cli/buf/breaking/), can formally reason about them. I feel like there is a rich design space of asymmetric types that is poorly described (enum values that can only be compared against but never constructed, paired validations, fallback values sent over the wire, etc.)
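A rough Rust analogue of that asymmetric semantics (the `User`/`email` names are invented): the only public constructor requires the field, so new writers must supply it, while the deserialization path still tolerates its absence in payloads written by older versions.

```rust
pub struct User {
    pub id: u64,
    email: Option<String>, // only ever None for old payloads
}

impl User {
    // Construction path: required. New code cannot produce a User without it.
    pub fn new(id: u64, email: String) -> User {
        User { id, email: Some(email) }
    }

    // Deserialization path: optional, so data from older writers still decodes.
    pub fn from_wire(id: u64, email: Option<String>) -> User {
        User { id, email }
    }

    pub fn email(&self) -> Option<&str> {
        self.email.as_deref()
    }
}

fn main() {
    let fresh = User::new(1, "ada@example.com".to_string());
    let legacy = User::from_wire(2, None);
    println!("{:?} / {:?}", fresh.email(), legacy.email());
    // prints "Some("ada@example.com") / None"
}
```

Once no deployed writer can produce the old shape, `from_wire` can be tightened and the `Option` removed, with the compiler flagging every affected read site.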
Another one that I think makes a pretty good attempt is the config language CUE and its type system (https://cuelang.org/docs/concept/the-logic-of-cue/), which allows you to do first-class version compatibility modelling. But I have yet to investigate whether CUE is fit for purpose for modelling compatibility over the wire / between service deploys.
With all due respect to Duncan... The reason why this is an issue at all is because the people who architected the internet were not thinking functionally. If they were, they would have come up with something like Nix.
But yes, interfacing between formally proven code and the wild wild West is always an issue. This is similar to when a person from a country with a strong legal system suddenly finds themselves subject to the rule of the jungle.
The better question to ask is why did we choose to build a chaotic jungle and how do we get out of it
The version problem is solved well in FP by catching the old function and translating it cleanly. This ofc also works with data representation.
I'm not sure what the point of the post is. Minimizing side effects in your code (keeping functions pure) is exactly what gives you the flexibility he says FP misses?!
Terrible article. Nowhere in its litany of what one has to do to build systems that work is anything said about how functional programming is incompatible with any of that, or even how the mindset of the functional programmer is incompatible with any of that.
The formula for this article was: truism about X followed by unrelated stuff about Y that the reader is supposed to come away thinking X is incompatible with.
I'm very puzzled at the positive comments it's getting. It's insanely, discourteously long for the number of distinct ideas in it. There are banal LLM-ish smirking quips, very little personality and a lot of repetition.
There are certainly no personal anecdotes of difficult problems, which would go a long way toward showing the author actually knows their stuff.
There’s so much in this article where I look at it and I’m like, is this AI slop?
I’m beginning to really appreciate short articles with a few bullet point takeaways.
With respect to version control across systems, when you get into serious stuff where mistakes are measured in lives and/or three commas, there’s just a lot more simplicity in the design of those systems than most people think.
Really big systems often have very simple design principles at their core, which are echoed throughout the topology.
In secure code like the stuff Signal uses, having a hash of the code that is attested by a network of servers and chained back to self-signed identities on the client is the only way to go.
Here’s the hash of the code that’s supposed to be running on my server, here’s the proof that I verified it with all of the hardware and software tools at my disposal from the server, and here’s that hash and attestation embedded into the app on your phone or laptop that’s connecting to my box.
If there’s an easier way to get some semblance of “my device is talking to the right code on the right box,” please enlighten me.
ncgl|19 days ago
So either his prompt is making the writing more palatable, or maybe he's just prolific.
In either case, I would count this as a W.
rixed|19 days ago
Could be shorter though, and the title might not be 100% spot on and a bit clickbaity indeed.
Also, it does not shout "AI generated" to me. The style is pleasant enough (but for the repetitions).
I honestly can't relate this comment with that article. Have we read the same thing?
Ronsenshi|19 days ago
Makes me think - what if all of this was written by an LLM agent?
stronglikedan|19 days ago
> the author has written five engineering articles in the last week
considering you can only see when the articles are published, not written, I'd say you're trying too hard...
coolgoose|19 days ago
Whether or not it's AI, the content is in fact correct.
There is no 'one' state of your application, unless you literally do maintenance-window deploys, have zero queues, and keep everything synchronous.
kortex|19 days ago
anon291|19 days ago
thmpp|19 days ago
But the solutions and tools functional programming provides help to have higher verifiability, fewer side effects and more compile-time checks of code in the deployment unit. That other challenges exists is fully correct. But without a functional approach, you have these challenges plus all others.
So while FP is not a one-size-fits-all solution to distributed systems challenges, it does help solve a subset of the challenges on a single system level.
zozbot234|19 days ago
These challenges can be solved by the usual tools of FP, but this requires each version of the system to be explicitly aware of the data schema varieties used by all earlier versions that are still in use. Then it's a matter of interpreting earlier-versioned data correctly, and translating data to earlier schema varieties whenever they may have to be interpreted by earlier versions of the code.
(It may also be helpful to introduce minimally revised releases of the earlier code that simply add some amount of forward compatibility re: dealing with later-released schemas, while broadly keeping all other behavior the same to avoid unwanted breakage. These approaches are not too hard to implement.)
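A minimal sketch of this multi-version awareness in Python (schemas and field names are hypothetical): every payload carries its version, newer code lifts older payloads forward on read, and projects data back down whenever an older reader might still consume it.

```python
# Hypothetical sketch: version-tagged payloads with explicit
# upgrade (old -> new) and downgrade (new -> old) translations.

def upgrade(payload: dict) -> dict:
    """Lift any known older schema version to the latest (v2)."""
    if payload["version"] == 1:
        # v1 stored a single "name" field; v2 splits it in two.
        first, _, last = payload["name"].partition(" ")
        return {"version": 2, "first_name": first, "last_name": last}
    if payload["version"] == 2:
        return payload
    raise ValueError(f"unknown schema version {payload['version']}")

def downgrade(payload: dict) -> dict:
    """Project a v2 record back to the v1 shape for older readers."""
    assert payload["version"] == 2
    return {"version": 1,
            "name": f"{payload['first_name']} {payload['last_name']}".strip()}
```

New code reads everything through `upgrade`; anything written where an old reader may still see it goes through `downgrade` first.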
voidhorse|19 days ago
lmm|19 days ago
And doing the things the article suggests would be mostly useless, because the problem is always the boundary between your neat little world and the outside. You can't solve the semantic drift problem by being more careful about which versions of the code you allow to coexist, because the vast majority of semantic drift problems happen when end users (or third parties you integrate with) change their mind about what one of the fields in their/your data model means. There are actually bitemporal database systems that will version the code as well as the data, that allow you to compute what you thought the value was last Thursday with the code you were running last Thursday ("bank python"). It solves some problems, but not as many as you might think.
mh2266|19 days ago
even setting aside the memory safety stuff, having sum types and compiler-enforced error handling is also not nothing...
egorelik|19 days ago
I still want to believe that future programming languages will be capable of tackling these issues, but exploring the design space will require being cognizant of those issues in the first place.
stingraycharles|19 days ago
I don’t think the criticism of the author applies to LISP-style functional programming, which is much more accommodating to embracing chaos.
valenterry|19 days ago
Looks like the term "functional programming" has been watered down so much that it is now as useful as "OOP": not at all.
Look, what matters is pure functional programming. It's about referential transparency, which means managing effects and reasoning about code much the way you can in math. Static typing is very nice but orthogonal; ADTs and making illegal states unrepresentable are good things, but also orthogonal.
Aperocky|19 days ago
Fundamentally, all your systems become distributed systems after a certain scale, and accuracy at the local level is usually not where systemic risks come from. A distributed system of fully accurate and correct local programs can still implode or explode. Migration, compatibility, and system stability remain surprisingly dependent on engineer experience to this day.
agentultra|19 days ago
This isn’t unique to Haskell or functional programming at all. I’ve seen programmers from JavaScript to C++ suffer the same fate: confined to the boundaries of a single process, they try to verify the correctness of their systems informally, by writing documents, holding long meetings, or convincing themselves via prototyping. It’s only a matter of time before they find the error in production.
Hopefully it’s of no real consequence. The error gets fixed and it costs a week or two of extra work. But sometimes it triggers an outage that overshoots SLOs and trips SLAs, which costs the company money. That outage could potentially have been avoided.
But it’s tough getting developers to see the benefit in model checking or any kind of formal methods. The hubris is often too great. It’s not just functional programmers! Although it surprised me when I was working with Haskell developers that so many were resistant to the idea.
(If one has experience and understands functional programming, learning TLA should be rather straightforward)
agentultra|19 days ago
In practice it’s a bit tricky to use because it doesn’t enforce encoding the schema identifier in the serialized payload. Oops.
But between that, automated typing from examples, and refinement it might be possible to develop a system that can manage to validate the serialized data against the parser and have the serialization data appear in the type. At that point you could query the whole system.
[0] https://hackage.haskell.org/packages/tag/library
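The missing piece the parent points at, carrying the schema identifier inside the serialized payload itself, can be sketched in a few lines; the registry and schema names here are hypothetical.

```python
import json

# Hypothetical registry of schema identifiers this build understands.
KNOWN_SCHEMAS = {"user.v1", "user.v2"}

def serialize(schema_id: str, body: dict) -> bytes:
    # The schema identifier travels with the data, not out of band,
    # so a decoder can never confuse one payload shape for another.
    return json.dumps({"schema": schema_id, "body": body}).encode()

def deserialize(raw: bytes) -> tuple[str, dict]:
    envelope = json.loads(raw)
    schema_id = envelope["schema"]
    if schema_id not in KNOWN_SCHEMAS:
        raise ValueError(f"unknown schema {schema_id!r}")
    return schema_id, envelope["body"]
```

With the identifier in-band, "validate the serialized data against the parser" becomes a lookup-then-dispatch rather than guesswork.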
wavemode|19 days ago
Even if you have a web application with only one server, and an embedded database, and no other components, you still have a distributed system. The browser is part of that system.
troad|19 days ago
But none of these things are functional programming? This is more the tradition of 'expressive static types' than it is of FP.
What about Lisp, Racket, Scheme, Clojure, Erlang / Elixir...
esafak|19 days ago
> The immutability of the log is the entire value proposition.
unknown|19 days ago
[deleted]
matt_kantor|19 days ago
> If you have a web application with more than one server, you have a distributed system. If you have background job workers, you have a distributed system. If you have a cron job, you have a distributed system. If you talk to Stripe, or send emails through SendGrid, or enqueue something in Redis, or write to a Postgres replica, then you are (I regret to inform you) operating a distributed system.
As soon as you have at least one client talking to your server you have a distributed system.
tombert|19 days ago
It makes me a little sad that Erlang/Elixir is kind of the only platform that does this though...
roblh|19 days ago
threethirtytwo|19 days ago
It's not. These are called green threads, and under the hood they're not really threads. It's essentially the same thing as async/await in Python and Node.js.
They're just different perspectives on the same concept of concurrency: Go and Erlang make the concurrency calls implicit, while in Node.js they're explicit.
threethirtytwo|19 days ago
voidhorse|19 days ago
zug_zug|19 days ago
Was the central thesis supposed to be that it's impossible for any tools to ever represent complex migrating systems with well-defined data? Because that's not true. Or that it's not practical or cost-effective to do so? Or was there a central point?
mpalmer|19 days ago
threethirtytwo|19 days ago
First, put all your services in a monorepo and have it all built as one unit under CI. That’s a static check across the entire system. When you do a polyrepo you are opening up a mathematical hole where two repos can be mismatched.
Second don’t use databases. Databases don’t type check and aren’t compatible with your functional code written on servers.
The first one is stupid… almost everyone just makes this mistake, and most of the time, unless you were there in the beginning to enforce a monorepo, you have to live with someone deploying a breaking change that affects a repo somewhere else. I’ve been in places where they simply didn’t understand the hole with polyrepos and they went with it despite me diagramming the issue on a whiteboard (it’s that pervasive). Polyrepos exist for people who like organizing things with names and don’t understand static safety.
And it gets worse. So many people don’t understand that this problem only gets worse over time. The more repos you have in your polyrepo the more brittle everything becomes and they don’t even understand why.
The second one is also stupid but you had to be there in the very very beginning to have stopped it. The inception of the concept of databases should have been created such that any query you run on it needs to be compiled to run on the database and is thus statically checked. This is something that can easily be done but wasn’t done and now the entire World Wide Web just has to make do with testing. Hello? Type checking in application code but not in query code? Why?
That’s like half the problems with distributed systems completely solved. And these problems exist just because of raw stupidity. Stuff like race conditions and deadlocks is solvable too, but much harder. Those issues are at least obvious. What’s not obvious are the aforementioned problems that almost nobody talks about, even though they are completely solvable.
blandflakes|19 days ago
There are definitely benefits to this approach. My coworkers do fall into the trap of assuming that all the services will be deployed simultaneously, which is not something you can really guarantee (especially if you have to roll one of them back at some point), so the monorepo approach gives them confidence in some breaking changes that it shouldn't (like adding a new required field).
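The "new required field" trap described above is easy to demonstrate: each decoder version passes CI in isolation, yet payloads written by the old version fail against the new one during a rolling deploy or after a rollback. A toy Python sketch (field names hypothetical):

```python
# Two "deployed versions" of the same decoder, side by side.
# v2 made "email" required; v1 writers never send it.

def decode_v1(payload: dict) -> dict:
    # v1 treats email as optional metadata.
    return {"id": payload["id"], "email": payload.get("email")}

def decode_v2(payload: dict) -> dict:
    # v2 tightened the contract: email is now required.
    if "email" not in payload:
        raise ValueError("missing required field: email")
    return {"id": payload["id"], "email": payload["email"]}

old_payload = {"id": 1}  # written by a still-running v1 instance
```

`decode_v1` tolerates `old_payload` while `decode_v2` rejects it, so during the window when both versions coexist, some requests fail even though both versions are individually "correct".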
akoboldfrying|19 days ago
That helps but is insufficient, since the set of concurrently deployed artifact versions can differ from any set of artifact versions seen by CI -- most obviously when two versions of the same artifact are actively deployed at the same time. It also appears to rule out ever integrating with other systems (e.g., now you need to build your own Stripe-equivalent in a subdir of your monorepo).
> Second don’t use databases.
So you want to reimplement your own PostgreSQL-equivalent in another monorepo subdir too? I don't understand how opting not to use modern RDBMSes is practical. IIUC you're proposing implementing a DB using compiled queries that use the same types as the consuming application -- I can see the type-safety benefits, but this immediately restricts consumers to the same language or environment (e.g., JVM) that the DB was implemented in.
kbenson|19 days ago
That isn't very useful by itself. What's your suggested alternative that aligns with your advice of "don't"? How does it deal with destructive changes to data (e.g. a table drop)?
theLiminator|19 days ago
This is a good point, I've never really thought about this carefully before, but that makes so much sense.
w10-1|19 days ago
Ideas that helped me situate many of the issues we struggled with at various companies:
- distinguishing programs (the code as designed) from (distributed) systems (the whole mess, including maintenance and migration, as deployed)
- versioning code (reversibly) and data (irreversibly)
- walk-through of software solutions for distributed systems, and their blind spots wrt data versioning; temporal databases
- semantic drift: schema-less change
- Limitations of type systems for managing invariants
- co-evolving code and data; running old code on old data
The real benefit of systems thinking is recognizing whether an issue is unsolvable and/or manageable (with the current approach, or warrants a technology upgrade or directional shift). It's like O(n) evaluation of algorithms, not for space/time but for issue tractability.
oftenwrong|19 days ago
voidhorse|19 days ago
There are a lot of reasons people don't succeed in systems thinking or in writing good distributed architectures, but functional programming is not one of them. If anything, it's the inability to think functionally, leaking state and establishing brittle dependencies absolutely everywhere, that scuppers systems. Yes, FP has tons of techniques that aid good design at lower levels too; that doesn't mean it hampers you at higher levels. In fact, much of what this article puts forward is a false dichotomy. None of the lower-level FP principles and techniques, like strong and expressive type systems, are in conflict with the much more general principles of system design covered in the article (and they actually help you realize those principles: think version skew is bad in Haskell? Good luck figuring out wtf is even happening in a tangled JavaScript async-promises monstrosity undergoing version skew.)
Also, slop, gross. At least edit some of that empty AI verbiage out of the final essay. It has the character and charm of a wet rag and adds zero information.
cfiggers|19 days ago
The core thesis sticks with me: existing tooling that helps programmers think about and enforce correctness (linters, compilers, type systems, test suites) operate primarily (and often exclusively) on one snapshot of a project, as it exists at a particular point in time. This makes those tools, as powerful as they can be, unable to help us think about and enforce correctness across horizons that are not visible from the standpoint of a single project at a single point in time (systems distributed across time, across network, across version history).
I feel like the issue of semantic drift is a bigger topic that probably has roots beyond systems architecture/computer engineering.
Also I found the writing style very clear. It helpfully repeats the core ideas, but (IMHO, this is subjective) doesn't do so ad nauseam.
I'm interested in reading other writing from the same author.
epgui|19 days ago
lock1|19 days ago
I don't think there's something preventing you from constructing guardrails with type system & tests enforcing correctness that handles versioning.
I'm not buying the "unable to help us think about" part. I'd argue that Rust's `Option`/`Result`/sum types plus exhaustive match/switch are very valuable for proper error handling. You could define your own error type and exhaustively handle each case gracefully, with the bonus that your compiler is now also able to check it.
epgui|19 days ago
akoboldfrying|19 days ago
> “given the actual set of deployments that exist right now, is it safe to add this one?”
This problem has been right in front of our faces for decades, but I haven't ever seen it laid out like this, or fully realised it myself. Which is surprising, given the obvious utility of types for program-level correctness.
I think the next big leap forward in software reliability will be the development/integration of both the tooling to accomplish this, and the cultural norms to use them. Automating this (as far as possible -- they do note that semantic drift is immune to automatic detection) will be as big a deal as the concept of version control for code, or CI/CD.
There's lots here that I need to dig into further.
anal_reactor|19 days ago
qouteall|19 days ago
But if the type system needs to cover all these components, then the components start coupling again.
Coupling is not necessarily a bad thing as long as it gives good developer UX. If the database is tightly coupled with the programming language, then it looks like an ORM, but better. And it probably can also reduce CRUD boilerplate, the N+1 issue, etc.
Related: there is SpacetimeDB, which makes the backend run inside the database, and the backend code is highly coupled with SpacetimeDB's own API.
foxes|19 days ago
For db stuff - what if we flipped the way we write schemas around? A schema would be something you derive from the current state, rather than something you write first and then generate migrations from. Then you could reason over all of it rather than over a particular snapshot.
AlotOfReading|19 days ago
[0] https://docs.datomic.com/datomic-overview.html#information-m...
Krei-se|19 days ago
So when you call writeXY your caller has absolutely no need to know what actually happens. Catching and modifying old versions is just another morphism. You can even keep the layout and just accept version and payload as input.
throwup238|19 days ago
The value of a schema in a db like Postgres isn’t in the description of the storage format or indexing, but the invariants the database can enforce on the data coming in. That includes everything from uniqueness and foreign key constraints to complex functions that can pull in dozens of tables to decide whether a row is valid or should be rejected. How do you derive declarative rules from a turing complete language built into the database?
anon291|19 days ago
The correct answer here is that a database ought to be modeled as a data type. SQL treats data-type changes separately from the value transform. To say this is misguided is an understatement.
The actual answer is that the schema update is a typed function from schema1 to schema2. The type / schema of the db is carried in the types of the function. But the actual data is moved via the function computation.
Keeping multiple databases around is honestly a potential good use of homotopy type theory extended with support for partial / one-way equivalences
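A small sketch of this framing, with hypothetical record types: the migration is an ordinary pure function from the old schema's type to the new one, so the type checker sees the schema change and the value transform as a single artifact.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AccountV1:
    id: int
    balance_cents: str   # historically stored as a string

@dataclass(frozen=True)
class AccountV2:
    id: int
    balance_cents: int   # v2 fixes the representation

def migrate(a: AccountV1) -> AccountV2:
    # The schema update IS this typed function: the db's "type"
    # is carried in the signature, the data moves via the body.
    return AccountV2(id=a.id, balance_cents=int(a.balance_cents))
```

Composing `migrate` functions v1→v2→v3 then gives the full history as checkable code, rather than as SQL scripts opaque to the type system.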
cadamsdotcom|13 days ago
Thank you to the author.
mbo|19 days ago
> A deploy pipeline that queries your orchestrator for running image tags, checks your migration history against the schema registry, diffs your GraphQL schema against collected client operations, and runs Buf’s compatibility checks: this is buildable today, with off-the-shelf parts.
that was largely successful at eliminating interservice compatibility errors, but it felt like we were scrambling around in a dark and dusty corner of software engineering that not many people really cared about. Good to see that there's a body of academic work that I was not entirely aware about that is looking into this.
The article successfully identifies that, yes, there are ways, with sufficient effort, to statically verify that a schema change is (mostly) safe or unsafe before deployment. But even with that determination, a lot of IDLs still make it very hard to evolve types! The compatibility checker will successfully identify that strengthening a type from `optional` to `required` isn't backwards compatible: _now what_? There isn't a safe pathway for schema evolution, and you need to reach for something like Cambria (https://www.inkandswitch.com/cambria/, cited in the article) to handle the evolution.
I've only seen one IDL try to do this properly by baking evolution semantics directly into its type model: typical (https://github.com/stepchowfun/typical), with its `asymmetric` type label, which makes a field required in the constructor but not in the deserialiser. I would like to see these "asymmetric" types find their way into IDLs like Protocol Buffers so that their compatibility checkers, like Buf's `buf breaking` (https://buf.build/docs/reference/cli/buf/breaking/), can formally reason about them. I feel like there is a rich design space of asymmetric types that is poorly described (enum values that can only be compared against but never constructed, paired validations, fallback values sent over the wire, etc.)
Another one that I think makes a pretty good attempt is the config language CUE and its type system (https://cuelang.org/docs/concept/the-logic-of-cue/), which allows you to do first-class version compatibility modelling. But I have yet to investigate whether CUE is fit for purpose for modelling compatibility over the wire / between service deploys.
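For readers unfamiliar with the `asymmetric` idea, it can be imitated in ordinary code: the constructor demands the new field while the deserializer tolerates its absence during the rollout window. A hypothetical Python sketch (not typical's actual implementation):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Event:
    id: int
    trace_id: str  # required when *constructing* new events

    @staticmethod
    def from_wire(payload: dict) -> "DecodedEvent":
        # ...but optional when *reading*: payloads from writers
        # deployed before trace_id existed must still decode.
        return DecodedEvent(id=payload["id"],
                            trace_id=payload.get("trace_id"))

@dataclass(frozen=True)
class DecodedEvent:
    id: int
    trace_id: Optional[str]
```

Once every deployed writer is known to send `trace_id`, the read side can be tightened and the two types collapsed back into one, which is exactly the evolution pathway the `optional`→`required` checkers lack.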
anon291|19 days ago
But yes, interfacing between formally proven code and the wild wild West is always an issue. This is similar to when a person from a country with a strong legal system suddenly find themselves subject to the rule of the jungle.
The better question to ask is why did we choose to build a chaotic jungle and how do we get out of it
Krei-se|19 days ago
I'm not sure what the point of the post is. Minimizing side effects in your code (keeping functions pure) is exactly what gives you the flexibility he says FP misses?!
platinumrad|19 days ago
cryptonector|19 days ago
The formula for this article was: truism about X followed by unrelated stuff about Y that the reader is supposed to come away thinking X is incompatible with.
mpalmer|19 days ago
There are certainly no personal anecdotes about difficult problems, which would go a long way toward showing the author actually knows their stuff.
josh2600|19 days ago
I’m beginning to really appreciate short articles with a few bullet point takeaways.
With respect to version control across systems, when you get into serious stuff where mistakes are measured in lives and/or three commas, there’s just a lot more simplicity in the design of those systems than most people think.
Really big systems often have very simple design principles at their core which are echoed through out the topology.
In secure code like the stuff Signal uses, having a hash of the code that is attested by a network of servers and chained back to self-signed identities on the client is the only way to go.
Here’s the hash of the code that’s supposed to be running on my server, here’s the proof that I verified it with all of the hardware and software tools at my disposal from the server, and here’s that hash and attestation embedded into the app on your phone or laptop that’s connecting to my box.
If there’s an easier way to get some semblance of “my device is talking to the right code on the right box,” please enlighten me.
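At its simplest, the verification loop described above reduces to comparing a locally computed digest against a value pinned in the client; a toy sketch with `hashlib` (the attestation and pinning machinery around it is elided):

```python
import hashlib

def digest(artifact: bytes) -> str:
    # Content-address the deployed code: same bytes, same hash.
    return hashlib.sha256(artifact).hexdigest()

def verify(artifact: bytes, pinned_digest: str) -> bool:
    # The client ships with the pinned digest; the server must
    # present an artifact hashing to exactly that value.
    return digest(artifact) == pinned_digest
```

The hard parts (who computes the pin, how it is attested, how the client trusts the attestors) are exactly the network-of-servers chain the parent describes; the hash comparison itself is the easy core.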
sakesun|19 days ago
rsrsrs86|18 days ago
travisgriggs|19 days ago
[deleted]
diebillionaires|19 days ago
tapete1|18 days ago
the__alchemist|19 days ago
srik|19 days ago
nineteen999|19 days ago
An FP programmer admitting they live in ivory towers. I never thought I'd see the day.
I love it. HN needs 10 times more of this.
This person has grown up - the rest of you? Get a grip.