The Law of Leaky Abstractions (2002)

I've long been having a hunch that we're currently in the "wild west of abstraction".

I think we're missing an essential constraint on the way we do abstraction.

My hunch is that this constraint should be that abstractions must be reversible.

Here's an example: When you use a compiler, you can work at a higher layer of abstraction (the higher-level language). But, this means you're now locked into that layer of abstraction. By that I mean, you can no longer work at the lower layer (assembly), even if you wanted to. You could in theory of course modify the compiler output after it's been generated, but then you'd have to somehow manually keep that work in sync whenever you want to re-generate. Using an abstraction kinda locks you into that layer.

I see this problem appearing everywhere:

- Use framework <--> Write from scratch

- Use an ORM <--> Write raw SQL

- Garbage collection <--> Manual memory management

- Using a DSL <--> Writing raw language code

- Cross platform UI framework <--> Native UI code

- ...

I think we're missing a fundamental primitive of abstraction that allows us to work on each layer of abstraction without being locked in.

If you have any thoughts at all on this, please share them here!

jerf|2 years ago

Abstractions work by restricting the domain of what you can do, then building on those restrictions. For example, raw hardware can jump anywhere, but structured programming constrains you to jump only to certain locations in order to implement if, for, functions, etc. It is precisely those restrictions that bring the benefits of structured programming; if you still frequently dipped into jumping around directly structured programming would fail to provide the guarantees it is supposed to provide. CRUD frameworks provide their power by restricting you to CRUD operations, then building on that. Immutable data is accomplished by forbidding you from updating values even though the hardware will happily do it. And so on.

Escape hatches under the abstractions are generally there precisely to break the abstractions, and break them they do.

Abstractions necessarily involve being irreversible, or, to forestall a tedious discussion of the definition of "irreversible", necessarily involve making it an uphill journey to violate and go under the abstraction. There's no way around it. Careful thought can make using an escape hatch less pain than it might otherwise be (such as the ORM that makes it virtually impossible to use SQL by successfully hiding everything about the SQL tables from you so you're basically typing table and column names by dead reckoning), but that's all that can be done.

One thing to do about this is that just as in the past few years the programming community has started to grapple with the fact that libraries aren't free but come with a certain cost that really adds up once you're pulling in a few thousand libraries for a framework's "hello world", abstractions that look really useful but whose restrictions don't match your needs need to be looked at a lot more closely.

I had something like that happen to me just this week. I needed a simple byte ring buffer. I looked in my language's repos for an existing one. I found them. But they were all super complicated, offering tons of features I didn't need, like being a writethrough buffer (which involved taking restrictions I didn't want), or where the simple task of trying to understand the API was quite literally on par with implementing one myself. So I just wrote the simple thing. (Aiding this decision is that broadly speaking if this buffer does fail or have a bug it's not terribly consequential, in my situation it's only for logging output and only effectively at a very high DEBUG level.) It wasn't worth the restrictions to build up stuff I didn't even want.

samatman|2 years ago

> Here's an example: When you use a compiler, you can work at a higher layer of abstraction (the higher-level language). But, this means you're now locked into that layer of abstraction. By that I mean, you can no longer work at the lower layer (assembly), even if you wanted to.

Native-code compilers commonly allow emitting assembly directly, but now your source code isn't portable between CPUs. Many interpreted languages, even most, allow FFI code to be imported, modifying the runtime accordingly, but now your program isn't portable between implementations of that language, and you have to be careful to make sure the behavior you've introduced doesn't mess with other parts of the system in unexpected ways.

Generalizing, it's often possible to drill down beneath the abstraction layer, but there's often an inherent price to be paid, whether it be taking pains to preserve the invariants of the abstraction, losing some of the benefits of it, or both.

There are better and worse versions of this layer, I would point to Lua as a language which is explicitly designed to cross the C/Lua boundary in both directions, and which did a good job of it. But nothing can change the fact that pure-Lua code simply won't segfault, but bring in userdata and it very easily can; the problems posed are inherent.

ihumanable|2 years ago

Lots of abstractions have an escape hatch down to the lower level, you can put assembly in your C code, most ORMs have some way to just run a query, etc.

I think the question I have is, what benefit does this provide? Let's say we could wave a magic wand and you can operate at any layer of abstraction. Is this beneficial in some way? The article is about leaky abstractions and states

> One reason the law of leaky abstractions is problematic is that it means that abstractions do not really simplify our lives as much as they were meant to.

I think I'm just struggling to understand how this would help with that.

yen223|2 years ago

There's a well-written article by Bret Victor on climbing the ladder of abstraction. It makes the same argument you made, in that climbing "down" the ladder is just as important as going "up"

https://worrydream.com/LadderOfAbstraction/

Veserv|2 years ago

No, reversible abstractions are just one kind of abstraction. For instance, a machine code sequence to a linear sequence of assembly instructions is a reversible abstraction. Not every machine code sequence is expressible as a linear sequence of assembly instructions, but every linear sequence of assembly instructions has a trivial correspondence to a machine code sequence.

However, consider the jump to a C-like language. The key abstraction provided there is the abstraction of infinite local variables. The compiler manages this through a stack, register allocation, and stack spilling to provide the abstraction and consumes your ability to control the registers directly to provide this abstraction. To interface at both levels simultaneously requires the leakage of the implementation details of the abstraction and careful interaction.

What you can do easily is what I call a separable abstraction, a abstraction that can be restricted to just the places it is needed/removed where unneeded. In certain cases in C code you need to do some specific assembly instruction, sequence, or even function. This can be easily done by writing a assembly function that interfaces with the C code via the C ABI. What is happening there is that the C code defines a interface allowing you to drop down or even exit the abstraction hierarchy for the duration of that function. The ease of doing so makes C highly separable and is part of the reason why it is so easy to call out to C, but you hardly ever see anybody calling out to say Java or Haskell.

Of course, that is just one of the many properties of abstractions that can make them easier to use, simpler, and more robust.

saghm|2 years ago

> My hunch is that this constraint should be that abstractions must be reversible.

> Here's an example: When you use a compiler, you can work at a higher layer of abstraction (the higher-level language). But, this means you're now locked into that layer of abstraction. By that I mean, you can no longer work at the lower layer (assembly), even if you wanted to. You could in theory of course modify the compiler output after it's been generated, but then you'd have to somehow manually keep that work in sync whenever you want to re-generate. Using an abstraction kinda locks you into that layer.

Just to make sure I understand, you're proposing a constraint that would rule out every compiler in existence today? I feel like overall I think compilers have worked out well, but if I'm not misunderstanding and this is how you actually feel, I guess I at least should comment your audacity, because I don't think I'd be willing to seriously propose something that radical.

lmm|2 years ago

The best paradigm for understanding abstractions is not the theory-and-model style (which requires hiding details irreversibly), but the equivalence style.

A good abstraction is e.g. summing a list whose elements are a monoid - summing the list is equivalent to adding up all the elements in a loop. Crucially, this doesn't require you to "forget" the specific type of element that your list has - a bad version of this library would say that your list elements have to be subtypes of some "number" type and the sum of your list came back as a "number", permanently destroying the details of the specific type that it actually is. But with the monoid model your "sum" is still whatever complex type you wanted it to be - you've just summed it up in the way appropriate to that type.

drewm1980|2 years ago

You can probably find an IDE plugin that inlines the assembly for a c function. Most ids can show the assembly side by side with your c code so it wouldn't be that much of a step. To fulfill your vision you would also need a decompiler (and an inliner) to convert a block of assembly back into C if a corresponding C routine exists.

titzer|2 years ago

I work on programming languages and systems (virtual machines). A key thing with a systems programming language is that you need to be able to do things at the machine level. Here's a talk I gave a year ago about it: https://www.youtube.com/watch?v=jNcEBXqt9pU

eschneider|2 years ago

That's not really true of, at least C compilers. Because compilers have ABI's and fixed calling conventions, it's straightforward, documented, and not uncommon (depending on your application area/deployment target) to drop down to the ASM layer if you need to do that.

It's definitely one of those things that makes C nice for bare metal programming.

wvenable|2 years ago

Most ORMs give a way to integrate nicely with SQL if you need to reach down to that layer and still use the rest of the ORM features.

There is no silver bullet; everything is a trade off. Almost all of the time, the trade off is entirely worth it even if that gets you locked into that solution.

samsquire|2 years ago

This is something I think a lot about.

I spend a lot of time trying to think of something that composes. Monads are one answer.

I think we need advanced term rewriting systems that also optimize and equivalise.

I really enjoy Joel on Software blog posts from this era.

dhdjksosja|2 years ago

Babel towers of macro edsls, aka learn lisp. https://github.com/combinatorylogic/mbase

__s|2 years ago

I think this is a good way to frame abstraction vs macro

mjw1007|2 years ago

I never liked the way he used TCP as an example here.

I don't think it's sensible to think of "make it reliable" as a process of abstraction or simplification (it's obviously not possible to build a reliable connection on top of IP if by "reliable" you mean "will never fail"). "You might have to cope with a TCP connection failing" doesn't seem to be the same sort of thing as his other examples of leaky abstractions.

TCP's abstraction is more like "I'll either give you a reliable connection or a clean error". And that one certainly does leak. He could have talked about how the checksum might fail to be sufficient, or how sometimes you have to care about packet boundaries, or how sometimes it might run incredibly slowly without actually failing.

joe_the_user|2 years ago

Indeed, his discussion seems to involve a confusing of a leaky network protocol and a leaky abstraction. Perhaps he wanted to meta-illustrate his concept by having his discussion itself be leaky.

lcuff|2 years ago

I like the idea of TCP as a leaky abstraction because it points out the difficulty of engineering the abstraction we really want. It would be wonderful for TCP to be a guaranteed connection abstraction, but it turns out in today's world, the abstraction of a reliable connection is TCP + a network administrator + a guy with wire snips + solder (metaphorically). Maybe down the road, AIs and repair bots will be involved, and the guaranteed connection abstraction might become real or much much stronger. Although it gets more complicated because if a message takes hours to deliver, is that going to work for your application? Yes if you're archiving documents, no if you're trying to set up a video conference call or display a web page.

TCP is problematic in modern circumstances (think: Inside a data center) because a response within milliseconds is what's expected to make the process viable. TCP was designed to accommodate some element of the path being a 300 Baud modem, where a response time in seconds is possible as the modem dials the next hop, so the TCP timeouts are unuseable. QUIC was developed to address this kind of problem. My point being, the abstraction of a guaranteed _timely_ connection is even harder.

I think Joel could have expanded his thoughts to include the degree of leak. SQL is a leaky abstraction itself, yes, but my own take is that ORMs are much leakier: Every ORM introduction document I've read explains the notation by saying "here's the sql that is produced". I think of ORMs as not a bucket with holes, but a bucket with half the bottom removed.

an1sotropy|2 years ago

I first learned about "leaky abstractions" from John Cook, who describes* IEEE 754 floats as a leaky abstraction of the reals. I think this is a good way of appreciating floating point for the large group of people who's experience is somewhere between numerical computing experts (who look at every arithmetic operation through the lens of numerical precision) and total beginners (who haven't yet recognized that there can't be a one-to-one correspondence between a point on the real number line and a "float").

* https://www.johndcook.com/blog/2009/04/06/numbers-are-a-leak...

davesque|2 years ago

I feel like this article should be called "The Law of Bad Abstractions." I often see this cited as a blanket rejection of complexity in software. But complexity is unavoidable and even necessary. A skillful engineer will therefore design their abstractions carefully and correctly, balancing time spent thinking forward against time spent implementing a solution. I think Joel understands this, but it feels weird how he frames it as a "law", as though it's something he's discovered instead of a simple fact that arises from the nature of what abstractions are: things that stand in for (or mediate interaction with) some other thing without actually being that thing. What a surprise that the stand-in ends up not being the actual thing it's standing in for!

refactor_master|2 years ago

A car is an implementation meant to deal with a problem (the weather), but never abstracts away physics or forces full buy-in to some alternate reality. You can’t just go around and say any imperfection in an implementation is a leaky abstraction. That’s not how it works.

My shoe is not abstracting away the terrain, nor is it leaky because it doesn’t handle all weather conditions. Well, it is leaky, but not in that sense.

samatman|2 years ago

An analogy is an abstraction, and abstractions leak.

BoiledCabbage|2 years ago

Young people should probably know that (as far as I recall) Joel more or less invented tech blogging as a form of advertising/recruiting for your company.

Namely either listing out the process/perks that a good engineering team should have and how conveniently his company has it. Or describing interesting and challenging problems they solved and how you can join them and solve problems like that too.

I don't recall anyone popular doing it before him and it's pretty much industry standard now. (Although, feel free to chime in if that's wrong. But popular being a key word here),

unknown|2 years ago

[deleted]

wvenable|2 years ago

I loved this essay when it came out but I've come to dislike how "leaky abstraction" has become a form of low effort criticism that gets applied to almost anything.

simonw|2 years ago

I love this essay so much. I read it 22 years ago and it's been stuck in my mind ever since: it taught me that any time you take on a new abstraction that you don't understand, you're effectively taking on mental debt that is likely to come due at some point in the future.

This has made me quite a bit more cautious about the abstractions I take on: I don't have to understand them fully when I start using them, but I do need to feel moderately confident that I could understand them in depth if I needed to.

And now I'm working with LLMs, the most opaque abstraction of them all!

Legend2440|2 years ago

>And now I'm working with LLMs, the most opaque abstraction of them all!

You put a black box around it to fit it into the world of abstractions that traditional programs live in.

But I'd say the most interesting thing about neural networks is that they do not have any abstractions within them. They're programs, but programs created by an optimization algorithm just turning knobs to minimize the loss.

This creates very different kinds of programs - large, data-driven programs that can integrate huge amounts of information into their construction. It's a whole new domain with very different properties than traditional software built out of stacked abstractions.

evanmoran|2 years ago

If you like this, my other favorite essay by Joel is Making Wrong Code Look Wrong:

https://www.joelonsoftware.com/2005/05/11/making-wrong-code-...

o11c|2 years ago

The problem is that it completely ignores the correct solution, which is "use types; we invented them for a reason".

HTML fragments should never be stored in strings.

lmm|2 years ago

This is wrong and even just looking at the examples is enough to understand that it's wrong. Writing your program to use UDP instead of TCP won't make it work any better when someone unplugs the network cable. An abstraction performing worse isn't a "leak" - the abstraction is still doing what it said it would (e.g. the SQL query still returns the right results), and in practice very few query planners are worth tuning manually (indeed PostgreSQL doesn't even offer you the ability to do hints etc., and that doesn't seem to hurt its popularity). I've never understood why this post was so popular - it only ever seems to be used as an excuse for those who want to write bad code to do so.

falserum|2 years ago

> Writing your program to use UDP instead of TCP won't make it …

There was no proposal to use UDP, so this comment is not about the article.

The point of the article is near the end:

> the only way to deal with the leaks competently is to learn about how the abstractions work and what they are abstracting. So the abstractions save us time working, but they don’t save us time learning.

I.e. To competently use an abstraction, one needs to understand what happens under the hood.

xtiansimon|2 years ago

Are "models" and "abstractions" the same thing?

A recent post, `Mental Models: 349 Models Explained...` reminded me of the `Debits and Credits Model`, which works because of the Debit and Credits Formula or Accounting Equation (Assets = Equity + Liabilities). Minor leaks happening here and are usually stuffed into an account--so we don't have to eat lunch at our desks.

The abstraction examples seem similar, but the discussion around leakage is interestingly different. For example @anonymous-panda suggests you sometimes want your abstraction to be leaky: _ "...leaky abstraction would be when you need to still distinguish the error type and TCP wouldn’t let you..."_

tgma|2 years ago

Building White-Box Abstractions by Program Refinement[1]

[1]: https://people.inf.ethz.ch/suz/publications/onward16.pdf

foobarian|2 years ago

> But sometimes the abstraction leaks and causes horrible performance and you have to break out the query plan analyzer and study what it did wrong, and figure out how to make your query run faster

Hah. The more things change, the more they stay the same.

highfrequency|2 years ago

> “All abstractions leak, and the only way to deal with the leaks competently is to learn about how the abstractions work and what they are abstracting. So the abstractions save us time working, but they don’t save us time learning.”

Very nicely worded. But I would also add that:

1. An abstraction can often be manned by one person, so when it leaks only one person needs to understand it deeply enough to fix it.

2. The article seems to miss the iterative nature of abstractions. Over time, the goal is to iterate on the abstraction so that it exposes more of the stuff that matters, and less of the stuff that doesn’t matter. Perhaps all abstractions leak, but some leak way less often and save much more thinking in the meantime than others. Rather than lamenting the nature of abstractions we should focus effort on making them as practically useful as possible.

glial|2 years ago

This is well-written. I might suggest that what makes pure mathematics special is that abstractions in pure math are not leaky, unlike in (nearly?) every other domain.

mturmon|2 years ago

Didn't downvote, don't entirely disagree, but maybe it would be OK to say that the leaks can be made more apparent:

"The integral reverses the derivative" † ‡ *

† Up to an arbitrary additive constant

‡ Provided the derivative exists

* And we hope you don't have concerns about the existence of the real numbers

titzer|2 years ago

Joel of course hits the nail on the head about the two major things that cause abstractions to fall apart: performance and bugs (or debugging). In programming languages we talk about abstractions all the time--PL of course is all about abstractions. A computational abstraction like a bytecode, source language, or even machine code, can be proven be a proper (or full) abstraction, meaning there is no way for implementation details to leak in--you cannot observe the electrons flowing by executing A+B, after all.

...until you start measuring sidechannels, or the CPU or compiler has a bug.

I think about this a lot when dealing with VMs; a complex VM cannot hide its complexity when programs care about execution time, or when the VM actually has a bug.

jacobmarble|2 years ago

> So the abstractions save us time working, but they don’t save us time learning.

I could say the same for programming with Copilot.

nextworddev|2 years ago

This is a long winded way to say that every abstraction has failure modes

falserum|2 years ago

Thats the first part.

The second part is that you must acknowledge that it is JUST an abstraction, and learn&understand what actually happens.

Bostonian|2 years ago

Should have (2002) in title.

avgcorrection|2 years ago

> Back to TCP. Earlier for the sake of simplicity I told a little fib, and some of you have steam coming out of your ears by now because this fib is driving you crazy. I said that TCP guarantees that your message will arrive. It doesn’t, actually. If your pet snake has chewed through the network cable leading to your computer, and no IP packets can get through, then TCP can’t do anything about it and your message doesn’t arrive.

The argument is disqualified at this point. The whole world is a leaky abstraction because <freak meteor hit could happen>. At this point your concept is all-encompassing and in turn useless.

There are assumptions: this computation will finish eventually [assuming that no one unplugs the computer itself]. This does not make things leaky.

There are leaky abstractions I guess but not all are. A garbage collector that can cause memory errors would be leaky. I don’t know anything about garbage colletors but in my experience they don’t.

Then someone says that a garbage collector is leaky because of performance concerns (throughput or latency). That’s not a leak: that’s part of the abstracting away part—some concerns are abstracted away. To abstract away means to make it something that you can’t fudge or change. To say that “this is implementation-defined”. An abstract list is an abstraction in the sense that it has some behavior. And also in the sense that it doesn’t say how those behaviors are implemented. That’s both a freedom and a lurking problem (sometimes). Big reallocation because of amortized push? Well you abstracted that away so can you complain about it? Maybe your next step is to move beyond the abstraction and into the more concrete.

What are abstractions without something to abstract away? They are impossible. You have to have the freedom to leave some things blank.

So what Spolsky is effectively saying is that abstractions are abstractions. That looks more like a rhetorical device than a new argument. (Taxes are theft?)

EDIT: Flagged for an opinion? Very well.

samatman|2 years ago

> There are leaky abstractions I guess but not all are. A garbage collector that can cause memory errors would be leaky. I don’t know anything about garbage collectors but in my experience they don’t.

Garbage collectors are a rich source of abstraction leaks, depending on what you do with the runtime. If you color within the lines, no surprises, the garbage collector will work. Unless it has a bug, and hundreds of GC bugs, if not thousands, have shipped over the decades; but while a bug is an abstraction leak, it's not a very interesting one.

But go ahead and use the FFI and things aren't so rosy. Usually the GC can cooperate with allocated memory from the other side of the FFI, but this requires care and attention to detail, or you get memory bugs, and just like that, you're manually managing memory in a garbage collected language, and you can segfault on a use-after-free just like a Real Programmer. It's also quite plausible to write a program in a GC language which leaks memory, by accidentally retaining a reference to something which you thought you'd deleted the last reference to. Whether or not you consider this an abstraction leak depends on how you think of the GC abstraction: if you take the high-level approach that "a GC means you don't have to manage memory" (this is frequently touted as the benefit of garbage collection), sooner or later a space leak is going to bite you.

Then there are finalizers. If there's one thing which really punctures a hole in the GC abstraction, it's finalizers.

mjw_byrne|2 years ago

I tend to agree. "All nontrivial abstractions are leaky" reminds me of other slightly-too-cute rules, such as "full rewrites are a mistake" and "never parse JSON manually".

I wouldn't call TCP leaky because it can't deliver data across a broken network cable, for example. It's abstracting away certain unreliable features of the network, like out of order delivery of packets. It's not abstracting away the fact that networking requires a network.

joeyjojo|2 years ago

I suppose it should be considered where the abstraction actually exists. If the abstraction exists in logic or mathematics (ie. a triangle is a 3 sided polygon) it probably doesn't make much sense to consider the ramifications that thought occurs in a physical brain that can fail. On the other hand if the abstraction is physical (ie, hardware), then the fact that it is bound by physical law is obviously implicit. Software encompasses both physical and logical abstractions, so you need to pick a lens or perspective in order to actually view its abstractions.

lpapez|2 years ago

I don't agree with your opinion, but I see how it can be seen as being perfectly reasonable and there really is no need to flag you (unflagged).

bxparks|2 years ago

I unflagged you by vouching for you. I found your post difficult to understand and couldn't figure out what you are trying to say, but I agree it was not deserving of a flag.

falserum|2 years ago

Seems like you missed the point of the article.

Yes, most of the article is dedicated to describe the “leak”, but there was no call to abolish abstractions. Just the insight that one needs to understand implementation of those.

unknown|2 years ago

[deleted]

williamcotton|2 years ago

Cannot everyone get the sense that how we currently build software is one gigantic leaky abstraction?

Legend2440|2 years ago

Worse; it's a stack of abstractions on top of abstractions on top of abstractions. You're at least 10 layers away from the hardware, possibly more.

Veserv|2 years ago

Every abstraction leaks. A good abstraction for your domain is stable in your domain and only leaks outside of your domain. A great abstraction is separable allowing you to only drop down the abstraction level where needed and allowing the rest of the code to continue using the abstraction where the leaks do not matter, and layered allowing you to only drop down as much as needed and making it easy to rebuild parts of the upper layers on a new foundation.

unknown|2 years ago

[deleted]

unknown|2 years ago

[deleted]

rancour|2 years ago

[deleted]

85 comments