
No Ghosts

97 points | bkudria | 3 years ago | blog.sunfishcode.online

33 comments

[+] gitgud|3 years ago|reply
Interesting take: the article describes environment variables and the inherent danger of depending on them untyped, but it doesn't really describe an alternative solution for them.

A good solution I've just seen is to validate the environment variables against a schema using something like zod [1], which guarantees that the environment variables are the exact types you expect when the program starts, or throws an error.

[1] Example -> https://github.com/t3-oss/create-t3-app/blob/bc57d02789209f1...
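
A minimal stdlib-only sketch of the same fail-fast idea in Python (the linked example uses zod in TypeScript; the variable names PORT and DEBUG here are hypothetical):

```python
import os

def load_config(env=os.environ):
    """Validate required environment variables at startup, failing fast.

    Collects every problem before raising, so the error message shows
    the whole misconfiguration at once instead of one variable at a time.
    """
    errors = []
    raw_port = env.get("PORT", "")
    if not raw_port.isdigit():
        errors.append(f"PORT must be an integer, got {raw_port!r}")
    raw_debug = env.get("DEBUG", "false").lower()
    if raw_debug not in ("true", "false"):
        errors.append(f"DEBUG must be 'true' or 'false', got {raw_debug!r}")
    if errors:
        raise ValueError("invalid environment: " + "; ".join(errors))
    # Past this point the rest of the program sees typed values, not strings.
    return {"port": int(raw_port), "debug": raw_debug == "true"}
```

The point is the same as with zod: the untyped, ghostly strings are confined to one startup function, and everything downstream works with validated, typed values.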

[+] sunfish|3 years ago|reply
Typing is nice, and when one is working within an existing system where environment variables are the main way of communicating data between parts of the system (which describes many popular systems today), this kind of typing can add some nice benefits.

The blog post linked here is thinking about how the systems themselves could be designed differently, whether that's OSes, frameworks, platforms, languages, clusters, networks, or other things.

[+] josephg|3 years ago|reply
The normal alternatives to environment variables are:

- Command line arguments

- Or a configuration file. (Optionally at a path specified on a command-line argument)

[+] fefe23|3 years ago|reply
I have a hard time understanding what the article is trying to tell me.

As soon as you have components talk to each other over IPC, you pass strings of bytes. It does not matter whether you create a nice abstraction class around a string. Over the wire it is still a string.

What is he proposing we do to prevent confusion between strings?

Adding namespaces does not appear to be a useful idea because it boils down to "do input validation", which you are hopefully already doing.

[+] sunfish|3 years ago|reply
There are IPC mechanisms today which aren't just bytes, for example the ability to send file descriptors over Unix-domain sockets. Strings of bytes fundamentally can't do that. And in programs that pass file descriptors, there's no ghostly assumption about which namespace a string needs to be resolved in.

To be sure, Unix-domain sockets aren't the answer to everything, but they are an example of a different way to think about communication.
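
A small sketch of that fd-passing mechanism in Python (requires Python 3.9+ on a Unix-like OS; the single-process socketpair setup is just for illustration, the same calls work between processes):

```python
import os
import socket
import tempfile

def demo_fd_passing():
    """Send an open file descriptor itself, not a path string, to a peer.

    Uses SCM_RIGHTS over a Unix-domain socket pair. The receiver gets a
    working descriptor for this exact open file: no path, no namespace
    lookup, nothing to mis-resolve.
    """
    parent, child = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with tempfile.TemporaryFile() as f:
        f.write(b"hello")
        f.flush()
        # The kernel duplicates the descriptor in-flight, so it stays
        # valid even after the sender closes its copy.
        socket.send_fds(parent, [b"x"], [f.fileno()])
    _, fds, _, _ = socket.recv_fds(child, 1024, 1)
    received = os.fdopen(fds[0], "rb")
    received.seek(0)
    data = received.read()
    received.close()
    parent.close()
    child.close()
    return data
```

Note that the receiver never sees a filename; the capability to read the file travels as the resource itself.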

[+] ivanbakel|3 years ago|reply
As a programmer in a high-level language, you do not need to have your ear to the wire. Just because whatever network-communicable abstraction you use ends up serialized as a sequence of bytes, doesn't mean you should be thinking about byte sequences when communicating in a network.

>What is he proposing we do to prevent confusion between strings?

Don't use a paradigm where your code can see the strings. If your high-level language only manipulates resources, it can't confuse strings with each other. It also can't manipulate strings to try to access different resources.

Sure, at the low level, you'll still be doing communication via bytestrings. But you can keep the footprint of that "core" low-level component small and trustworthy, and let the high level work confidently.

An analogy is using a memory-safe language like Java vs C. Java still relies under the hood on fast-but-scary code like pointer arithmetic, but Java programmers don't need to know about pointers and aren't constantly on the lookout for buffer overflows or the other bugs that plague C programs. The guarantees of the JVM let Java programmers do more expressive and saner things, like "take all the even numbers greater than 100 in a list".

[+] skybrian|3 years ago|reply
This article seems to be recommending the use of capabilities, but a question is how you represent a capability, if not as a string or a number like a file descriptor. And how do you send it over a network, if not as a byte sequence?
[+] gpderetta|3 years ago|reply
For capabilities to work they need to be unforgeable. So you need some opaque handle (and a memory/type safe language) and on the wire you need some sort of cryptographic signing.
[+] unsafecast|3 years ago|reply
The idea is that you don't look. A handle is opaque. You don't care or depend on what's in it. Doesn't matter if it's actually a number or a string.

As for sending over a network, that's the low level details. You can keep high-level types most of the way.

[+] tonto|3 years ago|reply
This is an important article. It's a bit hard to grok and requires some experience to understand, but after you end up with a messy system doing too much ghosty stuff, you yearn for stability.
[+] zwkrt|3 years ago|reply
Agreed. This is one of the few articles I've read this month that has really caused me to reflect on my own systems.

The insight that

  open("foo.txt")
is, from the perspective of component analysis, actually secretly

  open(filesystem, access_level, "foo.txt")
is worth its weight in gold. I feel like I understand more about system design, capability-based permissions, and my own code.

Realistically it won’t immediately change the way I program and I won’t be running to the terminal to refactor my current codebase, but it’s like I’ve been clued in to a whole new class of code smells.
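
A toy Python sketch of making those implicit parameters explicit, loosely modeled on openat(2)/cap-std-style APIs (the `Dir` class and its rules are illustrative, not from the article):

```python
import os

class Dir:
    """A directory capability: open() resolves names only inside it.

    The ambient open("foo.txt") secretly carries a filesystem and an
    access level; here both travel as an explicit object you must be
    handed before you can open anything.
    """
    def __init__(self, root: str, writable: bool = False):
        self._root = root
        self._writable = writable

    def open(self, name: str, mode: str = "r"):
        if ("w" in mode or "a" in mode) and not self._writable:
            raise PermissionError("this Dir capability is read-only")
        # Crude escape check for the sketch: no separators, no dotfiles
        # (which also blocks ".."). A real API would resolve and verify.
        if "/" in name or name.startswith("."):
            raise ValueError("path escapes the capability's namespace")
        return open(os.path.join(self._root, name), mode)
```

A component that only receives a read-only `Dir` simply has no way to express writes or reach outside it, which is the "no ghosts" property: the authority is in the value you hold, not in ambient context.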

[+] jeffparsons|3 years ago|reply
The parts about accidentally leaking secrets reminded me of an idea I had a while back that builds on the idea of capabilities / unforgeable handles, and making a distinction between different kinds of trust.

An outer component, in this example a web service, might be provided with a handle to a secret that it needs for connecting to some other system — let's take Consul, for example. The web service is trusted to basically _intend_ to do the right thing, but it is not trusted to be vigilant enough to avoid leaking the secret, so it is not allowed to ever actually resolve that secret itself.

What it _can_ do is provide the secret-handle to another component whose job is to establish the connection to Consul for it. That second component has to be given a handle to a secret (it can't just look them up by itself) but once it has one, it can resolve it to the actual secret string. This second component does as little as possible, and is trusted to not accidentally leak the actual secret string — not even to the outer component that is using it.

The Wasm component model makes this sort of scheme really easy to implement because capabilities / unforgeable handles are a first-class concept and they are available for all components to create and communicate between each other.

I guess this might already be a well-established pattern elsewhere, but I don't remember seeing it anywhere.
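
A small Python sketch of that split, with hypothetical names (a real implementation in the Wasm component model would get unforgeability from the runtime; a dynamic language can only approximate it by convention):

```python
import secrets

class SecretStore:
    """Issues opaque handles for secrets; only a trusted resolver redeems them."""
    def __init__(self):
        self._secrets: dict[str, str] = {}

    def put(self, value: str) -> str:
        handle = secrets.token_hex(16)  # opaque, reveals nothing about the value
        self._secrets[handle] = value
        return handle

class Connector:
    """The small trusted component: redeems a handle and uses the secret
    internally, never returning the raw string to the outer service."""
    def __init__(self, store: SecretStore):
        self._store = store

    def connect(self, handle: str) -> str:
        # In a real design only Connector would be granted redemption
        # rights; here the private access just marks the trust boundary.
        secret = self._store._secrets[handle]
        # ... would authenticate to the backing service (e.g. Consul) here ...
        return f"session-authenticated-with-{len(secret)}-byte-secret"
```

The outer web service only ever touches `handle` and the returned session, so even a sloppy log statement in it can't leak the secret string.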

[+] gpderetta|3 years ago|reply
Doesn't this just move the problem to leaking the handle instead?
[+] AtlasBarfed|3 years ago|reply
So they want either:

- "fully qualified names" ... to some arbitrary obvious-after-the-fact degree, because fully FULL qualification becomes one of the heavyweight barriers he doesn't like: a central name validator/registry, reduced ability to reuse data because of the funny wrapping/name

- universal data typing, but that doesn't exist, and would be a barrier if it did

- universal data formats: oh god, that means standards bodies, doesn't it.

I totally agree about the "make services interact → they interact → but there are security holes → impose security" loop.

[+] Animats|3 years ago|reply
"By “ghost” here, I mean any situation where resources are referenced by plain data."

OK, whatever.

What he's railing against is canned strings which identify things. URLs, names in key/value stores, etc.

Attempts to get rid of that include the Windows Registry. That may not be a good example. Another attempt is identifying everything with an arbitrary GUID or UUID. Pixar moved away from that in their USD format for animation data.

[+] Guthur|3 years ago|reply
Is this not just context-free vs. context-sensitive?
[+] nivertech|3 years ago|reply
Isn't "No Ghosts" the same as "Make the implicit - explicit"?
[+] nemo1618|3 years ago|reply
tl;dr: A ghost is the opposite of a capability. Use capabilities.