top | item 44661193

(no title)

lknuth | 7 months ago

It is alleviated quite a bit bz its pattern matching capabilities combined with the "let it crash" ethos.

They have a success typing system (which isn't very good) and are working on a fuller system (which isn't very mature).

If typing is the only thing keeping you out, have a look at Gleam.

Having worked with Elixir professionally for the last six years now, it is a very mature platform, very performant and offers many things that are hard in other languages right out of the box.

discuss

crabmusket|7 months ago

> combined with the "let it crash" ethos

I see this phrase around a lot and I wish I could understand it better, having not worked with Erlang and only a teeny tiny bit with Elixir.

If I ship a feature that has a type error on some code path and it errors in production, I've now shipped a bug to my customer who was relying on that code path.

How is "let it crash" helpful to my customer who now needs to wait for the issue to be noticed, resolved, a fix deployed, etc.?

Muromec|7 months ago

>How is "let it crash" helpful to my customer who now needs to wait for the issue to be noticed, resolved, a fix deployed, etc.?

Let it crash is more about autorestarting and less about type bugs. If you have a predictable bug in your codepath that always breaks something, it just means you never tested it and restarting will not fix it. But this kind of straightforward easy to reproduce bugs are also easy to test the hell out of.

But if you have a weird bug in a finite state machine that gets itself into a corner, but can be restarted -- "let it crash" helps you out.

Consider hot reload -- a field exists in a new version of a record, but doesn't exist in a old one. You can write a migration in gen server to take care of it, but if you didn't and it errored out, it's not the end of the world, it will restart and the problem will go away.

jlouis|7 months ago

The core of a properly built, resilient/robust system is that you have compartmentalized code into different small erlang processes. They work together to solve a problem. A bug in one is isolated to that particular process and can't take the whole system down. Rather, the rest of the system detects the problem, then restarts the faulty process.

The reason this is a sound strategy is that in larger systems, there will be bugs. And some of those bugs will have to do with concurrency. This means a retry is very likely to solve the bug if it only occurs relatively rarely. In a sense, it's the observation that it is easier to detect a concurrency bug than it is to fix it. Any larger system is safe because there's this onion-layered protection approach in place so a single error won't always become fatal to your system.

It's not really about types. It's about concurrency and also distribution. Type systems help eradicate bugs, but it's a different class of bugs those systems tend to be great at mitigating.

However, if you do ship a bug to a customer, it's often the case you don't have to fix said bug right away, because it doesn't let the rest of the application crash, so no other customer is affected by this. And you can wait until the weekend is over in many cases. Then triage the worst bugs top-down when you have time to do so.

jerf|7 months ago

In addition to the other fine replies, it helps to remember that Erlang/BEAM, and by extension Elixir, comes from the telephony world. If something crashes in that world, you'd far rather terminate a phone call spuriously than bring the whole system down. (And in the 1990s, when the main alternative was C, and not just C as it may be today but 1990s C specifically, that was a reasonable concern.) Erlang is optimized for a world where that's a reasonable response to a failure. I've also used it in a context where a system makes a persistent connection up to a controller, and if either side crashes they automatically reconnect. "Let it crash" is a reasonable response to a lot of issues that can arise.

The farther you get from that being an issue, the less useful the "let it crash" philosophy becomes, e.g., if I hit "bold" in my word processor and it fails for some reason, "let it crash" is probably not going to be all that helpful overall.

I have seen systems that "should" have been failures in the field be held together by Erlang's restart methodology. We still had to fix the bugs, but it bought us time to do it and prevented the bad deployments from being immediate problems. But it doesn't apply to everything equally by any means.

jaccarmac|7 months ago

A discussion by Joe is helpful here; It contains some code: https://erlang.org/pipermail/erlang-questions/2003-March/007...

"Crashing is loud" below is a phrase to combine with "remote error recovery" from the link above. Erlang/OTP wants application structure that is peculiar to it, and makes that structure feel ergonomic.

> If I ship a feature that has a type error on some code path ... How is "let it crash" helpful to my customer?

The crash can be confined to the feature instead of taking down the entire app or corrupting its state. With a well-designed supervision structure, you can also provide feedback that makes the error report easier to solve.

However, while a type error in some feature path is a place that makes type annotations make sense, type annotations can only capture a limited set of invariants. Stronger type systems encode more complex invariants, but have a cost. "Let it crash" means bringing a supervisor with simple behavior (usually restart) or human into the loop when you leave the happy path.

zdragnar|7 months ago

Let it crash definitely wasn't meant to be a cover for poorly typed code.

It's much more suitable as a replacement for adding try / catch everywhere and having to manually bubble exceptions.

unknown|7 months ago

[deleted]

aeturnum|7 months ago

Crashing is loud. You will get crashes in your logs. And you can let it happen because those crashes won't disrupt anything else - that's what the message passing gets you.

So sure, the code with the error won't work (it wouldn't work in any language - you can make an error in all of them), but you will get a nice, full stack trace and the other processes in your VM won't be impacted at all. You won't bring down the service with a crash. Sometimes this is undesirable - you could deploy a service where the only endpoint that functions is the health check - but generally people don't do that.

crdrost|7 months ago

I feel like the other comments I see here, are not expressing the deeper point, and you're asking to really understand something, so that's what you care about? So I'm sorry to pile on when you've got like a dozen good answers, but.

Let It Crash refers to a sort of middle ground between returning an error code and throwing an exception. It does not directly address your customer's need, and you are right that they are facing a bug.

So if you were to use Golang with Let It Crash ethos, say, you would write a lot of functions with the same template: they take an ID and a channel, they defer a call to recover from panics, and on panic or success they send a {pid int, success bool, error interface {}} to the channel -- and these are always ever run as goroutines.

Because this is how you write everything, you have some goroutines that supervise other goroutines. For example, auto-restart this other goroutine, with exponential backoff. But also the default is to panic every error rather than endless "if err != nil return nil, err" statements. You trust that you are always in the middle of such a supervisor tree and someone has already thought about what to do to handle uncaught errors. Because supervision trees is just the style of program that you write. Say you lose your connection to the database, it goes down for maintenance or something. Well the connection pool for the database was a separate go routine thread in your application, that thread is now in CrashLoopBackoff. But your application doesn't crash. Say it powers an HTTP server, while the database is down, it responds to any requests that do not use the database just fine, and returns HTTP 500 on all the requests that do use the database. Why? Because your HTTP library, allocates a new goroutine for every request it handles, and when those panic it by default doesn't retry and closes the connection with HTTP 500. Similarly for your broken codepath, it 500s the particular requests that x.(X) something that can't be asserted as an X, we log the error, but all other requests are peachy keen, we didn't panic the whole server.

Now that is different from the first thing that your parent commenter said to you, which is that the default idiom is to do something like this:

    type Message {
        MessageType string
        Args interface{}
        Caller chan<- Message
    }
    // ...
    msg := <-myMailbox
    switclMessageType {
    case "allocate":
      toAllocate := args.(int)
      if allocated[toAllocate 
        msg.Caller <- Message{"fail", fmt.Errorf(...), my mailbox}
      } else {
        // Save this somewhere, then
        msg.Caller <- Message{"ok", , my mailbox}
      }
    }

With a bit of discipline, this emulates Haskell algebraic data types which can give you a sort of runtime guarantee that bad code looks bad (imagine switching on an enum `case TypeFoo`: foo := arg.(Foo)`, if you put something wrong in there it is very easy to spot during code review because it's a very formulaic format)

So the idea is that your type assertions don't crash the program, and they are usually correct because you send everything like a sum type.

Einenlum|7 months ago

Thanks for your answer! I already checked Gleam several times and it looks amazing. The ecosystem just doesn't feel mature enough for me yet. But I can't wait for it to grow.

lknuth|7 months ago

True. There is inter-op with both Elixir and Erlang, but thsts like early TypeScript.

If you're at all interested, I'd suggest doing the basic and OTP tutorials on the Elixir Website. Takes about two hours. Seeing what's included and how it works is probably the strongest sails pitch.