There is no memory safety without thread safety

chadaustin|7 months ago

Every time this conversation comes up, I'm reminded of my team at Dropbox, where it was a rite of passage for new engineers to introduce a segfault in our Go server by not synchronizing writes to a data structure.

Swift has (had?) the same issue and I had to write a program to illustrate that Swift is (was?) perfectly happy to segfault under shared access to data structures.

Go has never been memory-safe (in the Rust and Java sense) and it's wild to me that it got branded as such.

tptacek|7 months ago

Right, the issue here is that the "Rust and Java sense" of memory safety is not the actual meaning of the term. People talk as if "memory safety" was a PLT axiom. It's not; it's a software security term of art.

This is just two groups of people talking past each other.

It's not as if Go programmers are unaware of the distinction you're talking about. It's literally the premise of the language; it's the basis for "share by communicating, don't communicate by sharing". Obviously, that didn't work out, and modern Go does a lot of sharing and needs a lot of synchronization. But: everybody understands that.

junebash|7 months ago

Swift is in the process of fixing this, but it’s a slow and painful transition; there’s an awful lot of unsafe code in the wild that wasn’t unsafe until recently.

potato-peeler|7 months ago

I am curious. Generally, basic structures like maps are not thread-safe, and care has to be taken when modifying them. This is pretty well documented in the Go documentation. In your case at Dropbox, what was essentially going on?

CJefferson|7 months ago

Before Rust, I'd reached the personal conclusion that large-scale thread-safe software was almost impossible -- certainly it required the highest levels of software engineering. Multi-process code was a much more reasonable option for mere mortals.

Rust on the other hand solves that. There is code you can't write easily in Rust, but just yesterday I took a rust iteration, changed 'iter()' to 'par_iter()', and given it compiled I had high confidence it was going to work (which it did).

rowanG077|7 months ago

I'm pretty surprised by some other comments in this thread saying this is a rare occurrence in go. In my experience it's not rare at all.

Thaxll|7 months ago

I have a hard time believing that it's common to cause a segfault in Go. I have worked with the language for a very long time and don't remember a single time I've seen one (and I've seen many data races).

Not synchronizing writes to most data structures does not cause a segfault. You have to hit a very specific condition to trigger one, and those conditions are extremely rare and unusual (from the programmer's perspective).

In the OP's blog post, he triggers one by exercising one of those conditions in an infinite loop.

https://research.swtch.com/gorace

commandersaki|7 months ago

To put things in perspective, I posit to you, how many memory unsafe things can you do in Go that isn’t a variant of the same thing?

Or put another way what is the likelihood that a go program is memory unsafe?

tapirl|7 months ago

Sounds like your team did not have sufficient review capacity at that time.

pjmlp|7 months ago

It is kind of wild, for a 21st century programming language, how much stuff should have been in Go but never was. But hey, Docker and Kubernetes.

nosefrog|7 months ago

Hi Chad!

gok|7 months ago

Java is not memory-safe in the Rust sense.

adamwk|7 months ago

Crashing on shared access is the safe thing to do

Mawr|7 months ago

Safety isn't binary, so your comment makes no sense.

shadowgovt|7 months ago

Mostly because it was a remarkable improvement over what came before (and what came before was hilariously fragile).

jchw|7 months ago

This comes up now and again, somewhat akin to the Rust soundness hole issue. To be fair, it is a legitimate issue, and you could definitely cause it by accident, which is more than I can say about the Rust soundness hole(s?), which as far as I know are basically incomprehensible and about as likely to come across naturally as guessing someone's private key.

That said in many years of using Go in production I don't think I've ever come across a situation where the exact requirements to cause this bug have occurred.

Uber has talked a lot about bugs in Go code. This article is useful for understanding what the practical problems facing Go developers actually wind up being, particularly the table at the bottom summarizing how common each issue is.

https://www.uber.com/en-US/blog/data-race-patterns-in-go/

They don't have a specific category that would cover this issue, because most of the time concurrent map or slice accesses are on the same slice and this needs you to exhibit a torn read.

So why doesn't it come up more in practice? I dunno. Honestly beats me. I guess people are paranoid enough to avoid this particular pitfall most of the time, kind of like the Technology Connections theory on Americans and extension cords/powerstrips[1]. Re-assigning variables that are known to be used concurrently is obvious enough to be a problem and the language has atomics, channels, mutex locks so I think most people just don't wind up doing that in a concurrent context (or at least certainly not on purpose.) The race detector will definitely find it.

For some performance hit, though, the torn reads problem could just be fixed. I think they should probably do it, but I'm not losing sweat over all of the Go code in production. It hasn't really been a big issue.

[1]: https://www.youtube.com/watch?v=K_q-xnYRugQ

bombela|7 months ago

It took months to finally solve a data race in Go. No race detector would see anything. Nobody understood what was happening.

It ultimately resulted in a loop counter overflowing, which recomputed the same thing a billion times (but always the same!). So the visible effect was that a request would randomly take 3 min instead of 100ms.

I ended up using perf in production, which indirectly led me to understand the data race.

I was called in to help the team because of my experience debugging the weirdest things as a platform dev.

Because of this I was exposed to so many races in Go that, from my biased point of view, I want Rust everywhere instead.

But I guess I am putting myself out of a job? ;)

ameliaquining|7 months ago

I think it's also worth noting that Rust's maintainers acknowledge its various soundness holes as bugs that need to be fixed. It's just that some of them, like https://github.com/rust-lang/rust/issues/25860 (which I assume you're referring to), need major refactors of certain parts of the compiler in order to fix, so it's taking a while.

ralfj|7 months ago

Yeah, I can totally believe that this is not a big issue in practice.

But I think terms like "memory safety" should have a reasonably strict meaning, and languages that go the extra mile of actually preventing memory corruption even in concurrent programs (which is basically everything typically considered "memory safe" except Go) should not be put into the same bucket as languages that decide not to go through this hassle.

qcnguy|7 months ago

What do Uber mean in that article when they say that Go programs "expose 8x more concurrency compared to Java microservices"? They're using the word concurrency as if it were a countable noun.

sethammons|7 months ago

That Uber article is fantastic. I believe Go fixed the first example recently.

We had a rule at my last gig: avoid anonymous functions and always recover from them.

chc4|7 months ago

This is one of the things I'm also watching Zig about, like a slow-moving car crash: they claim they are memory safe (or at least "good enough" memory safe if you use the safe optimization level, which is its own discussion), but they don't have the equivalent to Rust's Send/Sync types. It just so happens that in practice no one was writing enough concurrent Zig code to get bitten by it a lot, I guess...except that now they're working on bringing back first-class async support to the language, which will run futures on other threads, and presumably a lot of feet are going to be shot once that lands.

ameliaquining|7 months ago

IIUC even single-threaded Zig programs built with ReleaseSafe are not guaranteed to be free of memory corruption vulnerabilities; for example, dereferencing a pointer to a local variable that's no longer alive is undefined behavior in all optimization modes.

cibyr|7 months ago

Zig's claims of memory safety are a bad joke. Sure, it's easier to avoid memory safety bugs in Zig than it is in C, but that's also true of C++ (which nobody claims is a memory safe language).

tptacek|7 months ago

This is a canard.

What's happening here, as happens so often in other situations, is that a term of art was created to describe something complicated; in this case, "memory safety", to describe the property of programming languages that don't admit to memory corruption vulnerabilities, such as stack and heap overflows, use-after-frees, and type confusions. Later, people uninvolved with the popularization of the term took the term and tried to define it from first principles, arriving at a place different than the term of art. We saw the same thing happen with "zero trust networking".

The fact is that Go doesn't admit memory corruption vulnerabilities, and the way you know that is the fact that there are practically zero exploits for memory corruption vulnerabilities targeting pure Go programs, despite the popularity of the language.

Another way to reach the same conclusion is to note that this post's argument proves far too much; by the definition used by this author, most other higher-level languages (the author exempts Java, but really only Java) also fail to be memory safe.

Is Rust "safer" in some senses than Go? Almost certainly. Pure functional languages are safer still. "Safety" as a general concept in programming languages is a spectrum. But "memory safety" isn't; it's a threshold test. If you want to claim that a language is memory-unsafe, POC || GTFO.

kllrnohj|7 months ago

> in this case, "memory safety", to describe the property of programming languages that don't admit to memory corruption vulnerabilities, such as [..] type confusions

> The fact is that Go doesn't admit memory corruption vulnerabilities

Except it does. This is exactly the example in the article. Type confusion causes it to treat an integer as a pointer & dereference it. This then trivially can result in memory corruption depending on the value of the integer. In the example the value "42" is used so that it crashes with a nice segfault thanks to lower-page guarding, but that's just for ease of demonstration. There's nothing magical about the choice of 42 - it could just as easily have been any number in the valid address space.

Sharlin|7 months ago

> to describe the property of programming languages that don't admit to memory corruption vulnerabilities, such as stack and heap overflows, use-after-frees, and type confusions.

And data races allow all of that. There cannot be memory-safe languages supporting multi-threading that admit data races that lead to UB. If Go does admit data races it is not memory-safe. If a program can end up in a state that the language specification does not recognize (such as termination by SIGSEGV), it’s not memory safe. This is the only reasonable definition of memory safety.

jstarks|7 months ago

> If you want to claim that a language is memory-unsafe, POC || GTFO.

There's a POC right in the post, demonstrating type confusion due to a torn read of a fat pointer. I think it could have just as easily been an out-of-bounds write via a torn read of a slice. I don't see how you can seriously call this memory safe, even by a conservative definition.

Did you mean POC against a real program? Is that your bar?
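For readers wondering why an interface or a slice can tear at all, here is a minimal sketch that just measures how many machine words each kind of value occupies. The names `word` and `layout` are made up for illustration, and the word counts in the comments describe the mainstream gc toolchain as far as I know; they are an implementation detail, not something the language spec promises. `unsafe` is used only to read sizes.

```go
package main

import (
	"fmt"
	"unsafe"
)

// word is the size of one machine word on the current platform.
const word = unsafe.Sizeof(uintptr(0))

// layout (a hypothetical helper) reports how many machine words a plain
// pointer, an interface value, a string, and a slice occupy.
func layout() (ptr, iface, str, slice uintptr) {
	var p *int
	var i any
	var s string
	var b []byte
	return unsafe.Sizeof(p) / word,
		unsafe.Sizeof(i) / word,
		unsafe.Sizeof(s) / word,
		unsafe.Sizeof(b) / word
}

func main() {
	ptr, iface, str, slice := layout()
	fmt.Println("pointer words:  ", ptr)   // a single address
	fmt.Println("interface words:", iface) // with gc: type word + data word
	fmt.Println("string words:   ", str)   // with gc: data pointer + length
	fmt.Println("slice words:    ", slice) // with gc: data pointer + length + capacity
}
```

Anything wider than one word is normally written with more than one store, so an unsynchronized reader can observe the first word of the new value paired with the second word of the old one. That mismatch is the torn read being discussed.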

ralfj|7 months ago

> Another way to reach the same conclusion is to note that this post's argument proves far too much; by the definition used by this author, most other higher-level languages (the author exempts Java, but really only Java) also fail to be memory safe.

This is wrong.

I explicitly exempt Java, OCaml, C#, JavaScript, and WebAssembly. And I implicitly exempt everyone else when I say that Go is the only language I know of that has this problem.

(I won't reply to the rest since we're already discussing that at https://news.ycombinator.com/item?id=44678566 )

weinzierl|7 months ago

"What's happening here, as happens so often in other situations, is that a term of art was created to describe something complicated; [..] Later, people uninvolved with the popularization of the term took the term and tried to define it from first principles, arriving at a place different than the term of art."

Happens all the time in math and physics but having centuries of experience with this issue we usually just slap the name of a person on the name of the concept. That is why we have Gaussian Curvature and Riemann Integrals. Maybe we should speak of Jung Memory Safety too.

Thinking about it, the opposite also happens. In the early 19th century "group" had a specific meaning, today it has a much broader meaning with the original meaning preserved under the term "Galois Group".

Or even simpler: For the longest time seconds were defined as fraction of a day and varied in length. Now we have a precise and constant definition and still call them seconds and not ISO seconds.

lenkite|7 months ago

How does Java "fail" to be memory safe by the definition used by the author ? Please give an example.

empath75|7 months ago

> Another way to reach the same conclusion is to note that this post's argument proves far too much; by the definition used by this author, most other higher-level languages (the author exempts Java, but really only Java) also fail to be memory safe.

Yes I mean that was the whole reason they invented rust. If there were a bunch of performant memory safe languages already they wouldn't have needed to.

johnnyjeans|7 months ago

This is a good post and I agree with it in full, but I just wanted to point out that (safe) Rust is safer from data races than, say, Haskell due to the properties of an affine type system.

Haskell in general is much safer than Rust thanks to its more robust type system (which also forms the basis of its metaprogramming facilities), monads being much louder than unsafe blocks, etc. But data races and deadlocks are one of the few things Rust has over it. There are some pure functional languages that are dependently typed like Idris, and thus far safer than Rust, but they're in the minority and I've yet to find anybody using them industrially. Also Fortnite's Verse thing? I don't know how pure that language is though.

Mawr|7 months ago

> The fact is that Go doesn't admit memory corruption vulnerabilities, and the way you know that is the fact that there are practically zero exploits for memory corruption vulnerabilities targeting pure Go programs, despite the popularity of the language.

Another way to word it: If "Go is memory unsafe" is such a revelation after it's been around for 13 years, it's more likely that such a statement is somehow wrong than that nobody's picked up on such a supposedly impactful safety issue in all this time.

As such, the burden of proof that addresses why nobody's run into any serious safety issues in the last 13 years is on the OP. It's not enough to show some theoretical program that exhibits the issue; clearly that is not enough to cause real problems.

elktown|7 months ago

The older I get the more I just see these kinds of threads like I see politics: Exaggerate your "opponent's" weaknesses, underplay/ignore its strengths and so on. So if something, no matter how disproportionate, can be construed to be, or be associated with, a current zeitgeist with a negative sentiment, it's an opportunity to gain ground.

I really don't understand why people get so obsessed with their tools that it turns into a political battleground. It's a means to an end. Not the end itself.

FiloSottile|7 months ago

I have never seen real Go code (i.e. not code written purposefully to be exploitable) that was exploitable due to a data race.

This doesn’t prove a negative, but is probably a good hint that this risk is not something worth prioritizing for Go applications from a security point of view.

Compare this with C/C++ where 60-75% of real world vulnerabilities are memory safety vulnerabilities. Memory safety is definitely a spectrum, and I’d argue there are diminishing returns.

stouset|7 months ago

Maintenance in general is a burden much greater than CVEs. Exploits are bad, certainly, but a bug not being exploitable is still a bug that needs to be fixed.

With maintenance being a "large" integer multiple of initial development, anything that brings that factor down is probably worth it, even if it comes at an incremental cost in getting your thing out the door.

LtWorf|7 months ago

I have! What do i win?

crawshaw|7 months ago

Memory safety is a big deal because many of the CVEs against C programs are memory safety bugs. Thread safety is not a major source of CVEs against Go programs.

It’s a nice theoretical argument but doesn’t hold up in practice.

nine_k|7 months ago

A typical memory safety issue in a C program is likely to generate an RCE. A thread-safety issue that leads to a segfault can likely only lead to a DoS attack, unpleasant but much less dangerous. A race condition can theoretically lead to more powerful attacks, but triggering it should be much harder.

okanat|7 months ago

It depends on what threads can do. Threads share memory with other threads and you can corrupt the data structure to force the other thread to do an unsafe / invalid operation.

It can be as simple as changing the size of a vector from one thread while the other one accesses it. When executed sequentially, the operations are safe. With concurrency all bets are off. Even with Go. Hence the argument in TFA.
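As a rough sketch of what the synchronized version of that scenario looks like in Go: the wrapper type `guardedSlice` and the helper `fill` below are hypothetical names, not anything from the article. Every access to the slice header goes through one mutex, so no goroutine can see the length change halfway through an append.

```go
package main

import (
	"fmt"
	"sync"
)

// guardedSlice is a hypothetical wrapper: the mutex protects the slice
// header (pointer, length, capacity) as well as the elements.
type guardedSlice struct {
	mu    sync.Mutex
	items []int
}

// add appends a value while holding the lock.
func (g *guardedSlice) add(v int) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.items = append(g.items, v)
}

// size reads the length while holding the lock.
func (g *guardedSlice) size() int {
	g.mu.Lock()
	defer g.mu.Unlock()
	return len(g.items)
}

// fill appends n values from n goroutines and returns the final length.
func fill(n int) int {
	var g guardedSlice
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			g.add(v)
		}(i)
	}
	wg.Wait()
	return g.size()
}

func main() {
	fmt.Println(fill(1000)) // every append is accounted for
}
```

Nothing in the compiler forces callers to go through `add` and `size`; code in the same package can still touch `items` directly, which is the gap the article is pointing at.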

stouset|7 months ago

A CVE is worse, but a threading bug resulting in corrupted data or a crash is still a bug that needs someone to triage, understand, and fix.

kllrnohj|7 months ago

This isn't arguing about exploit risks of the language but simply whether or not it meets the definition of memory safe. Go doesn't satisfy the definition, so it's not memory safe. It's quite black & white here.

Nice strawman though

qcnguy|7 months ago

The point being made is sound, but I can never escape the feeling that most concurrency discussion in programming language theory is ignoring the elephant in the room. The concurrency bugs that matter in most apps are all happening inside the database due to lack of proper locking, transactions or transactional isolation. PL theory ignores this and so things like Rust's approach to race freedom ends up not mattering much outside of places like kernels. A Rust app can avoid use of unsafe entirely and still be riddled with race conditions because all the data that matters is in an RDBMS and someone forgot a FOR UPDATE in their SELECT clause.

layer8|7 months ago

What’s worse, even if you use proper transactions for everything, it’s hard to reason about visibility and data races when performing SQL across tables, or multiple dependent SQL statements within a transaction.

norir|7 months ago

The sad thing is that most languages with threads have a default of global variables and unrestricted shared memory access. This is the source of the vast majority of data corruption and races. Processes are generally a better concurrency model than threads, but they are unfortunately too heavyweight for many use cases. If we defaulted to message passing all required data to each thread (either by always copying or tracking ownership to elide unnecessary copying), most of these kinds of problems would go away.

In the meantime, we thankfully have agency and are free to choose not to use global variables and shared memory even if the platform offers them to us.

kibwen|7 months ago

> The sad thing is that most languages with threads have a default of global variables and unrestricted shared memory access. This is the source of the vast majority of data corruption and races. Processes are generally a better concurrency model than threads

Modern languages have the option of representing thread-safety in the type system, e.g. what Rust does, where working with threads is a dream (especially when you get to use structured concurrency via thread::scope).

People tend to forget that Rust's original goal was not "let's make a memory-safe systems language", it was "let's make a thread-safe systems language", and memory safety just came along for the ride.

zozbot234|7 months ago

Message passing can easily lead to more logical errors (such as race conditions and/or deadlocks) than sharing memory directly with properly synchronized access. It's not a silver bullet.

camgunz|7 months ago

I feel like I'm defending Go constantly these days. I don't even like Go!

Go can already ensure "consistency of multi-word values": use whatever synchronization you want. If you don't, and you put a race into your code, weird shit will happen because torn reads/writes are fuckin weird. You might say "Go shouldn't let you do that", but I appreciate that Go lets me make the tradeoff myself, with a factoring of my choosing. You might not, and that's fine.
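To make "use whatever synchronization you want" concrete, here is one possible factoring, sketched under my own assumptions: the `shape` interface, the two struct types, and `swapAndRead` are invented for illustration. The two-word interface value is never overwritten in place. Instead a single-word pointer to it is swapped with `atomic.Pointer` (available since Go 1.19), so a reader always sees a matching type/data pair.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// shape and its two implementations are hypothetical illustration types.
type shape interface{ sides() int }

type triangle struct{}
type square struct{}

func (triangle) sides() int { return 3 }
func (square) sides() int   { return 4 }

// current holds a pointer to an interface value. Swapping the one-word
// pointer is atomic, and the interface values it points at are never mutated.
var current atomic.Pointer[shape]

// swapAndRead flips current between two shapes in one goroutine while the
// caller keeps reading it, and reports whether every read was consistent.
func swapAndRead(rounds int) bool {
	var t shape = triangle{}
	var s shape = square{}
	current.Store(&t)

	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for i := 0; i < rounds; i++ {
			if i%2 == 0 {
				current.Store(&s)
			} else {
				current.Store(&t)
			}
		}
	}()

	ok := true
	for i := 0; i < rounds; i++ {
		n := (*current.Load()).sides()
		if n != 3 && n != 4 {
			ok = false
		}
	}
	wg.Wait()
	return ok
}

func main() {
	fmt.Println(swapAndRead(100000))
}
```

A mutex around a plain `shape` variable would work just as well; the point is only that the choice is left to the programmer.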

But like, this effort to blow data races up to the level of C/C++ memory safety issues (this is what is intended by invoking "memory safety") is polemic. They're nowhere near the same problem or danger level. You can't walk 5 feet through a C/C++ codebase w/o seeing a memory safety issue. There are... zero Go CVEs resulting from this? QED.

EDIT:

I knew I remembered this blog. Here's a thing I read that I thought was perfectly reasonable: https://www.ralfj.de/blog/2021/11/18/ub-good-idea.html. Quote:

"To sum up: most of the time, ensuring Well-Defined Behavior is the responsibility of the type system, but as language designers we should not rule out the idea of sharing that responsibility with the programmer."

dcsommer|7 months ago

Unsafety in a language is fine as long as it is clearly demarcated. The problem with Go's approach is that there is no clear demarcation of the unsafety, making reasoning about it much more difficult.

advisedwang|7 months ago

Wow that's a really big gotcha in go!

To be fair though, go has a big emphasis on using its communication primitives instead of directly sharing memory between goroutines [1].

[1] https://go.dev/blog/codelab-share

TheDong|7 months ago

Even if you use channels to send things between goroutines, go makes it very hard to do so safely because it doesn't have the idea of sendable types, ownership, read-only references, and so on.

For example, is the following program safe, or does it race?

    package main

    import (
      "bytes"
      "fmt"
    )

    func processData(lines <-chan []byte) {
      for line := range lines {
        fmt.Printf("processing line: %v\n", line)
      }
    }

    func main() {
      lines := make(chan []byte)
      go processData(lines)

      var buf bytes.Buffer
      for range 3 {
        buf.WriteString("mock data, assume this got read into the buffer from a file or something")
        lines <- buf.Bytes()
        buf.Reset()
      }
    }

The answer is of course that it's a data race. Why?

Because `buf.Bytes()` returns the underlying memory, and then `Reset` lets you re-use the same backing memory, and so "processData" is reading the data at the same time as "main" is overwriting it.

In rust, this would not compile because it is two mutable references to the same data, you'd either have to send ownership across the channel, or send a copy.

In go, it's confusing. If you use `bytes.Buffer.ReadBytes('\n')` you get a copy back, so you can send it. Same for `bytes.Buffer.String()`.

But if you use `bytes.Buffer.Bytes()` you get something you can't pass across a channel safely, unless you also never use that bytes.Buffer again.

Channels in rust solve this problem because rust understands "sending" and ownership. Go does not have those things, and so they just give you a new tool to shoot yourself in the foot that is slower than mutexes, and based on my experience with new gophers, also more difficult to use correctly.
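For completeness, a minimal sketch of the "send a copy" option in Go, assuming Go 1.20 or later for `bytes.Clone`. The helper names `collect` and `sendCopies` are made up for this example. Each send hands the receiver its own slice, so the later `Reset` and `WriteString` calls never touch memory the receiver is reading.

```go
package main

import (
	"bytes"
	"fmt"
)

// collect drains the channel and reports every line it received.
func collect(lines <-chan []byte, done chan<- []string) {
	var got []string
	for line := range lines {
		got = append(got, string(line))
	}
	done <- got
}

// sendCopies writes each input into a reused buffer and sends a copy of
// the buffer contents, never the buffer's own backing array.
func sendCopies(inputs []string) []string {
	lines := make(chan []byte)
	done := make(chan []string)
	go collect(lines, done)

	var buf bytes.Buffer
	for _, in := range inputs {
		buf.WriteString(in)
		// bytes.Clone allocates a fresh slice, so the receiver owns its data
		// and buf can be reset and refilled without racing against it.
		lines <- bytes.Clone(buf.Bytes())
		buf.Reset()
	}
	close(lines)
	return <-done
}

func main() {
	fmt.Println(sendCopies([]string{"first", "second", "third"}))
}
```

The copy costs an allocation per message, and nothing stops someone from later "optimizing" it back to `buf.Bytes()`, so the safety here is a convention rather than something the type system checks.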

zozbot234|7 months ago

Real-world golang programs share memory all the time, because the "share by communicating" pattern leads to pervasive logical problems, i.e. "safe" race conditions and "safe" deadlocks.

Mawr|7 months ago

Wow who knew concurrency is hard!

This isn't anything special, if you want to start dealing with concurrency you're going to have to know about race conditions and such. There is no language that can ever address that because your program will always be interacting with the outside world.

codys|7 months ago

Curiously, Go itself is unclear about its memory safety on go.dev. It has a few references to memory safety in the FAQ (https://go.dev/doc/faq#Do_Go_programs_link_with_Cpp_programs, https://go.dev/doc/faq#unions) implying that Go is memory safe, but never defines what those FAQ questions mean with their statements about "memory safety". There is a 2012 presentation by Rob Pike (https://go.dev/talks/2012/splash.slide#49) where it is stated that go is "Not purely memory safe", seeming to disagree with the more recent FAQ. What is meant by "purely memory safe" is also not defined. The Go documentation for the race detector talks about whether operations are "safe" when mutexes aren't added, but doesn't clarify what "safe" actually means (https://go.dev/doc/articles/race_detector#Unprotected_global...). The git record is similarly unclear.

In contrast to the go project itself, external users of Go frequently make strong claims about Go's memory safety. fly.io calls Go a "memory-safe programming language" in their security documentation (https://fly.io/docs/security/security-at-fly-io/#application...). They don't indicate what a "memory-safe programming language" is. The owners of "memorysafety.org" also list Go as a memory safe language (https://www.memorysafety.org/docs/memory-safety/). This latter link doesn't have a concrete definition of the meaning of memory safety, but is kind enough to provide a non-exhaustive list of example issues, one of which ("Out of Bounds Reads and Writes") is shown by the article from this post to be something Go does not protect us from, indicating memorysafety.org may wish to update their list.

It seems like at the very least Go and others could make it more clear what they mean by memory safety, and the existence of this kind of error in Go indicates that they likely should avoid calling Go memory safe without qualification.

ralfj|7 months ago

> Curiously, Go itself is unclear about its memory safety on go.dev.

Yeah... I was actually surprised by that when I did the research for the article. I had to go to Wikipedia to find a reference for "Go is considered memory-safe".

Maybe they didn't think much about it, or maybe they enjoy the ambiguity. IMO it'd be more honest to just clearly state this. I don't mind Go making different trade-offs than my favorite language, but I do mind them not being upfront about the consequences of their choices.

phire|7 months ago

The definition kind of changed.

At the time Go was created, it met one common definition of "memory safety", which was essentially "have a garbage collector". And compared to c/c++, it is much safer.

Thaxll|7 months ago

Go is memory safe by the most common definition, does not matter if you have segfault in some scenario.

How many exploits or security issues have there been related to data races on dual-word values? I have worked with Go for the last 10 years and have never heard of such issues. Not a single time.

zozbot234|7 months ago

The most common definition of memory safe is literally "cannot segfault" (unless invoking some explicitly unsafe operation - which is not the case here unless you think the "go" keyword should be unsafe).

Yoric|7 months ago

Segfaults are just the simplest way of exposing a memory issue. It's quite easy to use a race condition to reproduce a state that isn't supposed to be reachable, and that's much worse than a segfault, because it means memory corruption.

Now the big question, as you mention, is "can it be exploited?" My assumption is that it can, but that there are much lower-hanging fruits. But it's just an assumption, and I don't even know how to check it.

corysama|7 months ago

This is why I’m excited about https://www.hylo-lang.org/ as a new, statically-compiled language with all the safeties!

dataflow|7 months ago

Am I missing something or is that bold claim obviously wrong on its face? This seems like a Go deficiency (lack of atomicity for its pointers), not some sort of law about programming languages.

Can you violate memory safety in C# without unsafe{} blocks (or GCHandle/Marshal/etc.)? (No.)

Can you write thread-unsafe code in C# without using unsafe{} blocks etc.? (Yes, just make your integers race.)

Doesn't that contradict the claim that you can't have memory safety without thread safety?

shadowgovt|7 months ago

This is, in my mind, the trickiest issue with Rust right now as a language project, to wit:

- The above is true

- If I'm writing something using a systems language, it's because I care about performance details that would include things like "I want to spawn and curate threads."

- Relative to the borrow-checker, the Rust thread lifecycle static typing is much more complicated. I think it is because it's reflecting some real complexity in the underlying problem domain, but the problem stands that the description of resource allocation across threads can get very hairy very fast.

pornel|7 months ago

I don't know what you're referring to. Rust's threads are OS threads. There's no magic runtime there.

The same memory corruption gotchas caused by threads exist, regardless of whether there is a borrow checker or not.

Rust makes it easier to work with non-trivial multi-threaded code thanks to giving robust guarantees at compile time, even across 3rd party dependencies, even if dynamic callbacks are used.

Appeasing the borrow checker is much easier than dealing with heisenbugs. Type system compile-time errors are a thing you can immediately see and fix before problems happen.

OTOH some racing use-after-free or memory corruption can be a massive pain to debug, especially when it may not be possible to produce in a debugger due to timing, or hard to catch when it happens when the corruption "only" mangles the data instead of crashing the program.

swiftcoder|7 months ago

I wish we had picked a better name than "thread safety". This is really more like "concurrency safety", since it applies even in the absence of hardware threads.

layer8|7 months ago

Threads aren’t hardware, they are OS. Multithreading != multiprocessing.

aatd86|7 months ago

Why does it segfault? Because you have not used a sufficiently clever value for the integer that wouldn't when used as an address?

Just wondering.

Realistically that would be quite rare since it is obvious that this is unprotected shared mutable access. But interesting that such a conversion without unsafe may happen. If it segfaults all the time though then we still have memory safety I guess.

The article is interesting but I wish it would try to provide ideas for solutions then.

alkonaut|7 months ago

And here I thought the type system and error handling were the two biggest Go warts. You’re now telling me their memory model is basically ”YOLO”?

munificent|7 months ago

I agree with the author's claim that you need thread safety for memory safety.

But I don't agree with:

> I will argue that this distinction isn’t all that useful, and that the actual property we want our programs to have is absence of Undefined Behavior.

There is plenty of undefined behavior that can't lead to violating memory safety. For example, in many languages, argument evaluation order is undefined. If you have some code like:

    foo(print(1), print(2));

In some languages, it's undefined as to whether "1" is printed before "2" or vice versa. But there's no way to violate memory safety with this.
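For contrast, Go is one language that pins this down: its spec says function calls appearing in an expression are evaluated in lexical left-to-right order. A minimal sketch, where `record` and `foo` are made-up helpers standing in for the names above:

```go
package main

import "fmt"

// order logs the sequence in which the arguments were evaluated.
var order []int

// record is a hypothetical stand-in for print: it notes that it ran
// and returns its argument unchanged.
func record(n int) int {
	order = append(order, n)
	return n
}

// foo ignores its arguments; only their evaluation order matters here.
func foo(a, b int) {}

// evalOrder resets the log, makes the call, and reports the order as text.
func evalOrder() string {
	order = nil
	foo(record(1), record(2))
	return fmt.Sprint(order)
}

func main() {
	fmt.Println(evalOrder())
}
```

Either order would be equally memory safe; the only question is whether the language promises one.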

I think the only term the author needs here is "memory safety", and they correctly observe that if the language has threading, then you need a memory model that ensures that threads can't break your memory safety.

Go lacks that. It seems to be a rare problem in practice, but if you want guarantees, Go doesn't give you them. In return, I guess it gives you slightly faster execution speed for writes that it allows to potentially be torn.

gliptic|7 months ago

The evaluation order is _unspecified_, not undefined behaviour.

ralfj|7 months ago

> There is plenty of undefined behavior that can't lead to violating memory safety. For example, in many languages, argument evaluation order is undefined. If you have some code like:

You are mixing up non-determinism and UB. Sadly that's a common misunderstanding.

See https://www.ralfj.de/blog/2021/11/18/ub-good-idea.html for an explanation of what UB is, though I don't go into the distinction to non-determinism there.

zozbot234|7 months ago

That's "unspecified" not "undefined". "Undefined behavior" literally means "anything goes", so any program that invokes it is broken by definition.

joaohaas|7 months ago

Your example does not classify as 'undefined behavior'. Something is 'undefined behavior' only if the language spec designates it as such, and in that case yes, the language is capable of doing anything, including violating memory safety.

nromiun|7 months ago

I bet not even 5% of all programs are multi-threaded, or even concurrent.

Memory safety is a much bigger problem.

goodpoint|7 months ago

> safety is not binary, it is a spectrum, and on that spectrum Go is much closer to a typical safe language than to C

That's too low a bar to clear to call it safe.

pizlonator|7 months ago

False.

Java got this right. Fil-C gets it right, too. So, there is memory safety without thread safety. And it’s really not that hard.

Memory safety is a separate property unless your language chooses to gate it on thread safety. Go (and some other languages) have such a gate. Not all memory safe languages have such a gate.

glowcoil|7 months ago

I would recommend reading beyond the title of a post before leaving replies like this, as your comment is thoroughly addressed in the text of the article:

> At this point you might be wondering, isn’t this a problem in many languages? Doesn’t Java also allow data races? And yes, Java does allow data races, but the Java developers spent a lot of effort to ensure that even programs with data races remain entirely well-defined. They even developed the first industrially deployed concurrency memory model for this purpose, many years before the C++11 memory model. The result of all of this work is that in a concurrent Java program, you might see unexpected outdated values for certain variables, such as a null pointer where you expected the reference to be properly initialized, but you will never be able to actually break the language and dereference an invalid dangling pointer and segfault at address 0x2a. In that sense, all Java programs are thread-safe.

And:

> Java programmers will sometimes use the terms “thread safe” and “memory safe” differently than C++ or Rust programmers would. From a Rust perspective, Java programs are memory- and thread-safe by construction. Java programmers take that so much for granted that they use the same term to refer to stronger properties, such as not having “unintended” data races or not having null pointer exceptions. However, such bugs cannot cause segfaults from invalid pointer uses, so these kinds of issues are qualitatively very different from the memory safety violation in my Go example. For the purpose of this blog post, I am using the low-level Rust and C++ meaning of these terms.

Java is in fact thread-safe in the sense of the term used in the article, unlike Go, so it is not a counterexample to the article's point at all.

jillesvangurp|7 months ago

It's not that black and white and the solution isn't necessarily pick language X and you'll be fine. It never is that simple.

Basically, functional languages make it easier to write code that is safe. But they aren't necessarily the fastest or the easiest to deal with. Erlang and related languages are a good example. And they are popular for good reasons.

Java got quite a few things right but it took a while for it to mature. Modern day Java is quite a different beast than the first versions of Java. The Thread class, API, and the language have quite a few things in there that aren't necessarily that great of an idea. E.g. the synchronized keyword might bite you if you are trying to use the new green threads implementation (you'll get some nice deadlocks if you block the one thread you have that does everything). The modern java.util.concurrent package is implemented mostly without it.

Of course people that know their history might remember that green threads are actually not that new. Java did not actually support real threads until v1.1. Version 1.0 only had green threads. Those went out of fashion for about two decades and then came back with recent versions. And now it does both. Which is dangerous if you are a bit fuzzy on the difference. It's like putting spoilers on your fiesta. Using green threads because they are "faster" is a good sign that you might need to educate yourself and shut up.

On the JVM, if you want to do concurrent and parallel stuff, Scala and Kotlin might be better options. All the right primitives are there in the JVM of course. And Java definitely gives you access to all of it. But it also has three decades of API cruft and a conservative attitude about keeping backwards compatible with all of that. And not all of it was necessarily all that great. I'm a big fan of Kotlin's co-routine support that is rooted in a lot of experience with that. But that's subjective of course. And Scala-ists will probably insist that Scala has even better things. And that's before we bring up things like Clojure.

Go provides a good balance between ease of use / simplicity and safety. But it has quite a few well documented blind spots as well. I'm not that big of a fan but I appreciate it for what it is. It's actually a nice choice for people that aren't well versed in this topic and it naturally nudges people in a direction where things probably will be fine. Rust is a lot less forgiving and using it will make you a great engineer because your code won't even compile until you properly get it and do it right. But it won't necessarily be easy (humbled by experience here).

With languages the popular "if you have a hammer everything looks like a nail" thing is very real. And stepping out of your comfort zone and realizing that other tools are available and might be better suited to what you are trying to do is a good skill to have.

IMHO Python is actually undervalued. It was kind of shit at all of this for a long time. But they are making a lot of progress modernizing the language and platform and are addressing its traditional weaknesses. Better interpreting and JIT performance, removing the GIL, async support that isn't half bad, etc. We might wake up one day and find it doing a lot of stuff that we'd traditionally use JVM/Go/Rust for a few years down the line. Acknowledging weaknesses and addressing those is what I'm calling out here as a very positive thing. Oddly, I think there are a lot of Python people that are a bit conflicted about progress like this. I see the same with a lot of old school Java people. You get that with any language that survives that long.

Note how I did not mention C/C++ here so far. There's a lot of it out there. But if you care about safety, you should probably not go near it. I don't care how disciplined you are. Your C/C++ code has bugs. Any insistence that it doesn't just means you haven't found them yet. Possibly because you are being sloppy looking for them. Does it even have tests? There are whole classes of bugs that we can prevent with modern languages and practices. It's kind of negligent and irresponsible not to. There are attempts to make C++ better of course.

danbruc|7 months ago

Nope. You can have programs without undefined behavior and still not have thread safety. In .NET, for example, writes to variables that are wider than the machine width, or not aligned properly, are not guaranteed to be atomic. So if you assign some value to an Int128 variable, it will not be updated atomically - how could it, that is just beyond the capabilities of the processor - and therefore a different thread can observe a state where only half of the variable has been updated. No undefined behavior here but also sharing this variable between threads is not thread safe. And having the language synchronize all such writes - just in case some other thread might want to look at it - is a performance disaster. And disallowing anything that might be a potential thread safety issue will give you a pretty limited language.
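To make the trade-off concrete, here is a minimal sketch in Go rather than .NET, with every name made up for illustration. A two-word struct stands in for the 128-bit value, and a `sync.RWMutex` is the explicit synchronization the programmer opts into. The writer always sets both halves to the same number, so a reader seeing different halves would have caught a half-finished update.

```go
package main

import (
	"fmt"
	"sync"
)

// wide128 stands in for a 128-bit value: two 64-bit halves that must change together.
type wide128 struct {
	hi, lo uint64
}

// guarded pairs the value with the lock that protects it.
type guarded struct {
	mu sync.RWMutex
	v  wide128
}

// store updates both halves while holding the write lock.
func (g *guarded) store(n uint64) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.v.hi = n
	g.v.lo = n
}

// load copies both halves while holding the read lock.
func (g *guarded) load() wide128 {
	g.mu.RLock()
	defer g.mu.RUnlock()
	return g.v
}

// countTorn runs one writer and one reader concurrently and counts
// reads whose halves disagree.
func countTorn(iterations int) int {
	var g guarded
	var wg sync.WaitGroup
	torn := 0
	wg.Add(2)
	go func() {
		defer wg.Done()
		for i := 1; i <= iterations; i++ {
			g.store(uint64(i))
		}
	}()
	go func() {
		defer wg.Done()
		for i := 0; i < iterations; i++ {
			if v := g.load(); v.hi != v.lo {
				torn++ // only this goroutine touches torn until wg.Wait returns
			}
		}
	}()
	wg.Wait()
	return torn
}

func main() {
	fmt.Println("torn reads:", countTorn(100000))
}
```

With the lock in place the count stays at zero. Dropping the lock is exactly the unsynchronized case described above, where a reader may see one new half and one old half.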

tialaramex|7 months ago

> disallowing anything that might be a potential thread safety issue will give you a pretty limited language.

Safe Rust doesn't seem that limited to me.

I don't think any of the C# work I do wouldn't be possible in Rust, if we disregard the fact that the rest of the team don't know Rust.

Most of the programs you eliminate when you have these "onerous" requirements like memory safety are nonsense, they either sometimes didn't work or had weird bugs that would be difficult to understand and fix - sometimes they also had scary security implications like remote code execution. We're better off without them IMNSHO.

kllrnohj|7 months ago

Critically, to the author's point, that type of data race does not result in UB and does not break the language, and thus does not create any memory safety issues. Ergo, it's a memory safe language.

Go (and previously Swift) fails at this. There, data races can result in UB and thus break memory safety.

ameliaquining|7 months ago

See the article's comments on Java, which is "thread safe" in the sense of preventing undefined behavior but not in the sense of preventing data-race-related logic bugs. .NET is precisely analogous in this respect.

kibwen|7 months ago

The statement "there is no memory safety without thread safety" does not suggest that memory safety is sufficient to provide thread safety. Instead, it's just saying that if you want thread safety, then memory safety is a requirement.

loeg|7 months ago

Are we still having semantic fights about what exactly memory safety means? Why?

sapiogram|7 months ago

Because people think Golang is immune to bugs that it's not immune to.

kazinator|7 months ago

This is false as a generality.

A memory safe, managed language doesn't become unsafe just because you have a race condition in a program.

Like, say, reading and writing several related shared variables without a mutex.

Say that the language ensures that the reads and writes themselves of these word-sized variables are safe without any lock, and that memory operations and reclamation of memory are thread safe: there are no low-level pointers (or else only as an escape hatch that the program isn't using).

The rest is your bug; the variable values coming out of sync with each other, not maintaining the invariant among their values.

It could be the case that a thread-unsafe program breaks a managed run-time, but not an unvarnished truth.

A managed run-time could be built on the assumption that the program will not create two or more threads such that those threads will invoke concurrent operations on the same objects. E.g. a managed run-time that needs a global interpreter lock but doesn't have one.

munificent|7 months ago

> A memory safe, managed language doesn't become unsafe just because you have a race condition in a program.

The author's point is that Go is not a memory safe language according to that distinction.

There are values that are a single "atomic" write in the language semantics (interface references, slices) that are implemented with multiple non-atomic writes in the compiler/runtime. The result is that you can observe a torn write and break the language's semantics.
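For a sense of what "multiple non-atomic writes" means here, a small sketch, assuming the standard gc toolchain. It only measures how many machine words an interface value and a slice header occupy. `unsafe.Sizeof` is used purely for measurement, and the helper names are made up.

```go
package main

import (
	"fmt"
	"unsafe"
)

// wordSize is the size of one machine word (a pointer) on this platform.
const wordSize = unsafe.Sizeof(uintptr(0))

// wordsIn reports how many machine words a value of the given size occupies.
func wordsIn(size uintptr) int {
	return int(size / wordSize)
}

// interfaceWords measures an empty interface value.
func interfaceWords() int {
	var i interface{}
	return wordsIn(unsafe.Sizeof(i))
}

// sliceWords measures a slice header (not the backing array).
func sliceWords() int {
	var s []int
	return wordsIn(unsafe.Sizeof(s))
}

func main() {
	fmt.Println("interface words:", interfaceWords())
	fmt.Println("slice words:", sliceWords())
}
```

On gc this reports 2 words for an interface (type word plus data pointer) and 3 for a slice (pointer, length, capacity). A plain assignment of either is therefore more than one store, and an unsynchronized concurrent reader can in principle see a mix of old and new words.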

ralfj|7 months ago

> The rest is your bug; the variable values coming out of sync with each other, not maintaining the invariant among their values.

If the language and its runtime let me break their invariant, then that's their bug, not mine. This is the fundamental promise of type-safe languages: you can't accidentally break the language abstraction.

> It could be the case that a thread-unsafe program breaks a managed run-time, but not an unvarnished truth.

I demonstrated that the Go runtime is such a case, and I think that should be considered a memory safety violation. Not sure which part of that you disagree with...

gpderetta|7 months ago

race condition != data race. Specifically, in Go, a race condition can cause application-level bugs but won't directly affect the runtime's consistency; on the other hand, a data race on a slice can cause torn writes and segfaults in the best case, and fandango on core in the worst case.
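A minimal sketch of the first half of that distinction, with all names made up. Every access to the balance goes through a mutex, so there is no data race, yet the check and the withdrawal are separate critical sections. To keep it deterministic, the unlucky schedule is replayed by hand on one goroutine instead of relying on real timing.

```go
package main

import (
	"fmt"
	"sync"
)

// account guards its balance with a mutex, so each individual access is synchronized.
type account struct {
	mu      sync.Mutex
	balance int
}

// canWithdraw checks the balance under the lock.
func (a *account) canWithdraw(n int) bool {
	a.mu.Lock()
	defer a.mu.Unlock()
	return a.balance >= n
}

// withdraw subtracts under the lock, without re-checking the balance.
func (a *account) withdraw(n int) {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.balance -= n
}

// interleavedWithdrawals replays one bad schedule: two clients both pass
// the check before either one withdraws.
func interleavedWithdrawals() int {
	a := &account{balance: 100}
	okA := a.canWithdraw(100) // client A checks
	okB := a.canWithdraw(100) // client B checks before A acts
	if okA {
		a.withdraw(100)
	}
	if okB {
		a.withdraw(100)
	}
	return a.balance
}

func main() {
	fmt.Println("final balance:", interleavedWithdrawals())
}
```

The balance ends up at -100, which is an application bug, but no pointer was ever corrupted and the runtime stays consistent. The usual fix is to do the check and the withdrawal inside one critical section.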

dodobirdlord|7 months ago

If the variables are word-sized, sure. But what if they are larger? Now a race condition between one thread writing and another thread reading or writing a variable is a memory safety issue.

qcnguy|7 months ago

The author knows that. His point is that Go doesn't work that way because it uses greater-than-word-sized values that can suffer torn writes leading to segfaults in some cases.

pharrington|7 months ago

Your fantasy language doesn't have a race condition.

singpolyma3|7 months ago

Honestly what I mostly want is to not have memory leaks. Which somehow stopped being a focus at some point.

tialaramex|7 months ago

The "good" news is that Bjarne Stroustrup is right there with you, Bjarne sees eliminating all memory leaks as a high priority for C++ and one of his main goals.

The bad news ought to be obvious: this "goal" is not achievable. It's a fantasy that somehow we should be able to see the future, divine that some value stored won't be needed in the future and thus we don't need to store it. Goals like "We shouldn't store things we can't even refer to" are already solved in languages used today, so a goal to "not have memory leaks" refers only to that unachievable fantasy.

Wowfunhappy|7 months ago

Because we have so much memory no one cares if it leaks. <_<

astrange|7 months ago

This is harder than it looks as soon as you start counting abandoned memory (stuff that's still referenced but not actually used.)

philosophty|7 months ago

Isn't it funny how anything you don't understand very well can seem weird?

Mawr|7 months ago

There is no house safety without nuclear warhead detonation safety.

There is no pedestrian safety without mandatory helmet laws.

There is no car safety without driving a tank.

kiitos|7 months ago

> To see what I mean by this, consider this program written in Go, which according to Wikipedia is memory-safe:

The Wikipedia definition of memory safety is not the Go definition of memory safety, and in Go programs it is the Go definition of memory safety that matters.

The program in the article is obviously racy according to the Go language spec and memory model. So this is all very much tilting at windmills.

ralfj|7 months ago

Can you point me to the Go definition of memory safety? I searched all over their website, and couldn't find any.

(But also, it'd be kind of silly for every language to make up their own definition of memory safety. Then even C is memory safe, they just have to define it the right way. ;)
