
Go’s march to low-latency GC

400 points| etrevino | 9 years ago |blog.twitch.tv | reply

289 comments

[+] _ph_|9 years ago|reply
Another very nice feature of Go is that, since 1.5, the whole runtime, including the GC, is written in Go itself. So every Go developer can look at the implementation. The GC itself is a surprisingly small amount of code.
[+] chrisseaton|9 years ago|reply
I never understand this argument. Are there people who know enough about GC to understand potential issues and have an informed opinion on how to improve it, but can't read C?
[+] colordrops|9 years ago|reply
Excuse my obvious ignorance, but how is the GC written in Go? The GC in Go is not optional, right? Does the GC use GC? Turtles all the way down?
[+] knorker|9 years ago|reply
It's not yet idiomatic or good Go code now, though, is it?

IIRC the initial state (1.5) was mostly machine translated code from C to Go.

[+] vorg|9 years ago|reply
> since 1.5, the whole runtime is written in Go itself

The parser was written in Yacc (with C code generated) until version 1.6. I'm wondering if there are any other parts of Go yet to be converted to Go.

[+] r1ch|9 years ago|reply
I have to wonder - when you're digging down into this level of complexity in order to discover issues with the language you're using, wouldn't something like C be better? IRC isn't a very hard protocol and you know the language won't be getting in your way if you're using C.
[+] zamalek|9 years ago|reply
> wouldn't something like C be better?

Jim is an intermediate-level coder, by and large the average guy that you are going to get. He writes his IRC server in C. It performs acceptably and can be scaled horizontally. There are a few threading bugs and exploits (buffer overflows etc.).

Sally is an advanced coder, it took a year of recruitment to find her. She also writes her server in C. It's blazingly fast. Virtually nobody else understands how it works. She's a human, so it's still littered with the same types of bugs that Jim's server has.

Jack is at the same level as Jim. He starts off in Go 1.4. While his server is nowhere near as fast as Sally's, it's much faster than Jim's. Race conditions and exploits tend toward zero. Everyone on the team can approach the code and maintain it.

Go 1.6 is installed on prod and suddenly Jack's server is now negligibly slower than Sally's. Jim notices this and has to spend a few weeks on his to catch it up. Sally is stuck debugging a race condition that occurs once a month. Jack is adding more emoticons, more features and decommissioning servers in the cluster.

Edit: IRC is a simple problem and that begets a simple solution. While C may be significantly simpler than C++, Go requires far less cognitive effort: it is actually simpler than C.

[+] topspin|9 years ago|reply
Reading this causes me to experience déjà vu; years and years of reading stories and watching presentations about someone struggling with GC in the JVM. It's happening all over again with Go. The same 'discoveries', the same trade-offs, the same discussions about hardware resources, the same 'concurrent mark and sweep', the same 'more to do' conclusions. You could replace every occurrence of 'Go' with 'Java' and it would probably go undiscovered.

Maybe it's all worth it and this is how developers are supposed to spend their time, but it's no longer interesting to me.

[+] karma_vaccum123|9 years ago|reply
Then we might be in some alternate reality talking about how Twitch could never deliver a viable service because the developers kept creating segfaults. C is the last language I would choose in a race to a viable service.
[+] jjnoakes|9 years ago|reply
At scale it may be more efficient to write the system in a higher level language (saving time) and then spend some time tuning only the slowest parts, instead of building everything from the start to be highly optimized, even the parts which may not need it (investing time where it may not produce results).

And the improvements to Go that they drove will help everyone.

I happen to prefer C, but I understand why they did it the way they did it.

[+] daenney|9 years ago|reply
Better in what way? Performance wise, perhaps but that's only one aspect of why someone might pick one language over another.

Even distilling better down to just the max throughput you can get for a solution in a language vs another is hard to do as a lot also depends on how the code ends up being written and how easy you want to be able to debug that solution. You can solve this stuff in C many ways with different performance characteristics.

[+] bkeroack|9 years ago|reply
In some ways it's accurate to think of Go as a more convenient version of C with modern facilities like automatic memory management, concurrency primitives and data structures (i.e. maps), with the minimal level of runtime scaffolding included to support them. Interop with C is very easy, and Go is miles away (stylistically) from some of the more esoteric and abstract languages that are used these days.
[+] weberc2|9 years ago|reply
C is not a very nice language for concurrent programming.
[+] anonymoushn|9 years ago|reply
If you want to hire 300 people to write reliable software in a language they don't know yet, Go is a good choice. You might also have like half a dozen people who are so deep into Go that they do the stuff in this post.
[+] ben_jones|9 years ago|reply
This may be anecdotal but Twitch is an example of a service that just bloody works. I've been a user for a while and I've yet to notice any service disruptions or issues. They were also one of the largest early adopters of EmberJS, pretty sure it was well before the 1.0 release when many bugs were still being worked out and the API suffered frequent changes, so hats off to the engineering team for continued awesome work.
[+] r1ch|9 years ago|reply
Twitch has a fairly high number of outages, although not all affect video playback (eg API outages). Most recently the whole site was down for about an hour from EU due to a botched CDN setup. I have a status tracker that monitors from four locations, https://twitter.com/TwitchStatus
[+] srpablo|9 years ago|reply
I'll always be a little bitter about their VODpocalypse and retroactively muting streams with copyrighted music.

Software should serve people and they eradicated countless memories/achievements, eliminated a priceless historical record.

I don't mean to diminish how untenable the previous situation was, and I'm sure I'm underestimating the difficulty/cost of what they ended up doing. I appreciate their work, engineering, and use the service regularly. But it's an "Our Incredible Journey" part of their story and I don't want to let them off the narrative hook for it. They made ~$1b on this content, after all.

[+] hdra|9 years ago|reply
Maybe if you have a fast internet connection. I'm on a HSDPA+ connection and Twitch is unusable, not even the VODs. Then again, Youtube and Vimeo are pretty much the only sites from which I can watch video streams smoothly.
[+] asdf1234|9 years ago|reply
The video service almost always works. The website has issues pretty regularly.
[+] anonymousDan|9 years ago|reply
So how does the GC performance of Go compare to something like Java/the Hotspot JVM?
[+] _ph_|9 years ago|reply
The approaches to GC are difficult to compare, and Java offers a selection of garbage collectors. Overall, the Java collectors are very sophisticated and have been tuned over years, so in principle they are excellent. The downside is that the Java language itself puts a lot of stress on the GC. The biggest problem is that Java offers no "value" types beyond the builtin int, double,... So everything else has to be allocated as a separate object and pointed to via references. The GC then has to trace all these references, which takes time. While a collection of the youngest generation in Java is extremely fast, a global GC can take quite some time.

Go, on the other hand, has structs as values, so the memory layout is much easier for the GC. Go always performs full GCs, but they run mostly in parallel with the application; a GC cycle only requires a stop-the-world phase of a few milliseconds (for multi-gigabyte heaps).
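A minimal sketch of the layout difference described above (not from the article; the `Point` type and the helper names are made up for illustration). A slice of struct values is one contiguous allocation with no interior pointers, while a slice of pointers resembles the Java situation: one array of references plus a separate heap object per element for the collector to trace.

```go
package main

import "fmt"

// Point is a hypothetical small struct used only for this illustration.
type Point struct {
	X, Y int64
}

// makeValues returns n Points stored inline in a single backing array.
// That array contains no pointers, so there is nothing inside it to trace.
func makeValues(n int) []Point {
	pts := make([]Point, n)
	for i := range pts {
		pts[i] = Point{X: int64(i), Y: int64(2 * i)}
	}
	return pts
}

// makePointers returns n separately allocated Points, similar to how a
// Java collection of objects is laid out: references plus n heap objects.
func makePointers(n int) []*Point {
	pts := make([]*Point, n)
	for i := range pts {
		pts[i] = &Point{X: int64(i), Y: int64(2 * i)}
	}
	return pts
}

// sumValues walks the contiguous values directly.
func sumValues(pts []Point) int64 {
	var s int64
	for _, p := range pts {
		s += p.X + p.Y
	}
	return s
}

// sumPointers has to follow one reference per element.
func sumPointers(pts []*Point) int64 {
	var s int64
	for _, p := range pts {
		s += p.X + p.Y
	}
	return s
}

func main() {
	const n = 1000
	fmt.Println(sumValues(makeValues(n)), sumPointers(makePointers(n)))
}
```

Both versions compute the same result; the difference is only in how many objects and references the heap contains, which is what the collector has to walk.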

All these numbers of course depend a lot on what your application is doing, but overall Go seems to be doing very well with its newest iterations of the GC.

[+] nvarsj|9 years ago|reply
A significant drawback of the hotspot JVM is the amount of memory required for even simple apps. At least 64Mi for the most simple, and typically much higher. A typical web app with 1000 request threads will use something like 1.5Gi of memory (512Mi heap, 1Gi for thread stacks, classes, etc).

Golang apps tend to happily run with less than 100Mi, so are well suited as daemon processes that don't get in the way.

However if you need to support a large amount of dynamic state (> 1Gi), the hotspot GC is very difficult to beat.

[+] geodel|9 years ago|reply
Technically the hotspot GC might do more work in the same amount of time, but Go's GC makes some performance guarantees, like a <10ms STW phase, which hotspot does not claim or offer for large heaps.
[+] fauigerzigerk|9 years ago|reply
That's interesting, but it would be even more interesting if the article contained some info about heap sizes, memory utilization and the number of CPU cores.
[+] pepesza|9 years ago|reply
I think that not using Erlang in this particular case was a mistake. Erlang is running some of the largest chats out there, including League of Legends and WhatsApp. They would have avoided all the hassle of GC pauses, since Erlang has per-process GC collection. And scaling single OS process to their number of connections was done for Erlang machines years ago.
[+] dayjah|9 years ago|reply
Hi there, I'm one of the original engineers who worked on our re-implementation of chat which ended up in Go.

We've a culture of being willing to try new things at Twitch. When our twisted-python chat system no longer met our needs of being easy to iterate on we decided to rebuild it; it was a monolith and we decided to chunk it up to reflect needs of our users and the pace at which we could develop new features. Notably we wanted to not recycle TCP connections whenever a new feature was added (which was a shortcoming of the twisted-python solution, along with a bunch of global state that was becoming hard to reason about). As part of this re-work we had a pub-sub portion which was super simple and we decided to try this new exciting language with a lot of promise out on it; it worked amazingly well. Over the course of another year or so we ended up rebuilding all of the components in Go.

When we first evaluated rebuilding chat we assessed a few options:

- python

- nodejs (we started with this, but random crashes and poor tooling at the time didn't work for us)

- erlang (notably could we use ejabberd as the hub of the system)

Ultimately we chose python because we knew python and we needed this to work right now. The move to go happened incrementally thereafter and was driven by:

- increase in trust

- great tooling

None of this can be pitched as "Go vs X", it is purely a tools and expediency orientated set of decisions.

[+] woodcut|9 years ago|reply
Finding an erlang programmer available on site within 1-2 months is the hardest part of deciding to go with erlang. With go you can take a C++/python programmer and have them writing production code pretty soon. I think this is what inhibits functional programming in general: the learning curve, bundled with the amount of work around, prevents people from jumping on board. Also, the reluctance of some employers to hire someone without a ton of exp. with erlang makes it difficult for a senior programmer to switch.
[+] jerf|9 years ago|reply
Possibly in the past, yes. But if they're now just paying 1ms in GC every so often, the advantage is now gone. Go is generally faster than Erlang (in Go-native vs. Erlang-native code) so the system is quite possibly net outperforming what Erlang can do now. 1ms is just noise when packet latency jitter is higher than that.
[+] 010a|9 years ago|reply
I would emphasize that, in many of those examples, Go wasn't a very viable choice when the apps were originally written. Twitch chose Go back at ~1.2 (2013), when Erlang might have made more sense.

Today, for companies making a similar decision, that argument is a bit different. Go 1.6/1.7 obviously has massive improvements in the areas the article outlines. But, in the Erlang camp, we have Elixir making that more enticing.

I would argue Twitch made the right choice. They will have an order of magnitude easier time finding devs to support a Go system over an Erlang system. And their product never suffered for it. And they are clearly a force behind making Go better, which has helped more people than just them.

[+] weberc2|9 years ago|reply
There are likely other tradeoffs. This (GC pause times) is probably not the only criterion, nor even the most important. It's really hard to draw a conclusion based on such limited information.
[+] Thaxll|9 years ago|reply
Go 1.6 GC is probably faster than Erlang GC now.
[+] rogerdpack|9 years ago|reply
According to the article they chose it because of "Its simplicity, safety, performance, and readability"; perhaps it has more of some of those than Erlang does/did?
[+] gnuvince|9 years ago|reply
Isn't Facebook's chat also powered by Erlang?
[+] smegel|9 years ago|reply
I'm glad they did. Sounds like they have helped push the development of Go along which is good for everyone.
[+] iamleppert|9 years ago|reply
I'm curious why you didn't just use something like Redis for managing concurrent state and pair that to any of the various web apps that are good at concurrent connections? You could still use Go to serve the web requests/sockets/etc.
[+] smegel|9 years ago|reply
> But this isn’t another article about how great Go is for us — it’s about how our use of Go pushes the limits of the current runtime implementation in some dimensions, and how we respond to reaching those limits.
[+] lllorddino|9 years ago|reply
Before Go I was web developing in Node.js but wanted to get "closer to the metal." Thought about using C for the back end but then heard about Go and have been in love ever since. My favorite programming language by far.
[+] jeffdavis|9 years ago|reply
There has been a ton of research for GC on the JVM. What are the differences between Go's approach and Java's? Are those differences due to linguistic differences or different goals?
[+] cbsmith|9 years ago|reply
There was a ton of research on LISP & Smalltalk GCs prior to the advent of the JVM.
[+] amelius|9 years ago|reply
How do they prove correctness of their GC?
[+] coldtea|9 years ago|reply
They run lots of programs and see if they crash/leak.

(Seriously, Go is not really an academic language that cares about that kind of stuff, unless it happens to be easy to prove.)

[+] mkevac|9 years ago|reply
> Next up is runtime.scanobject. That function does several things, but the reason it’s running during the chat server’s mark termination phase in Go 1.5 is to implement finalizers.

How did you know that?

> We emailed back and forth and the Go team was very helpful with suggestions on how to diagnose the performance problems and how to distill them into minimal test cases.

Can you give us the link?

[+] _pmf_|9 years ago|reply
Purposefully strolling to where the puck was in 2001.