
Why we’re writing machine learning infrastructure in Go, not Python

130 points| calebkaiser | 6 years ago |towardsdatascience.com

74 comments

[+] tus88|6 years ago|reply
Sounds like fairly generic deployment infrastructure that has nothing to do with machine learning.

But why pass up the opportunity to use a buzzword to get on the front page of HN?

[+] RandyRanderson|6 years ago|reply
I count at least 3: ML, Go, Python. Challenge: could they fit more in there under the title len limit?
[+] calebkaiser|6 years ago|reply
This is a good point, and it’s something we think about a lot. In one sense, you can certainly think of Cortex as a tool for deploying/scaling/monitoring Python functions on AWS, but there are many ML-specific features that make it different.

It does things like prediction monitoring, and it supports models exported by ONNX and TF Serving. We also have designed Cortex to prioritize infrastructure needs specific to inference workloads (inference workloads are read only and memory hungry, for example). This is why we’ve prioritized things like GPU spot instances.

Our long-term plan includes more end-to-end ML workflows, including things like training, but for now we’re focused on getting model serving right.

[+] therealrootuser|6 years ago|reply
I think the real lesson here is to choose the language that works best for your team.

On my team we use Python and Scala. For network critical I/O stuff in Python, asyncio has worked out just fine for our needs. For massive CPU parallelism needs (at least in sporadic bursts), we've actually found that AWS/Lambda does pretty well.

Golang seems to be really polarizing. Most engineers on my team have tried Golang in the past, but haven't liked it, which is why we would never consider building anything on top of it. Everyone likes Python well-enough that it has kind of become the lingua franca for us.

Deployment is all based around containers or serverless/lambda, and we have a pretty standardized way of deploying these things by now. Just because a bunch of k8s tooling is written in Golang doesn't mean I need to rush out and write my stuff in Golang too.

[+] cutler|6 years ago|reply
I'm a bit green on infrastructure & deployment but I don't quite get this. If your ML algorithm code is still Python how does deployment with Go make that much difference? It sounds like you're not replacing the Python ML code so why is this such a big deal?
[+] cigaaa|6 years ago|reply
I don’t think people who work in infrastructure currently will be surprised that Go is a better choice than Python for infra, but for those who are newer to the field of ML or only work on model development (vs deployment), it is likely surprising that a major part of production ML is best done in a language other than Python.
[+] apta|6 years ago|reply
> It sounds like you're not replacing the Python ML code so why is this such a big deal?

How would you write an article about moving to a hipster language then? :-)

[+] tracker1|6 years ago|reply
I think, like TFA says, it comes down to ease of deployment for support tooling. A single executable is easier to distribute than a set of dependencies and a language runtime. These are tools that run outside containers to manage code that can run inside containers, where dependency management and isolation are easier. It makes total sense to me.
[+] sandGorgon|6 years ago|reply
I would do it in python using one of the fast, modern ASGI servers like uvicorn.

Zero downtime model updates can be done using a redis cache to persist models.

In any case, that's a solved problem using haproxy and kubernetes.

Not sure why Go would have an advantage here.

[+] hnaccy|6 years ago|reply
If you do model inference in the web server process, it will be compute-bound and lock up the web server. Is there a preferred/clean way (req/rec or similar) to pass the jobs to a second process and let the web server process wait for the response without blocking?
[+] rezeroed|6 years ago|reply
I would've chosen Erlang or Elixir for those reasons. Are we getting another Go package management solution this year? A pleasure to work with? I've been ditching Go for Nim recently. Other people seem to be enjoying Crystal. Rust is great, and coming down the road Zig looks excellent. I think Go has turned out to be a bit of a damp squib. Considering, unlike the other languages, it has Google behind it - unimpressed. After six years, I don't expect to be using it at all within the next year or two.
[+] luord|6 years ago|reply
Great, another one of these articles, but this time I feel more confident in my usual reply, having been working in Go exclusively for a while.

> Implementing all of this functionality in Python may be doable with recent tools like asyncio, but the fact that Go is designed with this use case in mind makes our lives much easier.

This just makes me think about Armin Ronacher's article on back pressure but, sure, whatever.

> Building a cross-platform CLI is easier in Go

No, it isn't.

> The performance benefits of a compiled Go binary versus an interpreted language are also significant

Ah, yes, because performance is such a key feature of command line interfaces, as evidenced by bash and its outstanding performance in every benchmark.

> The Go ecosystem is great for infrastructure projects

And the reality discussed in this point would be different if Docker weren't written in Go. Had the Docker developers chosen anything else, this point would apply to that hypothetical language, so it isn't an inherent advantage of Go as a language.

> Go is just a pleasure to work with

No, it really, really isn't, but that's not the point.

This is ultimately the real reason they chose go: whoever made the original decision liked it and everything else is post-hoc rationalization.

Which is fine, most of this tends to be subjective.

[+] Runawaytrain2|6 years ago|reply
All the stuff that requires speed is written in a language that compiles directly to machine code while the machine learning libraries are all python based. That seems standard, no?
[+] bitexploder|6 years ago|reply
I think that happens, but I don't know about standard. It is pretty obvious and natural. However, a lot of code is written in Python and it can be hard to move ML teams to use other tools. Many of them aren't great at programming because their background is stats/math so it can be hard to move critical code to more performant solutions without resistance from the teams.
[+] PeterisP|6 years ago|reply
All the popular machine learning libraries are a Python API to more conveniently communicate with the CUDA code that actually does the calculations on GPUs. The API can be in any other language, but the choice of that language does not really affect performance much, as the heavy lifting in any "Python machine learning library" does not happen in Python anyway.
[+] toolslive|6 years ago|reply
"in the land of the blind, the one-eyed man is king"

Golang is probably a step up from Python, but it's just that. There are a lot of issues with Golang. Off the top of my head, the lack of decent error handling (if err != nil { return nil, err }) and the lack of decent polymorphism are the most annoying. There's a GitHub repo dedicated to what's bugging people:

  https://github.com/ksimka/go-is-not-good
[+] vardump|6 years ago|reply
Well... there's also the other side of the coin. There's value in visibility and lack of magic.

> lack of decent error handling (if err != nil { return nil, err })

Errors are in your face, instead of having exceptions performing invisible gotos to somewhere far up in the call tree. Explicit error handling is more code, but your error handling is going to be much more robust.
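
To make the pattern concrete, here is a minimal sketch of error handling at the call site. The `loadConfig` helper and the file name are hypothetical, made up for illustration; they are not from the article or from Cortex.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// loadConfig is a hypothetical helper. It reads a file and returns its
// contents, wrapping any failure with context instead of hiding it.
func loadConfig(path string) ([]byte, error) {
	data, err := os.ReadFile(path)
	if err != nil {
		// The failure is visible right where it happens; %w keeps the
		// original error available to errors.Is and errors.As.
		return nil, fmt.Errorf("load config %q: %w", path, err)
	}
	return data, nil
}

func main() {
	// The file name is an assumption chosen so the error path runs.
	_, err := loadConfig("does-not-exist.yaml")
	if errors.Is(err, os.ErrNotExist) {
		fmt.Println("config missing, falling back to defaults")
		return
	}
	if err != nil {
		fmt.Println("unexpected error:", err)
	}
}
```

Every caller has to decide what to do with the returned error, which is the "more code, but more robust" trade-off described above.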

> or lack of decent polymorphism...

Lack of polymorphism also means you don't have to guess about concrete types when reading code. When troubleshooting, you can see what's going on without going through the whole inheritance tree.

Go encourages composition instead of inheritance. That's something I wish more C++ codebases would do as well. Composition makes code inherently more maintainable and easier to refactor.
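
As a rough illustration of composition via struct embedding, here is a small sketch. The `Logger` and `Deployer` types are invented for this example and are not taken from any real codebase.

```go
package main

import "fmt"

// Logger is a small concrete type that other types can reuse.
type Logger struct {
	prefix string
}

// Logf formats a message and prepends the logger's prefix.
func (l Logger) Logf(format string, args ...any) string {
	return l.prefix + fmt.Sprintf(format, args...)
}

// Deployer embeds Logger, so it gains Logf by composition.
// There is no base class and no inheritance tree to walk.
type Deployer struct {
	Logger
	target string
}

// Deploy returns the log line it would emit for a deployment.
func (d Deployer) Deploy(model string) string {
	return d.Logf("deploying %s to %s", model, d.target)
}

func main() {
	d := Deployer{Logger: Logger{prefix: "[deploy] "}, target: "staging"}
	fmt.Println(d.Deploy("model-v1"))
}
```

The embedded field is still an ordinary value, so swapping `Logger` for another type is a local change to `Deployer` rather than a change to a class hierarchy.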

Go tends to be easy to read and maintain. It does come with some cost. It's just a matter of where your priorities lie. Software projects spend the majority of their life as legacy, something that needs to be maintained.

[+] dewey|6 years ago|reply
These two issues are usually brought up by people who haven't written a lot of Go.

In day to day work it's really not an issue.

[+] icandoit|6 years ago|reply
I like my code to look simple and behave as expected. Stability and "one right way to do it" are great language features.

I think this was why Python was able to overtake Perl, for example.

[+] oflannabhra|6 years ago|reply
Why not Swift? (I think I know the answer).

Concurrency in Swift is not yet a solved problem, but libdispatch is quite workable (although not "elegant, out of the box" per the article).

With the work being done in Swift for TensorFlow [0], I'd imagine in a year or two both the infrastructure and the ML portions of a product like Cortex could be written in a single language.

[0] - https://www.tensorflow.org/swift

[+] pjmlp|6 years ago|reply
Until Swift stops requiring stuff like import Glibc for basic IO and actually supports Windows, I would consider it an Apple only language.
[+] calebkaiser|6 years ago|reply
Swift is an interesting choice, one we haven't explored in depth. Out of curiosity, have you done any work with Swift for Tensorflow/what has your experience been?
[+] threeseed|6 years ago|reply
TensorFlow, which is for deep learning, covers only a very small part of overall ML requirements.

To do large-scale data preparation and engineering, which necessitates clustering, you would need to reinvent Spark. And then what about the more common algorithms, like boosted trees? Those algorithms are provided by Apple for iOS/macOS, but I'm not sure they exist for everyone else.

[+] kelsolaar|6 years ago|reply
Anyone who doesn't read the title carefully might think you are doing ML with Go, which, judging by the content of the article, you obviously are not. This is almost click-bait.
[+] nhumrich|6 years ago|reply
> Making all of these overlapping API calls in a performative, reliable way is a challenge.

Python's asyncio is pretty hard to beat. For non-CPU-intensive tasks, I find it a pleasure to work with. Goroutines can still have race conditions.
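
For reference, a minimal sketch of what overlapping calls look like with goroutines, including the locking needed to avoid the race mentioned above. `fetchStatus` and the service names are placeholders standing in for real API calls; nothing here is taken from Cortex.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchStatus stands in for a real API call (hypothetical).
func fetchStatus(service string) string {
	return service + ": ok"
}

// fetchAll calls fetchStatus for every service concurrently and
// collects the results in a map.
func fetchAll(services []string) map[string]string {
	var (
		mu      sync.Mutex
		wg      sync.WaitGroup
		results = make(map[string]string, len(services))
	)
	for _, s := range services {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			status := fetchStatus(name)
			// Concurrent writes to a map are a data race in Go,
			// so the write is guarded by a mutex.
			mu.Lock()
			results[name] = status
			mu.Unlock()
		}(s)
	}
	wg.Wait() // block until every goroutine has finished
	return results
}

func main() {
	for name, status := range fetchAll([]string{"eks", "s3", "cloudwatch"}) {
		fmt.Println(name, "->", status)
	}
}
```

Dropping the mutex would still compile, which is the point being made: goroutines make concurrency cheap to write but do not remove races. Running with `go run -race` reports that kind of mistake.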

> Originally, we wrote the CLI in Python, but trying to distribute it across platforms proved to be too difficult

Sure, I get that Go can cross-compile. But what makes Python hard? Python works on every platform, and distributing is just a "pip install" and "pip install -U". Surely that's easier than "Download the correct binary for the platform, unzip it, change permissions, add it to your path, then do it all over again for every update".

I was the original author of the awseb cli and we found that pip install was significantly less of a hurdle than a Go binary and decided to do it in Python instead. If a user on Windows has a hard time installing Python and pip, telling them to drop a binary and change their path isn't going to be any easier.

[+] aequitas|6 years ago|reply
> Python works on every platform, and distributing is just a "pip install" and "pip install -U"

Until you have a dependency which has a C dependency (like a crypto framework, SQL connector, etc.). Suddenly you need an entire compiler toolchain, dev dependencies, all the library headers, and a decent amount of time. Also, the errors thrown when these compile steps fail are anything but helpful for new users. If you are lucky, there is already a wheel for your platform/arch.

> Surely that's easier than "Download the correct binary for the platform, unzip it, change permissions, add it to your path, then do it all over again for every update"

This is trivial to automate with a script and has far fewer failure modes to deal with than pip would (do you have the correct Python version? is there a compiler installed for C modules? etc.).

[+] well_said|6 years ago|reply
"... is just a "pip install..."

I was yelled at a few times for not using package and environment management tools (Conda, etc.). So, when working with Python, it is not just a "pip install" anymore.

[+] calebkaiser|6 years ago|reply
We thought similarly to you about the relative ease of "pip install"—which is why we originally wrote the CLI in Python. Pretty immediately, however, we heard back from users who experienced friction. With the Go binary, we're able to share a one line bash command that users on Mac/Linux (Windows coming soon) can run to install the Cortex CLI, which removed the need for us to instruct users on how to configure their local environments. We've found that this one line install works better for our users: https://www.cortex.dev/install

Also, when we had the Python CLI, some of our users complained that their CI systems, which ran `cortex deploy`, didn’t otherwise need Python/pip in their images, and installing them was inconvenient.

[+] mongol|6 years ago|reply
Not a fair comparison. Go has go install.
[+] flavio81|6 years ago|reply
Anything is faster than CPython. Even PHP!