Comparing a web service written in Python and Go [pdf]

85 points | guai898 | 10 years ago | indico.cern.ch | reply

75 comments

[+] mpdehaan2|10 years ago|reply
I've been seeing a lot of Python vs Go stuff lately and I think a fair amount of the folks involved in these are not aware of general Python web architecture patterns.

Of course something compiled directly is going to be a bit faster, but development time is important too. Python has more libraries and is (for many people) probably faster to write.

Serving multiple requests is best handled by a preforking web server in front of Python, whether Apache, nginx, etc. This lets multiple requests in without any async voodoo code. Twisted, for example, is not the right answer in this case, because it doesn't get you multiple processes and it changes the way you write code (async event-driven code is more time-consuming to write and debug).
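
To make the preforking idea concrete, here is a minimal standard-library sketch (POSIX only, since it relies on `os.fork`). The parent opens one listening socket and forks workers that all call `accept()` on it, so each request is handled by plain blocking code. The name `serve_prefork` and the `max_requests` cutoff are assumptions for illustration; a production server such as Apache or gunicorn does far more (worker supervision, restarts, timeouts).

```python
import os
import socket


def serve_prefork(listener, workers=4, max_requests=1):
    """Fork `workers` children that all accept() on the same listening socket.

    Each child handles `max_requests` connections with ordinary blocking
    code and then exits. Returns the child PIDs so the parent can wait on them.
    """
    pids = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:  # child process
            try:
                for _ in range(max_requests):
                    conn, _addr = listener.accept()  # kernel picks one waiting worker
                    with conn:
                        conn.recv(65536)  # read (and ignore) the request
                        body = f"handled by pid {os.getpid()}\n".encode()
                        head = (
                            "HTTP/1.0 200 OK\r\n"
                            f"Content-Length: {len(body)}\r\n\r\n"
                        ).encode()
                        conn.sendall(head + body)
            finally:
                os._exit(0)  # never fall back into the parent's code path
        pids.append(pid)  # parent process keeps track of its workers
    return pids
```

A caller would bind and `listen()` on a socket, pass it to `serve_prefork`, and later `os.waitpid` on the returned PIDs. A real worker would loop indefinitely rather than stop after `max_requests`.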

On the backend, your web server does not start long-running backend processes itself, but you can launch them using things like celery, a distributed task queue that lets you start jobs and so forth. Celery workers can run on any number of machines, and your backend can scale independently of your frontend if you wish.

Historically, some very computational parts of Python were often written with C bindings. While I haven't done so, things like Cython may also be promising for extensions. There are also things like ctypes for quickly taking advantage of native libraries in a Python function.
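
As a small illustration of the ctypes route, the sketch below calls the C library's `strlen` from Python. It assumes a POSIX system; the wrapper name `native_strlen` is made up for this example, and the fallback to `CDLL(None)` (symbols already loaded into the process) is an assumption that does not hold on Windows.

```python
import ctypes
import ctypes.util

# find_library may return None on minimal systems; on POSIX, CDLL(None)
# then exposes the symbols already loaded into the current process.
_libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
_libc.strlen.argtypes = [ctypes.c_char_p]
_libc.strlen.restype = ctypes.c_size_t


def native_strlen(data: bytes) -> int:
    """Return the length of a NUL-terminated byte string using C's strlen."""
    return _libc.strlen(data)
```

Declaring `argtypes` and `restype` up front matters: without them ctypes assumes `int` for everything, which silently truncates pointers and 64-bit return values.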

Personally, given, I like how Go has things like channels, but I would never adopt a programming language for just one specific feature when I lose out on other features that are valuable to me, for instance, an object model.

(I'm also really curious to see how the typing options in Python 3 play out)

Anyway, I mostly wanted to point out as most people are doing web services that you should be fronting Python with some sort of web server that allows preforking, and then the concurrency issue, in my experience, becomes not a thing.

Many backend libraries can easily take advantage of libs like multiprocessing, which are not the most 100% friendly in their more complex IPC-type cases, but are pretty workable.
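
For the simple (non-IPC) case, a `multiprocessing` pool can be sketched as follows. The function names `cpu_bound` and `run_parallel` are hypothetical, and the explicit `"fork"` start method is an assumption that restricts this sketch to POSIX systems.

```python
import multiprocessing


def cpu_bound(n):
    """Stand-in for real CPU-heavy work: sum of squares below n."""
    return sum(i * i for i in range(n))


def run_parallel(inputs, processes=4):
    """Spread cpu_bound over worker processes, sidestepping the GIL."""
    # "fork" is chosen explicitly so the sketch behaves the same regardless
    # of the platform's default start method (POSIX only).
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=processes) as pool:
        return pool.map(cpu_bound, inputs)  # results keep the input order
```

Arguments and results are pickled between processes, which is where the more complex IPC cases start to get less friendly.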

[+] dec0dedab0de|10 years ago|reply
> I've been seeing a lot of Python vs Go stuff lately and I think a fair amount of the folks involved in these are not aware of general Python web architecture patterns.

I absolutely agree, but I also think that deploying python on the web still has too much of a learning curve. Even the standard nginx > gunicorn > wsgi model is kind of a pain. Couple that with celery, and init systems, and you're basically down a sysadmin rabbit hole.

[+] yeukhon|10 years ago|reply
> Anyway, I mostly wanted to point out as most people are doing web services that you should be fronting Python with some sort of web server that allows preforking, and then the concurrency issue, in my experience, becomes not a thing.

Spot on. Concurrency vs parallelism, and clean distinction of responsibility (web server vs backend threads).

> Many backend libraries can easily take advantage of libs like multiprocessing, which are not the most 100% friendly in their more complex IPC-type cases, but are pretty workable.

This is painful. In JavaScript I can be careless (well, to a great extent, for people like me who like magic) using promises. Python can achieve this too, but only with the considerable effort of learning coroutines, gevent, or asyncio. Though I have to admit that JavaScript has its own problems with parallelism.

I have done things with gevent, spawning greenlets and responding to the user immediately. The thing is, the backend should always be stateless, so worker models like celery, RabbitMQ pub/sub, etc. are more popular.

[+] whyever|10 years ago|reply
> Personally, given, I like how Go has things like channels, but I would never adopt a programming language for just one specific feature when I lose out on other features that are valuable to me, for instance, an object model.

That really depends on your requirements. If you need multithreading (not multiprocessing), you cannot use Python.

[+] mratzloff|10 years ago|reply
> Personally, given, I like how Go has things like channels, but I would never adopt a programming language for just one specific feature when I lose out on other features that are valuable to me, for instance, an object model.

Go has an object model.

[+] adolgert|10 years ago|reply
These are great ideas. I told Valentin to drop by. Because DAS is an aggregator with an expert-system style query language, there is sharing among the Python services for caching. It's the caching that makes async code a good option. Preforking might work against this without significant increase in complexity in order to communicate with a single local cache. Remember that this is a web service for the Large Hadron Collider. Nothing about that is small.
[+] andor|10 years ago|reply
Basically their Python version ("3 thread pools, 175 threads") is synchronous and single (OS)-threaded, while the Go rewrite uses goroutines and multiple OS threads. The fact that their Python version takes "minutes to startup" indicates that a rewrite was necessary anyways.

Go is a good tool for the job, Python threads are not. asyncio or one of the event-based IO frameworks should work much better.

As for the problem of sharing data between processes (slide 5): it appears that this service is read only? If that's true, what do you need to share? Every process can have its own connection pool. You don't even need multiprocessing, just use SO_REUSEPORT and start your application multiple times.
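
A rough sketch of the SO_REUSEPORT approach: every copy of the application opens its own listening socket on the same port and the kernel spreads incoming connections across them. This assumes a platform that exposes `socket.SO_REUSEPORT` (for example Linux 3.9 or later); the helper name `reuseport_listener` and the default port are illustrative only.

```python
import socket


def reuseport_listener(host="127.0.0.1", port=8080, backlog=128):
    """Open a listening TCP socket that other processes may also bind to."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(), and on every socket sharing the port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(backlog)
    return sock
```

Each independently started process would call `reuseport_listener` with the same port and run its own accept loop, with no shared state between them.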

[+] mtanski|10 years ago|reply
You could probably get decent performance from a similar application written in another language (than Python) using 175 threads. 175 threads is not that big of a deal; the OS can manage that pretty well. It's only when you start talking about thousands of individual connections and thousands of threads that you need to worry. Python sucks at that even at a low number of threads (GIL).
[+] mherrmann|10 years ago|reply
Anybody else find it difficult to believe that a 4k LOC Go project takes 26k LOC in Python?
[+] dekhn|10 years ago|reply
Typically rewrites like this focus on core functionality; I truly doubt the project is a 1:1 equivalent. There may be factorings, as well (functionality included as part of Go).
[+] rbanffy|10 years ago|reply
It seems it's looking at everything outside the core libraries. Go has a built-in templating engine. That alone may explain the LoC difference.
[+] FraaJad|10 years ago|reply
This looks like a report written by someone who is trying to show how their $favorite system is better than the $other one.

Best to open-source the code for both, along with the benchmarks, and have people go at it.

[+] laumars|10 years ago|reply
Not really. It's just a report written by someone who has an existing code infrastructure and is experimenting with alternative approaches so wrote some basic scripts for benchmarking.
[+] aidos|10 years ago|reply
I haven't done any real work in Go yet but this sounds like one of the (many) use cases it's well suited to.

Unfortunately this overview is light on any meaningful details. As a general rule a rewrite of any project will result in fewer lines of code, however, in general, a rewrite of any project is a terrible idea.

Given that this seems to be a situation in which you have a lot of blocking waiting for concurrent requests, why not try something like gevent?

It's good for people to try different approaches and technologies. I'm glad they managed to have success with Go, that's good for everyone. It would have been interesting for the reader to see some of the gory details of hacking around with the existing codebase to see some of the ideas that may (not) have worked.

[+] alexchamberlain|10 years ago|reply
Yet another Go article not fairly comparing technologies. What about a Python implementation that used `asyncio`, for example? What about `PyPy`?
[+] rbanffy|10 years ago|reply
It's a Go rewrite of an existing, and probably old, Python application. You are asking them, who already did a rewrite in Go and kindly provided their assessment of the process, to also do a Python rewrite using more modern approaches.

Feel free to rewrite their old Python app in Python for free. They may thank you and even use your port.

[+] laumars|10 years ago|reply
I think they were just comparing their existing Python deployment infrastructure to a generic Go setup (there are also ways to optimise Go that weren't explored in that article). It wasn't meant as a "look how much Python sucks compared to Go" type article like a few seem to have taken it. More just disclosing the results of some internal testing they've been doing.

On that note, I would suggest that if you think they could see big gains with little code refactoring simply by switching Python frameworks or even to a different Python runtime, then maybe you should contact them. I'm sure the author would be open to ways to increase their throughput with less developer overhead.

[+] tobz|10 years ago|reply
Do you have an example in Python of doing fan-out/fan-in? I've done it in Go before, and didn't find it particularly nice to look at (although it did work, and worked well).... so I'm curious what a Python example would look like.
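
For reference, one way fan-out/fan-in could look in Python is with `asyncio.gather`. This is only a minimal sketch: `fetch`, its `delay` parameter, and the source names are hypothetical stand-ins for real backend calls, not anything taken from the DAS code.

```python
import asyncio


async def fetch(source, delay):
    """Pretend to query one backend; the sleep stands in for network I/O."""
    await asyncio.sleep(delay)
    return f"result from {source}"


async def fan_out_fan_in(sources):
    """Start one coroutine per source (fan-out), then collect all results (fan-in)."""
    pending = [fetch(name, delay) for name, delay in sources]
    # gather runs the coroutines concurrently and returns results
    # in the same order as the inputs, regardless of completion order.
    return await asyncio.gather(*pending)
```

It would be driven with something like `asyncio.run(fan_out_fan_in([("a", 0.2), ("b", 0.1)]))`; the total wait is roughly the slowest call rather than the sum of all calls.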
[+] kozak|10 years ago|reply
I'm not saying you shouldn't use dynamic languages at all (in fact, I'm developing in one right now), but you should keep in mind that you are paying a computational price for that dynamism every time a line of your code is executed.
[+] collyw|10 years ago|reply
And you are paying for developer time otherwise.
[+] iamd3vil|10 years ago|reply
Anyone who thinks it's difficult to program in Erlang, please have a look at Elixir (https://elixir-lang.org). It's quite nice to work with.
[+] brokentone|10 years ago|reply
This does not seem relevant to Go or Python.
[+] mbreese|10 years ago|reply
Can anyone comment on what the CMS DAS web service is? I'm having a hard time understanding what it is supposed to do. I'm sure the audience knew or maybe it's obvious and I'm just missing something.
[+] Analog24|10 years ago|reply
It's essentially a way to look up metadata about the different data files produced by the CMS detector. There are petabytes of data produced by the detector and these are stored in countless data files. In order to determine which datasets are available and right for your particular analysis you would use the DAS system to look for them and find out where they're located. This is a complicated task b/c the petabytes of data are distributed across the CMS computing grid that spans many dozens of institutes across the globe.
[+] cptwunderlich|10 years ago|reply
Look at the scales for the graphs on page 9. What a ridiculous comparison...
[+] esseti|10 years ago|reply
Are the conclusions true in general? I mean, does software written in Go perform better than software written in Python?
[+] mhd|10 years ago|reply
Software rewritten in Python often performs better than the original in Python, too.
[+] jonathan_s|10 years ago|reply
Sure it's true. And software written in C or assembly language often also performs better than software written in Python.
[+] SjuulJanssen|10 years ago|reply
I think a more true comparison would be if the author used a reactor/async based solution in his python code