I've been seeing a lot of Python vs Go stuff lately and I think a fair amount of the folks involved in these are not aware of general Python web architecture patterns.
Of course something compiled directly is going to be a bit faster, but development time is important too. Python has more libraries and is (for many people) probably faster to write.
Serving multiple requests is best handled by a preforking web server in front of Python, whether Apache, nginx, etc. This lets multiple requests in without any async voodoo code. Twisted, for example, is not the right answer in this case, because it doesn't get you multiple processes and it changes the way you write code (async event-driven code is more time-consuming to write and debug).
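For anyone who hasn't seen what "preforking" amounts to, here is a minimal sketch of the idea, assuming a POSIX system (it relies on `os.fork()`). The names `prefork_server` and `stop_workers` are made up for illustration, and a real deployment would use Apache, gunicorn, uWSGI or similar rather than anything hand-rolled like this.

```python
import os
import signal
import socket


def prefork_server(num_workers=2, host="127.0.0.1"):
    """Bind one listening socket, then fork workers that all accept() on it.

    Returns (port, worker_pids). POSIX only, because it relies on os.fork().
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, 0))  # port 0: let the OS pick a free port
    listener.listen(16)
    port = listener.getsockname()[1]

    pids = []
    for _ in range(num_workers):
        pid = os.fork()
        if pid == 0:
            # Child process: plain blocking code, one request at a time.
            try:
                while True:
                    conn, _addr = listener.accept()
                    with conn:
                        conn.recv(1024)  # read (and ignore) the request
                        conn.sendall(b"handled by pid %d\n" % os.getpid())
            finally:
                os._exit(0)  # never fall back into the parent's code path
        pids.append(pid)

    listener.close()  # the parent does not serve; workers keep their copies
    return port, pids


def stop_workers(pids):
    """Terminate the workers and reap them so no zombies are left behind."""
    for pid in pids:
        os.kill(pid, signal.SIGTERM)
        os.waitpid(pid, 0)
```

Each worker is ordinary synchronous code; the kernel hands every incoming connection to whichever worker is currently blocked in `accept()`, which is where the concurrency comes from.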
On the backend, your web server does not start long-running backend processes, but you can launch them using things like Celery, a distributed task queue that lets you start jobs and so forth. Celery workers can run on any number of machines, and your backend can scale independently of your frontend if you wish.
Historically, some very computational parts of Python were often written with C bindings. While I haven't done so, things like Cython may also be promising for extensions. There are also things like ctypes for quickly taking advantage of native libraries from a Python function.
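As a rough illustration of the ctypes route, the sketch below calls `strlen` from the C standard library. It assumes a POSIX system where the C library can be loaded this way; the wrapper name `c_strlen` is invented for the example.

```python
import ctypes
import ctypes.util

# Locate the C standard library. If find_library() returns None, CDLL(None)
# on POSIX falls back to the symbols already loaded into this process,
# which include libc.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Return the length of a NUL-terminated byte string using C's strlen."""
    return libc.strlen(data)
```

Declaring `argtypes` and `restype` up front is optional for a toy like this, but it keeps ctypes from guessing at the types for anything wider than a C `int`.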
Personally, given, I like how Go has things like channels, but I would never adopt a programming language for just one specific feature when I lose out on other features that are valuable to me, for instance, an object model.
(I'm also really curious to see how the typing options in Python 3 play out)
Anyway, I mostly wanted to point out as most people are doing web services that you should be fronting Python with some sort of web server that allows preforking, and then the concurrency issue, in my experience, becomes not a thing.
Many backend libraries can easily take advantage of libs like multiprocessing, which are not the most 100% friendly in their more complex IPC-type cases, but are pretty workable.
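For the simple (non-IPC) case, a multiprocessing sketch might look like the following. The prime-counting task and the function names are placeholders, and the "fork" start method is an assumption that limits this to POSIX systems.

```python
import multiprocessing


def count_primes(limit):
    """CPU-bound toy task: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, processes=2):
    """Run count_primes over `limits` in a pool of worker processes."""
    # "fork" is requested explicitly (POSIX only) so the workers inherit
    # this module's functions without re-importing the main script.
    ctx = multiprocessing.get_context("fork")
    with ctx.Pool(processes=processes) as pool:
        # map() splits the inputs across workers and keeps the input order.
        return pool.map(count_primes, limits)
```

Because each worker is a separate process with its own interpreter, the GIL does not serialize the CPU-bound work the way it would with threads.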
> I've been seeing a lot of Python vs Go stuff lately and I think a fair amount of the folks involved in these are not aware of general Python web architecture patterns.
I absolutely agree, but I also think that deploying Python on the web still has too much of a learning curve. Even the standard nginx > gunicorn > WSGI model is kind of a pain. Couple that with Celery and init systems, and you're basically down a sysadmin rabbit hole.
> Anyway, I mostly wanted to point out as most people are doing web services that you should be fronting Python with some sort of web server that allows preforking, and then the concurrency issue, in my experience, becomes not a thing.
Spot on. Concurrency vs parallelism, and clean distinction of responsibility (web server vs backend threads).
> Many backend libraries can easily take advantage of libs like multiprocessing, which are not the most 100% friendly in their more complex IPC-type cases, but are pretty workable.
This is painful. In JavaScript I can be careless (well, to a great extent, for people like me who like magic) using promises. Python can achieve this too, but only with a great effort of learning coroutines, gevent, or asyncio. Though I have to admit that JavaScript has its own problems with parallelism.
I have done things with gevent, spawning greenlets and responding to the user immediately. The thing is, the backend should always be stateless, so worker models like Celery and RabbitMQ pub/sub are more popular.
> Personally, given, I like how Go has things like channels, but I would never adopt a programming language for just one specific feature when I lose out on other features that are valuable to me, for instance, an object model.
That really depends on your requirements. If you need multithreading (not multiprocessing), you cannot use Python.
> Personally, given, I like how Go has things like channels, but I would never adopt a programming language for just one specific feature when I lose out on other features that are valuable to me, for instance, an object model.
These are great ideas. I told Valentin to drop by. Because DAS is an aggregator with an expert-system style query language, there is sharing among the Python services for caching. It's the caching that makes async code a good option. Preforking might work against this without significant increase in complexity in order to communicate with a single local cache. Remember that this is a web service for the Large Hadron Collider. Nothing about that is small.
Basically their Python version ("3 thread pools, 175 threads") is synchronous and single (OS)-threaded, while the Go rewrite uses goroutines and multiple OS threads. The fact that their Python version takes "minutes to startup" indicates that a rewrite was necessary anyways.
Go is a good tool for the job, Python threads are not. asyncio or one of the event-based IO frameworks should work much better.
As for the problem of sharing data between processes (slide 5): it appears that this service is read-only? If that's true, what do you need to share? Every process can have its own connection pool. You don't even need multiprocessing, just use SO_REUSEPORT and start your application multiple times.
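To make the SO_REUSEPORT suggestion concrete, here is a small sketch of the socket setup each copy of the application would do. It assumes a platform that exposes `socket.SO_REUSEPORT` (for example Linux 3.9 or later); the helper name `reuseport_listener` is hypothetical.

```python
import socket


def reuseport_listener(port, host="127.0.0.1"):
    """Create a listening TCP socket that other processes may also bind to."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before bind(), and on every socket that shares the port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen(16)
    return sock
```

If every instance of the application opens its listener this way on the same port, the kernel spreads incoming connections across the instances, so no parent process or shared state is needed.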
You could probably get decent performance for a similar application written in another language (than Python) using 175 threads. 175 threads is not that big of a deal; the OS can manage it pretty well. It's only when you start talking about thousands of individual connections and thousands of threads that you need to worry. Python sucks at that even at a low number of threads (GIL).
Typically rewrites like this focus on core functionality; I truly doubt the project is a 1:1 equivalent. There may be factorings as well (functionality included as part of Go).
Not really. It's just a report written by someone who has an existing code infrastructure and is experimenting with alternative approaches so wrote some basic scripts for benchmarking.
I haven't done any real work in Go yet but this sounds like one of the (many) use cases it's well suited to.
Unfortunately this overview is light on any meaningful details. As a general rule, a rewrite of any project will result in fewer lines of code; however, in general, a rewrite of any project is a terrible idea.
Given that this seems to be a situation in which you have a lot of blocking waiting for concurrent requests, why not try something like gevent?
It's good for people to try different approaches and technologies. I'm glad they managed to have success with Go, that's good for everyone. It would have been interesting for the reader to see some of the gory details of hacking around with the existing codebase to see some of the ideas that may (not) have worked.
It's a Go rewrite of an existing, and probably old, Python application. You are asking them, who already did a rewrite in Go and kindly provided their assessment of the process, to also do a Python rewrite using more modern approaches.
Feel free to rewrite their old Python app in Python for free. They may thank you and even use your port.
I think they were just comparing their existing Python deployment infrastructure to a generic Go setup (there are also ways to optimise Go that weren't explored in that article). It wasn't meant as a "look how much Python sucks compared to Go" type article like a few seem to have taken it. More just disclosing the results of some internal testing they've been doing.
On that note, I would suggest that if you think they could see big gains with little code refactoring simply by switching Python frameworks or even to a different Python runtime, then maybe you should contact them. I'm sure the author would be open to ways to increase their throughput with less developer overhead.
Do you have an example in Python of doing fan-out/fan-in? I've done it in Go before, and didn't find it particularly nice to look at (although it did work, and worked well).... so I'm curious what a Python example would look like.
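One way fan-out/fan-in might look in Python, as a minimal sketch using asyncio from the standard library. The `fetch` coroutine, the source names and the delays are all invented stand-ins for real backend calls, not anything from the DAS code.

```python
import asyncio


async def fetch(source, delay):
    """Stand-in for an I/O-bound call to one backend service."""
    await asyncio.sleep(delay)  # simulates waiting on the network
    return {"source": source, "records": len(source)}


async def aggregate(sources):
    """Query every source concurrently, then combine the answers."""
    # Fan-out: create one coroutine per source so the waits overlap.
    calls = [fetch(name, delay) for name, delay in sources]
    # Fan-in: gather() returns results in the order the calls were passed,
    # regardless of which one finished first.
    results = await asyncio.gather(*calls)
    total = sum(r["records"] for r in results)
    return total, [r["source"] for r in results]


def run(sources):
    """Synchronous entry point for callers outside an event loop."""
    return asyncio.run(aggregate(sources))
```

Calling `run([("a", 0.02), ("bb", 0.01)])` waits roughly as long as the slowest source rather than the sum of all delays. A `concurrent.futures.ThreadPoolExecutor` with `as_completed` would be the thread-based equivalent.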
I'm not saying you shouldn't use dynamic languages at all (in fact, I'm developing in one right now), but you should keep in mind that you are paying a computational price for that dynamism every time a line of your code is executed.
Can anyone comment on what the CMS DAS web service is? I'm having a hard time understanding what it is supposed to do. I'm sure the audience knew or maybe it's obvious and I'm just missing something.
It's essentially a way to look up metadata about the different data files produced by the CMS detector. There are petabytes of data produced by the detector and these are stored in countless data files. In order to determine which datasets are available and right for your particular analysis you would use the DAS system to look for them and find out where they're located. This is a complicated task b/c the petabytes of data are distributed across the CMS computing grid that spans many dozens of institutes across the globe.
[+] [-] mpdehaan2|10 years ago|reply
[+] [-] dec0dedab0de|10 years ago|reply
[+] [-] yeukhon|10 years ago|reply
[+] [-] whyever|10 years ago|reply
[+] [-] mratzloff|10 years ago|reply
Go has an object model.
[+] [-] adolgert|10 years ago|reply
[+] [-] andor|10 years ago|reply
[+] [-] mtanski|10 years ago|reply
[+] [-] mherrmann|10 years ago|reply
[+] [-] dekhn|10 years ago|reply
[+] [-] rbanffy|10 years ago|reply
[+] [-] FraaJad|10 years ago|reply
Best opensource the code for both and the benchmarks and have people go at it.
[+] [-] laumars|10 years ago|reply
[+] [-] aidos|10 years ago|reply
[+] [-] alexchamberlain|10 years ago|reply
[+] [-] rbanffy|10 years ago|reply
[+] [-] laumars|10 years ago|reply
[+] [-] tobz|10 years ago|reply
[+] [-] kozak|10 years ago|reply
[+] [-] collyw|10 years ago|reply
[+] [-] iamd3vil|10 years ago|reply
[+] [-] brokentone|10 years ago|reply
[+] [-] mbreese|10 years ago|reply
[+] [-] Analog24|10 years ago|reply
[+] [-] andrioni|10 years ago|reply
[+] [-] cptwunderlich|10 years ago|reply
[+] [-] esseti|10 years ago|reply
[+] [-] mhd|10 years ago|reply
[+] [-] jonathan_s|10 years ago|reply
[+] [-] SjuulJanssen|10 years ago|reply