phodge | 5 years ago
When you're dealing with external REST APIs that take multiple seconds to respond, then the async version is substantially "faster" because your process can get some other useful work done while it's waiting. Obviously the async framework introduces some overhead, but that bit of overhead is probably a lot less than the 3 billion cpu cycles you'll waste waiting 1000ms for an external service.
calpaterson|5 years ago
As I describe in the first line of my article, I don't think that people who believe async is faster have unreasonable expectations. It seems very intuitive to assume that greater concurrency would mean greater performance - at least on some measure.
> When you're dealing with external REST APIs that take multiple seconds to respond, then the async version is substantially "faster" because your process can get some other useful work done while it's waiting.
I'm afraid I also don't think you have this right conceptually. An async implementation that does multiple ("embarrassingly parallel") tasks in the same process - whether that is DB IO waiting or microservice IO waiting - is not necessarily a performance improvement over a sync version that just starts more workers and has the OS kernel scheduler organise things. In fact in practice an async version is normally lower throughput, higher latency and more fragile. This is really what I'm getting at when I say async is not faster.
Fundamentally, you do not waste "3 billion cpu cycles" waiting 1000ms for an external service. Making alternative use of the otherwise idle CPU is the purpose (and IMO the proper domain of) operating systems.
john-radio|5 years ago
Sure, the operating system can find other things to do with the CPU cycles when a program is IO-locked, but that doesn't help the program you're currently trying to run.
> An async implementation that does multiple ("embarrassingly parallel") tasks in the same process - whether that is DB IO waiting or microservice IO waiting - is not necessarily a performance improvement over a sync version that just starts more workers and has the OS kernel scheduler organise things. In fact in practice an async version is normally lower throughput, higher latency and more fragile. This is really what I'm getting at when I say async is not faster.
You're right. "Arbitrary programs will run faster" is not the promise of Python async.
Python async does help a program work faster in the situation phodge just described (waiting for web requests, or waiting for a slow hardware device), since the program can do other things while waiting on blocked IO (unlike a Python program that does not use async and can only proceed linearly through its instructions). That's the problem Python asyncio purports to solve. It is still subject to the Global Interpreter Lock, meaning it's still bound to one thread. (Python's multiprocessing library is needed to sidestep the GIL and split a program across multiple processes, at the cost that cross-process communication becomes expensive.)
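To make phodge's scenario concrete, here's a toy sketch - asyncio.sleep stands in for the slow external service, so nothing here is a real request:

```python
import asyncio
import time

async def fake_request(name, delay):
    # Stand-in for a slow network call; awaiting sleep yields control
    # to the event loop exactly as awaiting a real socket would.
    await asyncio.sleep(delay)
    return name

async def main():
    # Both "requests" wait concurrently in a single thread.
    return await asyncio.gather(
        fake_request("a", 0.5),
        fake_request("b", 0.5),
    )

start = time.monotonic()
results = asyncio.run(main())
elapsed = time.monotonic() - start
print(results, elapsed)  # two 0.5s waits overlap: ~0.5s total, not ~1s
```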
brightball|5 years ago
On the web when the bulk of your application code time is waiting on APIs, database queries, external caches or disk I/O it creates a dramatic increase in the capacity of your server if you can do it with minimal RAM overhead.
It's one of the big reasons I've always wanted to see Techempower create a test version that continues to increase concurrency beyond 512 (as high as maybe 10k). I think it would be interesting.
arghwhat|5 years ago
This is very true, especially when actual work is involved.
Remember, the kernel uses the exact same mechanism to have a process wait on a synchronous read/write, as it does for a processes issuing epoll_wait. Furthermore, isolating tasks into their own processes (or, sigh, threads), allows the kernel scheduler to make much better decisions, such as scheduling fairness and QoS to keep the system responsive under load surges.
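That point can be seen from Python directly; a minimal sketch using the selectors module (which wraps epoll on Linux):

```python
import selectors
import socket

# A connected socketpair stands in for a client connection.
a, b = socket.socketpair()
a.setblocking(False)

sel = selectors.DefaultSelector()
sel.register(a, selectors.EVENT_READ)

b.send(b"ping")  # make data available on the other end

# select() parks this process in epoll_wait (on Linux) - the same
# "sleep until IO is ready" kernel mechanism that a blocking recv()
# on a synchronous socket uses.
events = sel.select(timeout=1)
for key, _mask in events:
    data = key.fileobj.recv(4)

print(data)  # → b'ping'
sel.unregister(a)
sel.close()
a.close()
b.close()
```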
Now, async might be more efficient if you serve extreme numbers of concurrent requests from a single thread if your request processing is so simple that the scheduling cost becomes a significant portion of the processing time.
... but if your request processing happens in Python, that's not the case. Your own scheduler implementation (your event loop) will likely also end up eating some resources (remember, you're not bypassing anything, just duplicating functionality), and is very unlikely to be as smart or as fair as that of the kernel. It's probably also entirely unable to do parallel processing.
And this is all before we get into the details of how you easily end up fighting against the scheduler...
wongarsu|5 years ago
yamrzou|5 years ago
Co-routines are not necessarily faster than threads, but they yield a performance improvement if one has to spin up thousands of them: they have less creation overhead and consume less RAM.
_pmf_|5 years ago
Literally the first thing any concurrency course starts with in the very first lesson is that scheduling and context overhead are not negligible. Is it so hard to expect our professionals to know basic principles of what they are dealing with?
dspillett|5 years ago
This is because when they are first shown it, the examples are effectively faster: they get given jobs done in less wallclock time due to reduced blocking.
They learn that but often don't get told (or work out for themselves) that in many cases the difference is so small as to be unmeasurable, or that in other circumstances there can be negative effects (overheads others have already mentioned in the framework, more things holding RAM with a part-processed working set which could lead to thrashing in a low memory situation, greater concurrent load on other services such as a database and the IO system it depends upon, etc.).
As a slightly off-the-topic-of-async example: back when multi-core processing was first becoming cheap enough that it was not just affordable but the default option, I had great trouble trying to explain to a colleague why two IO-intensive database processes he was running concurrently were so much slower than when I'd shown him the same process (I'd run them sequentially). He was absolutely fixated on the idea that his four cores should make concurrency the faster option; I couldn't get through to him that in this case the flapping heads on the drives of the time were the bottleneck, and the CPU would be practically idle no matter how many cores it had while the bottleneck was elsewhere.
Some people learn the simple message (async can handle some loads much more efficiently) as an absolute (async is more efficient) and don't consider at all that the situation may be far more nuanced.
nurettin|5 years ago
You mean concurrent tasks in the same process?
lucideer|5 years ago
I do.
And I don't think I'm alone nor being unreasonable.
kerkeslager|5 years ago
This is a quintessential example of not seeing the forest for the trees.
The point of coroutines is absolutely to make my code execute faster. If a completely I/O-bound application sits idle while it waits for I/O, I don't care and I should not care because there's no business value in using those wasted cycles. The only case where coroutines are relevant is when the application isn't completely I/O bound; the only case where coroutines are relevant is when they make your code execute faster.
It's been well-known for a long time that the majority of processes in (for example) a webserver, are I/O bound, but there are enough exceptions to that rule that we need a solution to situations where the process is bound by something else, i.e. CPU. The classic solution to this problem is to send off CPU-bound processes to a worker over a message queue, but that involves significant overhead. So if we assume that there's no downside to making everything asynchronous, then it makes sense to do that--it's not faster for the I/O bound cases, but it's not slower either, and in the minority-but-not-rare CPU-bound case, it gets us a big performance boost.
What this test is doing is challenging the assumption that there's no downside to making everything asynchronous.
In context, I tend to agree with the conclusion that there are downsides. However, those downsides certainly don't apply to every project, and when they do, there may be a way around them. The only lesson we can draw from this is that gaining benefit from coroutines isn't guaranteed or trivial, but there is much more compelling evidence for that out there.
michaelcampbell|5 years ago
I think rather the point is to make your APPLICATION either finish in less time, or to not take MORE time when given more load.
The code runs as fast as it runs, coroutines notwithstanding.
BiteCode_dev|5 years ago
The surprising conclusion of the article is that in a realistic scenario, the async web frameworks will output fewer requests/sec than the sync ones.
I'm very familiar with Python concurrency paradigms, and I wasn't expecting that at all.
Add to that zzzeek's article (the guy wrote SQLA...) stating async is also slower for db access, this makes async less and less appealing, given the additional complexity it adds.
Now, apart from writing a crawler or needing to support websockets, I find it hard to justify asyncio. In fact, with David Beazley hinting that you can probably get away with spawning 1000 threads, it raises more doubts.
The whole point of async was that, at least when dealing with a lot of concurrent I/O, it would be a win compared to threads+multiprocessing. If just by cranking the number of sync workers you get better results for less complexity, this is bad.
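The "1000 threads" point is easy to sanity-check with a toy sketch (time.sleep standing in for blocking IO; real handlers would obviously do more work):

```python
import threading
import time

def handle_request(results, i):
    time.sleep(0.2)  # simulated blocking IO (DB call, upstream API)
    results[i] = i

results = [None] * 1000
start = time.monotonic()
# Spawning a thousand OS threads is cheap enough that all the
# simulated requests finish in well under a second of wallclock time.
threads = [threading.Thread(target=handle_request, args=(results, i))
           for i in range(1000)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start
print(all(r is not None for r in results), elapsed)
```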
DougBTX|5 years ago
BiteCode_dev|5 years ago
The author of "black" suggested that the cause of the slow down may be that asyncio actually starved postgres for resources:
https://twitter.com/llanga/status/1271719783080366086
zzzeek|5 years ago
but threads get you the same thing with much less overhead. this is what benchmarks like this one and my own continue to confirm.
People often are afraid of threads in Python because "the GIL!" But the GIL does not block on IO. I think programmers reflexively reaching for Tornado or whatever don't really understand the details of how this all works.
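A quick sketch of that point - time.sleep releases the GIL the same way a blocking socket read does, so it works as a stand-in for IO:

```python
import threading
import time

def slow_io():
    # time.sleep releases the GIL, just like a blocking socket read:
    # the other threads keep running while this one waits.
    time.sleep(0.5)

start = time.monotonic()
threads = [threading.Thread(target=slow_io) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start
print(elapsed)  # ~0.5s, not 5s: the ten waits overlapped despite the GIL
```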
danbruc|5 years ago
That is not true, at least not in general, the whole point of using continuations for async I/O is to avoid the overhead of using threads, the scheduler overhead, the cost of saving and restoring the processor state when switching tasks, the per thread stack space, and so on.
aviba|5 years ago
mavdi|5 years ago
didibus|5 years ago
The use of async either as callbacks, or user threads, or coroutines, is a convenience layer for structuring your code. As I understand, that layer does add some overhead, because it captures an environment, and has to later restore it.
hinkley|5 years ago
Async and parallel always use more CPU cycles than sequential. There is no question. The real questions are: do you have cycles to burn, will doing so bring the wall clock time down, and is it worth the complexity of doing so?
Izkata|5 years ago
I was thinking this would be about using multiprocessing to fire off two or more background tasks, then handle the results together once they all completed. If the background tasks had a large enough duration, then yeah, doing them in parallel would overcome the overhead of creating the processes and the overall time would be reduced (it would be "faster"). I thought this post would be a "measure everything!" one, after they realized for their workload they didn't overcome that overhead and async wasn't faster.
Given what the post was actually about, my response was more like "...duh".
danbruc|5 years ago
Waiting for I/O does usually not waste any CPU cycles, the thread is not spinning in a loop waiting for a response, the operating system will just not schedule the thread until the I/O request completed.
toxik|5 years ago
You are making dinner. You start to boil water for the potatoes. While that happens, you prepare the beef. Async.
You and your girlfriend are making dinner. You do the potatoes, she does the beef. Parallel.
You can perhaps see how you could have asynchronous and parallel execution at the same time.
In the context of a Web server, a request is handled by a single Python process (so don’t give me that “OS scheduler can do other things”). Async matters here because your request turnover can be higher, even if the requests/sec remains the same.
In the cooking example, each request gets a single cook. If that cook is able to do things asynchronously, he will finish a single meal faster.
If it were only parallel, you could have more cooks - because they would be less demanding - but they would each be slower.
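The analogy maps directly onto asyncio; a toy sketch with asyncio.sleep standing in for the boiling and the frying:

```python
import asyncio

async def boil_potatoes():
    # The water boils on its own; awaiting frees the cook (the event loop).
    await asyncio.sleep(0.3)
    return "potatoes"

async def prepare_beef():
    await asyncio.sleep(0.2)
    return "beef"

async def make_dinner():
    # One cook interleaves both dishes: async, but not parallel.
    # Running make_dinner() in several worker processes at once
    # would be the "more cooks" (parallel) version.
    return await asyncio.gather(boil_potatoes(), prepare_beef())

print(asyncio.run(make_dinner()))  # → ['potatoes', 'beef']
```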
maxmalysh|5 years ago
rumanator|5 years ago
It depends on what you mean by "faster". HTTP requests are IO bound, thus it is to be expected that the throughput of an IO bound service benefits from a technology that prevents your process from sitting idle while waiting for IO.
Thus it's surprising that Python's async code performs worse, not better, in both throughput and latency.
> When you're dealing with external REST APIs that take multiple seconds to respond, then the async version is substantially "faster"
The findings reported in the blog post you're commenting on are the exact opposite of your claim: Python's async performs worse than its sync counterpart.
ashtonkem|5 years ago
“Faster” is misleading because the speed improvements that you get with async are very dependent on load. At low levels there will typically be negligible or no speed gains, but at higher levels the benefit will be incredibly obvious.
The one caveat to this is cases where async allows you to run two requests in parallel, rather than sequentially. I would argue that this is less about async than it is about concurrency, and how async work can make some concurrent work loads more ergonomic to program.
zzzeek|5 years ago
> “Faster” is misleading
and
> "At low levels there is going to typically be negligible or no speed gains, but at higher levels the benefit will be incredibly obvious."
there are no "speed" gains period. the same amount of work will be accomplished in the same amount of time with threads or async. async makes it more memory efficient to have a huge number of clients waiting concurrently for results on slow services, but all of those clients walking off with their data will not be reached "faster" than with threads.
the reason that asyncio advocates say that asyncio is "faster" is based on the notion that the OS thread scheduler is slow, and that async context switches are some combination of less frequent and more efficient such that async is faster. This may be the case for other languages but for Python's async implementations it is not the case, and benchmarks continue to show this.
vertex-four|5 years ago
delusional|5 years ago
You are not waiting for that 1000ms, and you haven't been for 35 years, since the first OSes started featuring preemptive multitasking.
When you wait on a socket, the OS will remove you from the CPU and place someone who is not waiting. When data is ready, you are placed back. You aren't wasting the CPU cycles waiting, only the ones the OS needs to save your state.
Actually standing there and waiting on the socket is not a thing people have done for a long time.
pdpi|5 years ago
The point is that async IO allows your own process/thread to progress while waiting for IO. Preemptive multitasking just assigns the CPU to something else while waiting, which is good for the box as a whole, but not necessarily productive for that one process (unless it is multithreaded).
unknown|5 years ago
[deleted]
dilandau|5 years ago
...with the goal of making your application faster.
arghwhat|5 years ago
In doing that, you're removing natural parallelism, and end up competing with the kernel scheduler, both in performance and in scheduling decisions.
crimsonalucard1|5 years ago
Then it was people saying “Guys, stop buying surgical masks, The science says they don’t work it’s like putting a rag over your mouth.”
All of these so-called expert know-it-alls were wrong, and now we have another expert on asynchronous python telling us he knows better and he's not surprised. No dude, you're just another guy on the internet pretending he's a know-it-all.
If you are any good, you’ll realize that nodejs will beat the flask implementation any day of the week and the nodejs model is exactly identical to the python async model. Nodejs blew everything out of the water, and it showed that asynchronous single threaded code was better for exactly the test this benchmark is running.
It's not obvious at all. Why is the node framework faster than python async? Why can't python async beat python sync when node can do it easily? What is the specific flaw within python itself that is causing this? Don't answer that question, because you don't actually know, man. Just do what you always do and wait for a well-intentioned humble person to run a benchmark, then comment on it with your elitist know-it-all attitude claiming you're not surprised.
Is there a word for these types of people? They are all over the internet. If we invent a label maybe they’ll start becoming self aware and start acting more down to earth.
ben509|5 years ago
Node's JIT comes from a web browser's javascript implementation used by billions of people. It's also had async baked in from day one.
Python started single process, added threading, and then bolted async on top of that. And CPython is a pretty straight interpreter.
A comparison between Node and PyPy would be more informative, but PyPy has a far less mature JIT and still has to deal with Python's dynamism.
> If we invent a label maybe they’ll start becoming self aware and start acting more down to earth.
You can't lecture people into self-awareness, any more than experts can lecture everyone into wearing masks.
catalogia|5 years ago
zaptheimpaler|5 years ago