top | item 23498742


phodge | 5 years ago

How is this result surprising? The point of coroutines isn't to make your code execute faster, it's to prevent your process sitting idle while it waits for I/O.

When you're dealing with external REST APIs that take multiple seconds to respond, then the async version is substantially "faster" because your process can get some other useful work done while it's waiting. Obviously the async framework introduces some overhead, but that bit of overhead is probably a lot less than the 3 billion cpu cycles you'll waste waiting 1000ms for an external service.
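To make the overlap concrete, here's a minimal asyncio sketch (with asyncio.sleep standing in for the slow external service; the names are illustrative, not from any real API):

```python
import asyncio

async def fetch(name: str, delay: float) -> str:
    # Stand-in for a slow external REST call; awaiting asyncio.sleep
    # yields to the event loop exactly like awaiting a socket would.
    await asyncio.sleep(delay)
    return name

async def main() -> list:
    # Both simulated calls wait concurrently, so total wall time is
    # roughly one delay, not the sum of both.
    return await asyncio.gather(fetch("a", 0.1), fetch("b", 0.1))

print(asyncio.run(main()))  # ['a', 'b']
```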


calpaterson|5 years ago

I think it is surprising to a lot of people who do take it as read that async will be faster.

As I describe in the first line of my article, I don't think that people who think async is faster have unreasonable expectations. It seems very intuitive to assume that greater concurrency would mean greater performance - at least on some measure.

> When you're dealing with external REST APIs that take multiple seconds to respond, then the async version is substantially "faster" because your process can get some other useful work done while it's waiting.

I'm afraid I also don't think you have this right conceptually. An async implementation that does multiple ("embarrassingly parallel") tasks in the same process - whether that is DB IO waiting or microservice IO waiting - is not necessarily a performance improvement over a sync version that just starts more workers and has the OS kernel scheduler organise things. In fact in practice an async version is normally lower throughput, higher latency and more fragile. This is really what I'm getting at when I say async is not faster.

Fundamentally, you do not waste "3 billion cpu cycles" waiting 1000ms for an external service. Making alternative use of the otherwise idle CPU is the purpose (and IMO the proper domain of) operating systems.
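For illustration, a hedged sketch of the sync-with-more-workers shape: plain threads doing blocking waits, with the kernel overlapping them and no event loop in sight (time.sleep stands in for the blocking I/O):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_call(n: int) -> int:
    # A blocking wait: the OS deschedules this thread, and the CPU is
    # free for anything else until the "I/O" completes.
    time.sleep(0.1)
    return n * 2

start = time.monotonic()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(blocking_call, range(8)))
elapsed = time.monotonic() - start

# The eight 0.1s waits overlap under the kernel scheduler, so the
# batch takes roughly 0.1s rather than 0.8s.
print(results)
```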

john-radio|5 years ago

> Fundamentally, you do not waste "3 billion cpu cycles" waiting 1000ms for an external service. Making alternative use of the otherwise idle CPU is the purpose (and IMO the proper domain of) operating systems.

Sure, the operating system can find other things to do with the CPU cycles when a program is IO-locked, but that doesn't help the program that you're in the situation of currently trying to run.

> An async implementation that does multiple ("embarrassingly parallel") tasks in the same process - whether that is DB IO waiting or microservice IO waiting - is not necessarily a performance improvement over a sync version that just starts more workers and has the OS kernel scheduler organise things. In fact in practice an async version is normally lower throughput, higher latency and more fragile. This is really what I'm getting at when I say async is not faster.

You're right. "Arbitrary programs will run faster" is not the promise of Python async.

Python async does help a program work faster in the situation that phodge just described (waiting for web requests, or waiting for a slow hardware device), since the program can do other things while waiting on the blocked I/O (unlike a Python program that does not use async and could only proceed linearly through its instructions). That's the problem that Python asyncio purports to solve. It is still subject to the Global Interpreter Lock, meaning it's still bound to one thread. (Python's multiprocessing library is needed to overcome the GIL by breaking a program out into multiple processes, at the cost that cross-process communication now becomes expensive.)
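A hedged sketch of that split: CPU-bound work sidestepping the GIL via multiprocessing (function and values here are illustrative):

```python
from multiprocessing import Pool

def cpu_bound(n: int) -> int:
    # Pure computation holds the GIL for its whole duration, so async
    # or threads within one process would not speed this up; separate
    # processes each get their own interpreter and GIL.
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with Pool(processes=2) as pool:
        print(pool.map(cpu_bound, [100, 100]))  # [328350, 328350]
```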

brightball|5 years ago

This is often a point of confusion for people when looking at Erlang, Elixir or Go code. Concurrency beyond leveraging the available CPUs doesn't really add any advantage.

On the web, when the bulk of your application's time is spent waiting on APIs, database queries, external caches or disk I/O, being able to do that waiting with minimal RAM overhead creates a dramatic increase in the capacity of your server.

It's one of the big reasons I've always wanted to see TechEmpower create a test variant that continues to increase concurrency beyond 512 (to as high as maybe 10k). I think it would be interesting.

arghwhat|5 years ago

>... is not necessarily a performance improvement over a sync version that just starts more workers and has the OS kernel scheduler organise things.

This is very true, especially when actual work is involved.

Remember, the kernel uses the exact same mechanism to have a process wait on a synchronous read/write as it does for a process issuing epoll_wait. Furthermore, isolating tasks into their own processes (or, sigh, threads) allows the kernel scheduler to make much better decisions, such as applying scheduling fairness and QoS to keep the system responsive under load surges.

Now, async might be more efficient if you serve extreme numbers of concurrent requests from a single thread if your request processing is so simple that the scheduling cost becomes a significant portion of the processing time.

... but if your request processing happens in Python, that's not the case. Your own scheduler implementation (your event loop) will likely also end up eating some resources (remember, you're not bypassing anything, just duplicating functionality), and is very unlikely to be as smart or as fair as that of the kernel. It's probably also entirely unable to do parallel processing.

And this is all before we get into the details of how you easily end up fighting against the scheduler...

wongarsu|5 years ago

Async IO was in large part a response to "how can my webserver handle xx thousand connections per second" (or, in the case of Erlang, "how do you handle millions of phone calls at once"). Starting 15 threads to do IO works great, but once you wait for hundreds of things at once the overhead from context switching becomes a problem, and at some point the OS scheduler itself becomes a problem.

yamrzou|5 years ago

> is not necessarily a performance improvement over a sync version that just starts more workers and has the OS kernel scheduler organise things

Co-routines are not necessarily faster than threads, but they do yield a performance improvement if one has to spin up thousands of them: they have less creation overhead and consume less RAM.
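A small sketch of that creation-cost point, assuming nothing beyond the stdlib: ten thousand coroutine tasks are just small Python objects, not OS threads with their own stack reservations:

```python
import asyncio

async def tick() -> None:
    # Yield once to the event loop and finish.
    await asyncio.sleep(0)

async def main() -> int:
    # 10,000 tasks are cheap to create; 10,000 OS threads with
    # megabyte-scale stack reservations generally would not be.
    tasks = [asyncio.create_task(tick()) for _ in range(10_000)]
    await asyncio.gather(*tasks)
    return len(tasks)

print(asyncio.run(main()))  # 10000
```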

_pmf_|5 years ago

> I think it is surprising to a lot of people who do take it as read that async will be faster.

Literally the first thing any concurrency course starts with in the very first lesson is that scheduling and context overhead are not negligible. Is it so hard to expect our professionals to know basic principles of what they are dealing with?

dspillett|5 years ago

> think it is surprising to a lot of people who do take it as read that async will be faster.

This is because when they are first shown it, the examples are faster, effectively at least, because they get the given jobs done in less wall-clock time due to reduced blocking.

They learn that, but often don't get told (or work out themselves) that in many cases the difference is so small as to be unmeasurable, or that in other circumstances there can be negative effects (the framework overheads others have already mentioned, more part-processed work held in RAM which could lead to thrashing in a low-memory situation, greater concurrent load on other services such as a database and the IO system it depends upon, etc.).

As a slightly off-the-topic-of-async example: back when multi-core processing was first becoming cheap enough that it was not just affordable but the default option, I had great trouble trying to explain to a colleague why two IO-intensive database processes he was running concurrently were so much slower than when I'd shown him the same process (I'd run them sequentially). He was absolutely fixated on the idea that his four cores should make concurrency the faster option; I couldn't get through to him that in this case the thrashing heads on the drives of the time were the bottleneck, and the CPU would be practically idle no matter how many cores it had while the bottleneck was elsewhere.

Some people learn the simple message (async can handle some loads much more efficiently) as an absolute (async is more efficient) and don't consider at all that the situation may be far more nuanced.

nurettin|5 years ago

> An async implementation that does multiple ("embarrassingly parallel") tasks in the same process

You mean concurrent tasks in the same process?

lucideer|5 years ago

> I don't think that people who think async is faster have unreasonable expectations

I do.

And I don't think I'm alone nor being unreasonable.

kerkeslager|5 years ago

> The point of coroutines isn't to make your code execute faster, it's to prevent your process sitting idle while it waits for I/O.

This is a quintessential example of not seeing the forest for the trees.

The point of coroutines is absolutely to make my code execute faster. If a completely I/O-bound application sits idle while it waits for I/O, I don't care and I should not care because there's no business value in using those wasted cycles. The only case where coroutines are relevant is when the application isn't completely I/O bound; the only case where coroutines are relevant is when they make your code execute faster.

It's been well-known for a long time that the majority of processes in (for example) a webserver, are I/O bound, but there are enough exceptions to that rule that we need a solution to situations where the process is bound by something else, i.e. CPU. The classic solution to this problem is to send off CPU-bound processes to a worker over a message queue, but that involves significant overhead. So if we assume that there's no downside to making everything asynchronous, then it makes sense to do that--it's not faster for the I/O bound cases, but it's not slower either, and in the minority-but-not-rare CPU-bound case, it gets us a big performance boost.

What this test is doing is challenging the assumption that there's no downside to making everything asynchronous.

In context, I tend to agree with the conclusion that there are downsides. However, those downsides certainly don't apply to every project, and when they do, there may be a way around them. The only lesson we can draw from this is that gaining benefit from coroutines isn't guaranteed or trivial, but there is much more compelling evidence for that out there.

michaelcampbell|5 years ago

> The point of coroutines is absolutely to make my code execute faster.

I think rather the point is to make your APPLICATION either finish in less time, or to not take MORE time when given more load.

The code runs as fast as it runs, coroutines notwithstanding.

BiteCode_dev|5 years ago

This is not what this article is about.

The surprising conclusion of the article is that in a realistic scenario, the async web frameworks will output fewer requests/sec than the sync ones.

I'm very familiar with Python concurrency paradigms, and I wasn't expecting that at all.

Add to that zzzeek's article (he wrote SQLAlchemy) stating async is also slower for DB access, and this makes async less and less appealing, given the additional complexity it adds.

Now, apart from writing a crawler or needing to support websockets, I find it hard to justify asyncio. In fact, with David Beazley hinting that you can probably get away with spawning a thousand threads, it raises more doubts.

The whole point of async was that, at least when dealing with a lot of concurrent I/O, it would be a win compared to threads+multiprocessing. If just by cranking the number of sync workers you get better results for less complexity, this is bad.

DougBTX|5 years ago

As far as I can tell, the main cost of threads is the 2-4MB of memory usage for stack space, so async saves memory by allowing one thread to process more than one task. That's a big deal if you have a server with 1GB of memory and want to handle 100,000 simultaneous connections, like Erlang was designed for. But if the server has enough memory for as many threads as are needed to cover the number of simultaneous tasks, is there still a benefit?
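Worth noting, as a hedged aside, that the 2-4MB figure is a virtual-memory reservation and is tunable: CPython lets you request a smaller stack before starting threads (the minimum and granularity are platform-dependent):

```python
import threading

# Request a smaller per-thread stack before creating threads; the
# multi-MB default is mostly untouched virtual address space anyway.
threading.stack_size(256 * 1024)  # 256 KiB; platform minimum applies

done = []
t = threading.Thread(target=lambda: done.append(True))
t.start()
t.join()
print(done)  # [True]
```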

zzzeek|5 years ago

> When you're dealing with external REST APIs that take multiple seconds to respond, then the async version is substantially "faster" because your process can get some other useful work done while it's waiting. Obviously the async framework introduces some overhead, but that bit of overhead is probably a lot less than the 3 billion cpu cycles you'll waste waiting 1000ms for an external service.

but threads get you the same thing with much less overhead. this is what benchmarks like this one and my own continue to confirm.

People often are afraid of threads in Python because "the GIL!" But the GIL does not block on IO. I think programmers reflexively reaching for Tornado or whatever don't really understand the details of how this all works.

danbruc|5 years ago

> but threads get you the same thing with much less overhead.

That is not true, at least not in general. The whole point of using continuations for async I/O is to avoid the overhead of using threads: the scheduler overhead, the cost of saving and restoring the processor state when switching tasks, the per-thread stack space, and so on.

aviba|5 years ago

The GIL might not block on I/O, but the implementation that uses PyObject does need the GIL, no?

mavdi|5 years ago

I get enraged when articles like this get upvotes. The evidence given doesn't at all negate the reasoning behind using async, which as you said is about not having to be blocked by IO, not a freaking throughput test for an unrealistic scenario. It just goes to show the complete lack of understanding of the topic. I wouldn't dare write something up if I didn't 100% grasp it, but the bar is way lower for some others, it seems.

didibus|5 years ago

I don't know the async Python specifics, but from what I understand, you don't necessarily need async to handle large number of IO requests, you can simply use non-blocking IO and check back on it synchronously either in some loop or at specific places in your program.

The use of async either as callbacks, or user threads, or coroutines, is a convenience layer for structuring your code. As I understand, that layer does add some overhead, because it captures an environment, and has to later restore it.
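A minimal sketch of that synchronous non-blocking style using the stdlib selectors module (connection handling elided; purely illustrative):

```python
import selectors
import socket

sel = selectors.DefaultSelector()

server = socket.socket()
server.bind(("127.0.0.1", 0))  # ephemeral port
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ)

# One pass of the kind of loop described above: ask the OS which
# registered sockets are ready, returning immediately if none are.
events = sel.select(timeout=0)
for key, _mask in events:
    conn, _addr = key.fileobj.accept()
    conn.close()

print(len(events))  # 0 here, since no client has connected
sel.close()
server.close()
```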

hinkley|5 years ago

I'm starting to wonder what the origin story is for titles like this. Have CS programs dropped the ball? Did the author snooze through these fundamentals? Or are they a reaction to coworkers who have demonstrated such an educational gap?

Async and parallel always use more CPU cycles than sequential. There is no question. The real questions are: do you have cycles to burn, will doing so bring the wall-clock time down, and is it worth the complexity of doing so?

Izkata|5 years ago

I think it's because "async" has been overloaded. The post isn't about what I thought it would be upon seeing the title.

I was thinking this would be about using multiprocessing to fire off two or more background tasks, then handle the results together once they all completed. If the background tasks had a large enough duration, then yeah, doing them in parallel would overcome the overhead of creating the processes and the overall time would be reduced (it would be "faster"). I thought this post would be a "measure everything!" one, after they realized for their workload they didn't overcome that overhead and async wasn't faster.

Given what the post actually was about, my response was more like "...duh".

danbruc|5 years ago

> Obviously the async framework introduces some overhead, but that bit of overhead is probably a lot less than the 3 billion cpu cycles you'll waste waiting 1000ms for an external service.

Waiting for I/O does not usually waste any CPU cycles; the thread is not spinning in a loop waiting for a response, the operating system will simply not schedule the thread until the I/O request has completed.

toxik|5 years ago

Sigh. Async is somewhat orthogonal to parallel.

You are making dinner. You start to boil water for the potatoes. While that happens, you prepare the beef. Async.

You and your girlfriend are making dinner. You do the potatoes, she does the beef. Parallel.

You can perhaps see how you could have asynchronous and parallel execution at the same time.

In the context of a Web server, a request is handled by a single Python process (so don’t give me that “OS scheduler can do other things”). Async matters here because your request turnover can be higher, even if the requests/sec remains the same.

In the cooking example, each request gets a single cook. If that cook is able to do things asynchronously, he will finish a single meal faster.

If it were only parallel, you could have more cooks - because they would be less demanding - but they would each be slower.

rumanator|5 years ago

> How is this result surprising? The point of coroutines isn't to make your code execute faster, it's to prevent your process sitting idle while it waits for I/O.

It depends on what you mean by "faster". HTTP requests are IO-bound, thus it is to be expected that the throughput of an IO-bound service benefits from a technology that prevents your process from sitting idle while waiting for IO.

Thus it's surprising that Python's async code performs worse, not better, in both throughput and latency.

> When you're dealing with external REST APIs that take multiple seconds to respond, then the async version is substantially "faster"

The findings reported in the blog post you're commenting on are the exact opposite of your claim: Python's async performs worse than its sync counterpart.

ashtonkem|5 years ago

We need to stop saying "faster" with regards to async. The point of async was always about fitting more requests per compute resource and/or making systems more latency-consistent under load.

"Faster" is misleading because the speed improvements that you get with async are very dependent on load. At low load levels there will typically be negligible or no speed gains, but at higher levels the benefit will be incredibly obvious.

The one caveat to this is cases where async allows you to run two requests in parallel, rather than sequentially. I would argue that this is less about async than it is about concurrency, and how async work can make some concurrent work loads more ergonomic to program.

zzzeek|5 years ago

you just contradicted yourself:

> “Faster” is misleading

and

> "At low levels there is going to typically be negligible or no speed gains, but at higher levels the benefit will be incredibly obvious."

there are no "speed" gains period. the same amount of work will be accomplished in the same amount of time with threads or async. async makes it more memory efficient to have a huge number of clients waiting concurrently for results on slow services, but all of those clients walking off with their data will not be reached "faster" than with threads.

the reason that asyncio advocates say that asyncio is "faster" is based on the notion that the OS thread scheduler is slow, and that async context switches are some combination of less frequent and more efficient such that async is faster. This may be the case for other languages but for Python's async implementations it is not the case, and benchmarks continue to show this.

vertex-four|5 years ago

The other thing about async is that, in some scenarios, it can make shared resource use clearer - i.e. in a program I've written, the design is such that one type on one thread (a producer) owns the data and passes it to consumers directly, rather than trying to deal with lock-free algorithms and mutexes for sharing the data and suchlike. A multi-threaded ring buffer is much less clearly correct than a single-threaded one.

delusional|5 years ago

> but that bit of overhead is probably a lot less than the 3 billion cpu cycles you'll waste waiting 1000ms for an external service.

You are not waiting for that 1000ms, and you haven't been for the 35 years since the first OSes started featuring preemptive multitasking.

When you wait on a socket, the OS will remove you from the CPU and schedule someone who is not waiting. When data is ready, you are placed back. You aren't wasting CPU cycles waiting, only the ones the OS needs to save your state.

Actually standing there and waiting on the socket is not a thing people have done for a long time.

pdpi|5 years ago

> You are not waiting for that 1000ms, and you haven't been for 35 years since the first os's starting feature preemptive multitasking.

The point is that async IO allows your own process/thread to progress while waiting for IO. Preemptive multitasking just assigns the CPU to something else while waiting, which is good for the box as a whole, but not necessarily productive for that one process (unless it is multithreaded).

dilandau|5 years ago

>it's to prevent your process sitting idle while it waits for I/O.

...with the goal of making your application faster.

arghwhat|5 years ago

... no. With the goal of allowing concurrency without parallelism.

In doing that, you're removing natural parallelism, and end up competing with the kernel scheduler, both in performance and in scheduling decisions.

crimsonalucard1|5 years ago

I see this elitist attitude all over the internet. First it was people saying “Guys why are you over reacting to corona the flu is worse.”

Then it was people saying “Guys, stop buying surgical masks, The science says they don’t work it’s like putting a rag over your mouth.”

All of these so-called expert know-it-alls were wrong, and now we have another expert on asynchronous python telling us he knows better and he's not surprised. No dude, you're just another guy on the internet pretending he's a know-it-all.

If you are any good, you’ll realize that nodejs will beat the flask implementation any day of the week and the nodejs model is exactly identical to the python async model. Nodejs blew everything out of the water, and it showed that asynchronous single threaded code was better for exactly the test this benchmark is running.

It's not obvious at all. Why is the node framework faster than python async? Why can't python async beat python sync when node can do it easily? What is the specific flaw within python itself that is causing this? Don't answer that question, because you don't actually know, man. Just do what you always do and wait for a well-intentioned, humble person to run a benchmark, then comment on it with your elitist know-it-all attitude claiming you're not surprised.

Is there a word for these types of people? They are all over the internet. If we invent a label maybe they’ll start becoming self aware and start acting more down to earth.

ben509|5 years ago

> Nodejs blew everything out of the water

Node's JIT comes from a web browser's javascript implementation used by billions of people. It's also had async baked in from day one.

Python started single process, added threading, and then bolted async on top of that. And CPython is a pretty straight interpreter.

A comparison between Node and PyPy would be more informative, but PyPy has a far less mature JIT and still has to deal with Python's dynamism.

> If we invent a label maybe they’ll start becoming self aware and start acting more down to earth.

You can't lecture people into self-awareness, any more than experts can lecture everyone into wearing masks.

catalogia|5 years ago

You're making the classic mistake of assuming a common thread connects the people who've annoyed you in various unrelated contexts.

zaptheimpaler|5 years ago

I mean no one even mentioned node. Maybe it is faster idk. But we're talking about python?