top | item 23496994

Async Python is not faster

472 points| haybanusa | 5 years ago |calpaterson.com

353 comments

[+] phodge|5 years ago|reply
How is this result surprising? The point of coroutines isn't to make your code execute faster, it's to prevent your process sitting idle while it waits for I/O.

When you're dealing with external REST APIs that take multiple seconds to respond, then the async version is substantially "faster" because your process can get some other useful work done while it's waiting. Obviously the async framework introduces some overhead, but that bit of overhead is probably a lot less than the 3 billion cpu cycles you'll waste waiting 1000ms for an external service.

[+] calpaterson|5 years ago|reply
I think it is surprising to a lot of people who do take it as read that async will be faster.

As I describe in the first line of my article, I don't think that people who think async is faster have unreasonable expectations. It seems very intuitive to assume that greater concurrency would mean greater performance - at least on some measure.

> When you're dealing with external REST APIs that take multiple seconds to respond, then the async version is substantially "faster" because your process can get some other useful work done while it's waiting.

I'm afraid I also don't think you have this right conceptually. An async implementation that does multiple ("embarrassingly parallel") tasks in the same process - whether that is DB IO waiting or microservice IO waiting - is not necessarily a performance improvement over a sync version that just starts more workers and has the OS kernel scheduler organise things. In fact in practice an async version is normally lower throughput, higher latency and more fragile. This is really what I'm getting at when I say async is not faster.

Fundamentally, you do not waste "3 billion cpu cycles" waiting 1000ms for an external service. Making alternative use of the otherwise idle CPU is the purpose (and IMO the proper domain of) operating systems.

[+] kerkeslager|5 years ago|reply
> The point of coroutines isn't to make your code execute faster, it's to prevent your process sitting idle while it waits for I/O.

This is a quintessential example of not seeing the forest for the trees.

The point of coroutines is absolutely to make my code execute faster. If a completely I/O-bound application sits idle while it waits for I/O, I don't care and I should not care because there's no business value in using those wasted cycles. The only case where coroutines are relevant is when the application isn't completely I/O bound; the only case where coroutines are relevant is when they make your code execute faster.

It's been well-known for a long time that the majority of processes in (for example) a webserver are I/O bound, but there are enough exceptions to that rule that we need a solution to situations where the process is bound by something else, i.e. CPU. The classic solution to this problem is to send off CPU-bound processes to a worker over a message queue, but that involves significant overhead. So if we assume that there's no downside to making everything asynchronous, then it makes sense to do that--it's not faster for the I/O bound cases, but it's not slower either, and in the minority-but-not-rare CPU-bound case, it gets us a big performance boost.

What this test is doing is challenging the assumption that there's no downside to making everything asynchronous.

In context, I tend to agree with the conclusion that there are downsides. However, those downsides certainly don't apply to every project, and when they do, there may be a way around them. The only lesson we can draw from this is that gaining benefit from coroutines isn't guaranteed or trivial, but there is much more compelling evidence for that out there.

[+] BiteCode_dev|5 years ago|reply
This is not what this article is about.

The surprising conclusion of the article is that in a realistic scenario, the async web frameworks output fewer requests/sec than the sync ones.

I'm very familiar with Python concurrency paradigms, and I wasn't expecting that at all.

Add to that zzzeek's article (the guy who wrote SQLAlchemy) stating async is also slower for db access, and this makes async less and less appealing, given the additional complexity it adds.

Now, apart from writing a crawler or needing to support websockets, I find it hard to justify asyncio. In fact, with David Beazley hinting that you can probably get away with spawning 1,000 threads, it raises more doubts.

The whole point of async was that, at least when dealing with a lot of concurrent I/O, it would be a win compared to threads+multiprocessing. If just by cranking the number of sync workers you get better results for less complexity, this is bad.
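The "just crank the sync workers" idea is easy to sketch. Here is a minimal, hypothetical illustration (time.sleep stands in for a blocking socket or database call; the counts and durations are made up for demonstration):

```python
# Sketch of the "just use threads" approach: a ThreadPoolExecutor with many
# workers handles blocking waits fine, because the OS parks each thread
# until its "I/O" (simulated here with time.sleep) completes.
import time
from concurrent.futures import ThreadPoolExecutor

def handle(i):
    time.sleep(0.1)  # stands in for a blocking network/db round-trip
    return i

start = time.monotonic()
with ThreadPoolExecutor(max_workers=100) as pool:
    results = list(pool.map(handle, range(100)))
elapsed = time.monotonic() - start

# 100 concurrent 100ms waits finish in roughly 0.1s of wall time, not 10s.
print(len(results), f"{elapsed:.2f}s")
```

No event loop, no function colouring, and the kernel scheduler does the multiplexing.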

[+] zzzeek|5 years ago|reply
> When you're dealing with external REST APIs that take multiple seconds to respond, then the async version is substantially "faster" because your process can get some other useful work done while it's waiting. Obviously the async framework introduces some overhead, but that bit of overhead is probably a lot less than the 3 billion cpu cycles you'll waste waiting 1000ms for an external service.

But threads get you the same thing with much less overhead. This is what benchmarks like this one and my own continue to confirm.

People often are afraid of threads in Python because "the GIL!" But the GIL does not block on IO. I think programmers reflexively reaching for Tornado or whatever don't really understand the details of how this all works.
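The "GIL does not block on IO" point can be demonstrated directly. In this sketch, ten plain threads perform a blocking wait (time.sleep as a stand-in for a socket read) and overlap almost perfectly, because each releases the GIL for the duration of the wait:

```python
# A small demonstration that the GIL is released during blocking waits:
# ten threads waiting 200ms each take ~200ms total, not ~2s.
import threading
import time

def blocking_call():
    time.sleep(0.2)  # the GIL (and the CPU) are released while waiting

start = time.monotonic()
threads = [threading.Thread(target=blocking_call) for _ in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start

print(f"{elapsed:.2f}s")  # roughly 0.2s
```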

[+] mavdi|5 years ago|reply
I get enraged when articles like this get upvotes. The evidence given doesn't at all negate the reasoning behind using async, which as you said, is about not having to be blocked by IO, not freaking throughput test for an unrealistic scenario. Just goes to show the complete lack of understanding of the topic. I wouldn't dare write something up if I didn't 100% grasp it, but the bar is way lower for some others it seems.
[+] hinkley|5 years ago|reply
I'm starting to wonder what the origin story is for titles like this. Have CS programs dropped the ball? Did the author snooze through these fundamentals? Or are they a reaction to coworkers who have demonstrated such an educational gap?

Async and parallel always use more CPU cycles than sequential. There is no question. The real questions are: do you have cycles to burn, will doing so bring the wall-clock time down, and is it worth the complexity of doing so?

[+] danbruc|5 years ago|reply
> Obviously the async framework introduces some overhead, but that bit of overhead is probably a lot less than the 3 billion cpu cycles you'll waste waiting 1000ms for an external service.

Waiting for I/O does not usually waste any CPU cycles. The thread is not spinning in a loop waiting for a response; the operating system will simply not schedule the thread until the I/O request has completed.

[+] rumanator|5 years ago|reply
> How is this result surprising? The point of coroutines isn't to make your code execute faster, it's to prevent your process sitting idle while it waits for I/O.

It depends on what you mean by "faster". HTTP requests are I/O bound, thus it is to be expected that the throughput of an I/O-bound service benefits from a technology that prevents your process from sitting idle while waiting for I/O.

Thus it's surprising that Python's async code performs worse, not better, in both throughput and latency.

> When you're dealing with external REST APIs that take multiple seconds to respond, then the async version is substantially "faster"

The findings reported in the blog post you're commenting on are the exact opposite of your claim: Python's async performs worse than its sync counterpart.

[+] ashtonkem|5 years ago|reply
We need to stop saying “faster” with regards to async. The point of async was always either fitting more requests per compute resource, and/or making systems more latency consistent under load.

“Faster” is misleading because the speed improvement you get with async is very dependent on load. At low load there will typically be negligible or no speed gains, but at higher load the benefit will be incredibly obvious.

The one caveat to this is cases where async allows you to run two requests in parallel, rather than sequentially. I would argue that this is less about async than it is about concurrency, and how async work can make some concurrent work loads more ergonomic to program.

[+] delusional|5 years ago|reply
> but that bit of overhead is probably a lot less than the 3 billion cpu cycles you'll waste waiting 1000ms for an external service.

You are not waiting for that 1000ms, and you haven't been for 35 years, since the first OSes started featuring preemptive multitasking.

When you wait on a socket, the OS will remove you from the CPU and schedule someone who is not waiting. When data is ready, you are placed back. You aren't wasting the CPU cycles spent waiting, only the ones the OS needs to save your state.

Actually standing there and waiting on the socket is not a thing people have done for a long time.
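This can be observed directly by comparing wall-clock time with CPU time across a blocking call. A small sketch (time.sleep stands in for a socket read with no data available yet):

```python
# While a process is blocked, the OS doesn't schedule it: wall time passes
# but essentially no CPU time is consumed.
import time

start_wall = time.monotonic()
start_cpu = time.process_time()  # counts only CPU time used by this process

time.sleep(0.5)  # blocked in the kernel, like a socket read with no data

wall = time.monotonic() - start_wall
cpu = time.process_time() - start_cpu

# Roughly half a second of wall time, but near-zero CPU time.
print(f"wall={wall:.2f}s cpu={cpu:.4f}s")
```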

[+] orf|5 years ago|reply
His async code creates a pool with only 10 max connections[1] (the default). Whereas his sync pool[2], with a flask app that has 16 workers, has significantly more database connections.

I expect upping this number would have a positive effect on the asyncio numbers, because the only thing[3] this[4] is[5] measuring[6] is how many database connections you have, and it is about as far from a realistic workload as you can get.

Change your app to make 3 parallel requests to httpbin, collect the responses and insert them into the database. That's an actually realistic asyncio workload rather than a single DB query on a very contested pool. I'd be very interested to see how sync frameworks fare with that.

1. https://github.com/calpaterson/python-web-perf/blob/master/a...

2. https://github.com/calpaterson/python-web-perf/blob/master/s...

3. https://github.com/calpaterson/python-web-perf/blob/master/a...

4. https://github.com/calpaterson/python-web-perf/blob/master/a...

5. https://github.com/calpaterson/python-web-perf/blob/master/a...

6. https://github.com/calpaterson/python-web-perf/blob/master/a...

[+] calpaterson|5 years ago|reply
Hi - as mentioned in the article all connections went through pgbouncer (limited to 20) and I was careful to ensure that all configurations saturated the CPU so I'm pretty confident they were not waiting on connections to open. Opening a connection from pgbouncer over a unix socket is very fast indeed - my guess is perhaps a couple of orders of magnitude faster than without it. 20 connections divided by 4 CPUs is a lot, and pretty much all CPU time was still spent in Python.

Sidenote here: one thing I found but didn't mention (and the reason I put in the pooling, both in Python and pgbouncer) is that otherwise, under load, the async implementations would flood postgres with open connections and everything would just break down.

I think making a database query and responding with JSON is a very realistic workload. I've coded that up many times. Changing it to make requests to other things (mimicking a microservice architecture) is also interesting and if you did that I'd be interested to read your write up.

[+] bildung|5 years ago|reply
> His async code creates a pool with only 10 max connections[1] (the default). Whereas his sync pool[2], with a flask app that has 16 workers, has significantly more database connections.

And the reasoning is explained in the article:

"The rule I used for deciding on what the optimal number of worker processes was is simple: for each framework I started at a single worker and increased the worker count successively until performance got worse."

[+] anentropic|5 years ago|reply
> Change your app to make 3 parallel requests to httpbin, collect the responses and insert them into the database. That's an actually realistic asyncio workload

I don't see how that is a more "realistic" asyncio workload.

It might be a workload that async is better suited for, but the point of the article is to compare async web frameworks, which will often be used just to fetch and return some data from the db.

If you had an endpoint which needed to fetch 3 items from httpbin and insert them in the db it may make sense to use asyncio tools for that, even within the context of a web app running under a sync framework+server like Falcon+Gunicorn.

In my experience Python web apps (Django!) often spend surprisingly little time waiting on the db to return results, and relatively a large amount of time cpu-bound instantiating ORM model instances from the db data, then transforming those instances back into primitive types that can be serialized to JSON in an HTTP response. In that context I am not surprised if sync server with more processes is performing better. In this test it's not even that bad... the 'ORM' seems to be returning just a tuple which is transformed to a dict and then serialized.
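The mix-and-match idea mentioned above can be sketched in a few lines: a view function in a sync framework can still use asyncio internally for the one endpoint that genuinely has several independent fetches. This is a hedged sketch; `fetch_item` is a hypothetical stand-in for an HTTP call to httpbin:

```python
# A sync framework's view function that runs a private event loop just for
# this request's three independent fetches.
import asyncio

async def fetch_item(i):
    await asyncio.sleep(0.1)  # stands in for the network round-trip
    return {"item": i}

def sync_view():
    async def gather_all():
        # The three fetches overlap; total wait is ~one round-trip.
        return await asyncio.gather(*(fetch_item(i) for i in range(3)))
    return asyncio.run(gather_all())

print(sync_view())  # [{'item': 0}, {'item': 1}, {'item': 2}]
```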

[+] the_mitsuhiko|5 years ago|reply
He only has 4 CPUs. I doubt raising the worker count is going to help the async situation. From my experience it's really hard to make async outperform sync when databases are involved because the async layer adds so much overhead. Only when you are completely IO bound with lots of connections does async outperform sync in Python.
[+] kissgyorgy|5 years ago|reply
Before you criticize the article, you should read it. He wrote a whole section about the specific worker numbers and why and how he chose them.
[+] nik_s|5 years ago|reply
On top of that, the author uses aiopg rather than asyncpg[1] for the async database operations, even though asyncpg is (allegedly) a whole lot faster.

1. https://github.com/MagicStack/asyncpg

[+] jordic|5 years ago|reply
Absolutely agree; add to this the quality of the driver, aiopg vs asyncpg...
[+] zzzeek|5 years ago|reply
I am SUPER happy someone else is finally looking at this. It is long past time that the reflexive use of asyncio or systems like gevent/eventlet for no other reason than "hand-wavy SPEED" come to an end. There are web applications that literally serve just one user at a time yet are built in Tornado for "speed". (My example for this is the otherwise excellent SnakeViz: https://jiffyclub.github.io/snakeviz/ which IMO should have just used wsgiref).

As the blog post apparently cites as well (woo!), I've written about the myth of "async == speed" some years ago here and my conclusions were identical.

https://techspot.zzzeek.org/2015/02/15/asynchronous-python-a...

[+] ahupp|5 years ago|reply
This is true as far as it goes, but is not testing the (very common) areas where async shines.

Imagine you're loading a profile page on some social networking site. You fetch the user's basic info, then the information for N photos, then from each photo the top 2 comments, and for each comment the profile pic of the commenter. You can't just fetch all this in one shot because there are data dependencies. So you start fetching with blocking IO, but that makes your wait time for this request proportional to the number of fetches, which might be large.

So instead, you ideally want your wait to be proportional to the depth of your dependency tree. But composing all these fetches that way is hard without the right abstraction. You can cobble it together with callbacks but it gets hairy fast.

So (outside of extreme scenarios) it's not really about whether async is abstractly faster than sync. It's about how real developers would solve the same problem with/without async.

(Source: I worked on product infrastructure in this area for many years at FB)

[+] reggieband|5 years ago|reply
I felt baffled by this thread until I read this response. async/await for me has always been about managing this kind of dependency nightmare. I guess if all you have to do is spawn 100 jobs that run individually and report back to some kind of task manager then the performance gains of threads probably beats async/coroutine based approaches on a pure speed benchmark. But when I have significant chains of dependent work then the very idea of using bare threads and callbacks to manage that is annoying.

At least in Typescript nowadays, the ability to just mark a function `async` and throw an `await` in front of its invocation drastically lowers the barrier to moving something from blocking to non-blocking. In the same cases if I had to recommend the same change with thread pools and callbacks (and the manual book-keeping around all that) most developers just wouldn't bother.

[+] alexhutcheson|5 years ago|reply
A lot of the debate and discussion here seems to come from the fact that the example program demonstrates concurrency across requests (each concurrent request is being handled by a different worker), but no concurrency within each request: The code to serve each request is essentially one straight line of execution, which pauses while it waits for a DB query to return.

A more interesting example would be a request that requires multiple blocking operations (database queries, syscalls, etc.). You could do something like:

    # Non-concurrent approach
    def handle_request(request):
      a = get_row_1()
      b = get_row_2()
      c = get_row_3()
      return render_json(a, b, c)
   

    # asyncio approach
    async def handle_request(request):
      a, b, c = await asyncio.gather(
        get_row_1(),
        get_row_2(),
        get_row_3())
      return render_json(a, b, c)

    # Naive threading approach
    def handle_request(request):
      a_q = queue.SimpleQueue()
      t1 = threading.Thread(target=get_row_1, args=(a_q,))
      t1.start()
      b_q = queue.SimpleQueue()
      t2 = threading.Thread(target=get_row_2, args=(b_q,))
      t2.start()
      c_q = queue.SimpleQueue()
      t3 = threading.Thread(target=get_row_3, args=(c_q,))
      t3.start()

      t1.join()
      t2.join()
      t3.join()

      return render_json(a_q.get(), b_q.get(), c_q.get())


    # concurrent.futures with a ThreadPoolExecutor
    def handle_request(request, thread_pool):
      a = thread_pool.submit(get_row_1)
      b = thread_pool.submit(get_row_2)
      c = thread_pool.submit(get_row_3)
      return render_json(a.result(), b.result(), c.result())
These examples demonstrate what people find appealing about asyncio, and would also tell you more about how choice of concurrency strategy affects response time for each request.
[+] knite|5 years ago|reply
This is a great point, surprised you received no follow-up comments!
[+] berbc|5 years ago|reply
Is speed really a good reason for using async? If I remember correctly, asynchronous I/O was introduced to deal with many concurrent clients.

Therefore, I would have liked to see how much memory all those workers use, and how many concurrent connections they can handle.

[+] jillesvangurp|5 years ago|reply
I think speed is the wrong word here. A better word is throughput.

The underlying issue with python is that it does not support threading well (due to the global interpreter lock) and mostly handles concurrency by forking processes instead. The traditional way of improving throughput is having more processes, which is expensive (e.g. you need more memory). This is a common pattern with other languages like ruby, php, etc.

Other languages use green threads / co-routines to implement async behavior and enable a single thread to handle multiple connections. On paper this should work in python as well except it has a few bottlenecks that the article outlines that result in throughput being somewhat worse than multi process & synchronous versions.

[+] jordic|5 years ago|reply
In our use case, switching to asyncio is like moving from 12 cores to 3... (And I'm pretty sure we are handling more concurrency: from 24-30 req/s to 150 req/s.) But our workload is mostly network related (db, external services...)
[+] blondin|5 years ago|reply
same.

maybe the author is concerned that many people are jumping the gun on async-await before we all fully understand why we need it at all. and that's true. but that paradigm was introduced (borrowed) to solve a completely different issue.

i would love to see how many concurrent connections those sync processes handle.

[+] rlpb|5 years ago|reply
I find it interesting that all the talk here is about performance, and nobody has mentioned any benefits of Async Python when performance isn't an issue.

I use trio/asyncio to more easily write correct complex concurrent code when performance doesn't matter. See "The Problem with Threads"[1].

For this use case, Async Python probably still isn't faster, but that doesn't matter. Let's not throw out the baby with the bathwater :)

[1] https://www2.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-...

[+] PaulHoule|5 years ago|reply
I love asyncio for writing mixed-initiative "servers". For instance, I have an asyncio "server" that accepts websocket connections on one side, waits on an AMQP queue, proxies requests, and mediates for the HEOS smart speaker API, Philips Hue, U.S. Weather Service, etc.

This is great for React or Vue front-end applications, which get their state updated when things happen in the outside world (e.g. somebody else starts the music player and that gets relayed).

When CPU performance is an issue (say, generating a weather video from frames) you want to offload that into another process or thread, but it is an easy programming style if correctness matters.

[+] rukittenme|5 years ago|reply
What's the point of writing concurrent code if it's not faster?
[+] ChrisMarshallNY|5 years ago|reply
I use async for UI work, but don't have much of an opinion for servers.

I suspect that the best async is that supported by the server OS, and the more efficiently a language/compiler/linker integrates with that, the better. JIT/interpreted languages introduce new dimensions that I have not experienced.

I do have some prior art in optimizing libraries, though. In particular, image processing libraries in C++. My opinion is that optimization is sort of a "black art," and async is anything but a "silver bullet." In my experience, "common sense" is often trumped by facts on the ground, and profilers are more important than careful design.

I have found that it's actually possible to have worse performance with threads, if you write in a blocking fashion, as you have the same timeline as sync, but with thread management overhead.

There are also hardware issues that come into play, like L1/2/3 caches, resource contention, look-ahead/execution pipelines and VM paging. These can have massive impact on performance, and are often only exposed by running the app in-context with a profiler. Sometimes, threading can exacerbate these issues, and wipe out any efficiency gains.

In my experience, well-behaved threaded software needs to be written, profiled and tuned, in that order. An experienced engineer can usually take care of the "low-hanging fruit," in design, but I have found that profiling tends to consistently yield surprises.

T.A.N.S.T.A.A.F.L.

[+] woofie11|5 years ago|reply
Cooperative multitasking came out slower than preemptive in the nineties, so this is unsurprising in the generic case.

I think my question is whether async Python is slower in the case it was designed for -- many, long-running open sockets.

Async was traditionally used server-side for things like chat servers, where I might have millions of sockets simultaneously open.

[+] ris|5 years ago|reply
> Cooperative multitasking came out slower than preemptive in the nineties

This wasn't really the reason for the shift away from cooperative multitasking, it was really because cooperative multitasking isn't as robust or well behaved unless you have a lot of control over what tasks you have trying to run together.

In theory cooperative multitasking should have better throughput (latency is another story) because each task can yield at a point where its state is much simpler to snapshot rather than having to do things like record exact register values and handle various situations.

[+] mpweiher|5 years ago|reply
Yes, the whole hoopla about async and particularly async/await has been a bit puzzling, to say the least.

Except for a few very special cases, it is perfectly fine to block on I/O. Operating systems have been heavily optimized to make synchronous I/O fast, and can also spare the threads to do this.

Certainly in client applications, where the amount of separate I/O that can be usefully accomplished is limited, far below any limits imposed by kernel threads.

Where it might make sense is servers with an insane number of connections, each with fairly low load, i.e. mostly idle, and even in server tasks quality of implementation appears to far outweigh whether the server is synchronous or asynchronous (see attempts to build web servers with Apple's GCD).

For lots of connections actually under load, you are going to run out of actual CPU and I/O capacity to serve those threads long before you run out of threads.

Which leaves the case of JavaScript being single threaded, which admittedly is a large special case, but no reason for other systems that are not so constrained to follow suit.

[+] compressedgas|5 years ago|reply
> Function colouring is a big problem in Python

Not when you know how to call sync functions from async functions and vice versa.

A sync function can call an async function via:

  loop = asyncio.new_event_loop()
  result = loop.run_until_complete(asyncio.ensure_future(red(x), loop=loop))
An async function can call a sync function via:

  loop = asyncio.get_event_loop()
  result = await loop.run_in_executor(None, blue, x)
Where red and blue are defined as:

  async def red(x):
      pass

  def blue(x):
      pass
Note that the documentation is wrong about recommending create_task over ensure_future. That recommendation results in more restrictive code as create_task only accepts a coroutine and not a task.

This works for regular functions; I don't know how it works for generators.

[+] lovasoa|5 years ago|reply
Async python is faster when you use it for running parallel tasks. In this benchmark, you are running a single database request per query, so there is no advantage to being asynchronous: a pool of processes will scale just as well (but it will use more memory). The point of async is that it lets you easily make a Postgres query, AND an HTTP query, AND a redis query in parallel.
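That "AND... AND..." pattern is a one-liner with asyncio.gather. A sketch with three coroutines simulating the Postgres, HTTP, and redis latencies (the sleeps are made-up stand-ins for the real clients):

```python
# Three independent backend calls overlap under asyncio.gather, so the
# request takes about as long as the slowest call, not the sum of all three.
import asyncio
import time

async def pg_query():
    await asyncio.sleep(0.2)  # simulated Postgres round-trip
    return "pg"

async def http_call():
    await asyncio.sleep(0.2)  # simulated HTTP round-trip
    return "http"

async def redis_get():
    await asyncio.sleep(0.2)  # simulated redis round-trip
    return "redis"

async def handler():
    return await asyncio.gather(pg_query(), http_call(), redis_get())

start = time.monotonic()
results = asyncio.run(handler())
elapsed = time.monotonic() - start
print(results, f"{elapsed:.2f}s")  # ~0.2s total, not ~0.6s
```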
[+] emptysea|5 years ago|reply
Couldn’t threads handle that use case?
[+] hombre_fatal|5 years ago|reply
One big difference between one thread per request vs single-threaded async code is that synchronization and accessing shared resources is trivial when all of your code is running on a single thread.

An entire category of data races like `x += 1` become impossible without you even thinking about it. And that's often worth it for something like a game server where everything is beating on the same data structures.
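To make the hazard concrete, here is a sketch of the `x += 1` case under preemptive threads: the read-modify-write must be guarded by a lock to be deterministic, whereas single-threaded async code gets the same guarantee between awaits for free:

```python
# Four threads each bump a shared counter 10,000 times. The lock makes the
# read-modify-write atomic; drop it and increments can be silently lost.
import threading

counter = 0
lock = threading.Lock()

def bump(n):
    global counter
    for _ in range(n):
        with lock:
            counter += 1

threads = [threading.Thread(target=bump, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000, deterministic only because of the lock
```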

I don't use Python, so I guess it's less of an issue in Python since you're spawning multiple processes rather than multiple threads so you're already having to share data via something out of process like Redis and using its own synchronization guarantees.

But for example the naive Go code I tend to read in the wild always has data races here and there since people tend to never go 100% into a channel / mutex abstraction (and mutexes are hard). And that's not a snipe at Go but just a reminder of how easy it is to take things for granted when you've been writing single-threaded async code for a while.

[+] wwright|5 years ago|reply
FWIW, Rust gives you the same simplicity (no data races at runtime) with threads as well.

(Not necessarily on topic, but if you’re really excited about dodging data races, I figured it would give you something fun to look at!)

[+] kkirsche|5 years ago|reply
This reminds me of Rob Pike's Go talk about how concurrency is not parallelism. I think the Python community may be hitting this issue: async is meant to model concurrent behavior, not always or necessarily to facilitate parallel activity.
[+] chooseaname|5 years ago|reply
I think a good chunk of Python developers expected (expect?) async to be a "get out of GIL free card". It's not.
[+] Grimm1|5 years ago|reply
Async is useful for high IO where you may have a lot of down time between the requests. If you are pulling many requests from different servers with different response times, communicating with a db, or pulling out large response bodies, async is probably going to do better, since each one of those synchronously represents a potentially large idling period during which other requests could have gotten work done.

As to the article, the comparisons are good but it fails to mention resource constraints. With Gunicorn, forking 16 instances is going to be a lot heavier on memory, so for a little more RPS you're probably spending a decent chunk of change more to run your workload, and I don't think that's worth it considering the async model in Python is pretty easy to grok these days and under this benchmark shares a similar performance profile.

Now, that said, if I had to guess, these numbers are fine for the average API, but if you're doing something like high-throughput web crawling or need to serve something on the order of tens of thousands to hundreds of thousands of RPS, async will win out on speed, resource use, and ultimately cost.

Plus, at one point they were like "we could only get an 18% speed up with Vibora". I haven't used it myself, but an 18% performance increase at really any level of load is fantastic. Hand-waving that off tells me the workloads considered "realistic" don't take into account real high-RPS workloads like you might see at major tech companies.

[+] ohyes|5 years ago|reply
This is just a fundamental misunderstanding of what concurrency is. I do not see why it requires a benchmark.

Concurrency is many things at once. That's it.

Async frameworks end up with better concurrency properties because you're not paying the memory and context switching overhead of an entire 'thread' for each thing that you are trying to do at the same time. Instead you are paying the (normally cheaper) overhead of what is essentially a co-routine call.

The disadvantage being that you have to manage these context switches yourself, and that they tend to happen more frequently (to maintain the illusion that we are doing many things all at the same time on a single cpu). There is no way that an async framework would ever have better straight line performance than a synchronous one, simply because of all of these extra context switches, and that's fine because that's not what it is for.

Imagine I want to have 10,000 requests held open at the same time. Your flask server with 16 workers is going to have a tough time: you don't have enough workers to service that many requests, so requests won't get serviced and things will start to time out. An async framework multiplexes its workers so that each can individually handle multiple requests at once. Multiplexing in this way costs you something performance-wise.

If you were to crank up the concurrency beyond 100 at once (the default in the posted scripts), you would start getting different results.

[+] rajandatta|5 years ago|reply
Excellent article. Well done. Great to see that you examined throughput, latency and other measures. It may not answer all questions that arise in real-life situations and work loads but we need more numerical experiments to really understand how this works.
[+] birdyrooster|5 years ago|reply
No one said it was faster. We said it scaled better. That’s because blocking all execution on IO is bad for time sensitive tasks like web requests.

If you want to actually go faster the asyncio interfaces used by aiomultiprocess module get you there by maintaining the event loop across multiple processes. You can save time and memory by sharding your data set and aggregating the return data.

[+] brodouevencode|5 years ago|reply
Great article, but don't just abandon async entirely. There are still use cases. For me: I use it to pull data from several external APIs all at once. That data is then married up to produce another data object with some special sauce computation. All of these network calls run in parallel, therefore the network overhead (DNS lookups, SSL handshakes, etc.) all operate at the same time instead of running one after the other if it were in synchronous mode. IIRC the benchmarks for this went from running at 3 minutes+ to just over 20 seconds.

So there's still utility; YMMV.

[+] tannhaeuser|5 years ago|reply
It's about time someone put this into perspective with figures before more and more people rush to implement business apps in async style (= 80's cooperative multitasking). There are exceptions of course; for example Node.js was originally envisioned for e.g. game servers, where async's purported robustness in the presence of a massive number of open sockets supposedly helps. But I think for the vast majority of workloads going async has a terrible impact on your codebase (either with callback hell or by deprecating most of the host language's flow control primitives like try/catch in favour of hard-to-debug ad-hoc constructs such as Promises). Another price to pay is grokking Node.js' streams (streams2/streams3) and domain APIs and its unhelpful exception handling story, with subtle changes even as late as v13. As I hear, Python's async APIs aren't uncontroversial either.

Now the next thing I'd be interested to get debunked is multithreading vs multiple processes with shared memory (SysV shmem). I'm not very sure, but I'd not been surprised to hear that the predominance of multithreaded runtimes (JVM, most C++ appservers) is purely a cargo-cult effect. As far as I remember, threads were introduced for small and isolated problems in GUI programs, like code completion in IDEs; they were never intended for replacing O/S processes and their isolation guarantees.