
Python Asyncio

223 points | simonpure | 3 years ago | superfastpython.com

181 comments

[+] quietbritishjim|3 years ago|reply
Not complete - doesn't include Task Groups [1]

In fairness they were only included in asyncio as of Python 3.11, which was released a couple of weeks ago.

These were an idea originally from Trio [2] where they're called "nurseries" instead of "task groups". My view is that you're better off using Trio, or at least anyio [3] which gives a Trio-like interface to asyncio. One particularly nice thing about Trio (and anyio) is that there's no way to spawn background tasks except to use task groups i.e. there's no analogue of asyncio's create_task() function. That is good because it guarantees that no task is ever left accidentally running in the background and no exception left silently uncaught.

[1] https://docs.python.org/3/library/asyncio-task.html#task-gro...

[2] https://github.com/python-trio/trio

[3] https://anyio.readthedocs.io/en/latest/

[+] throwaway81523|3 years ago|reply
I have such a feeling of tragedy about Python. I wish it had migrated to BEAM or implemented something similar, instead of growing all this async stuff. Whenever I see anything about Python asyncio, I'm reminded of gar1t's hilarious but NSFW rant about node.js, https://www.youtube.com/watch?v=bzkRVzciAZg . Content warning: lots of swearing, mostly near the end.
[+] Steltek|3 years ago|reply
Python, more than other languages, ends up being used in utility scripts on end user devices, where you'll find just about everything. Python's async still seems to have a large swing in available features. Unlike with async in other languages, semi-serious dabbling in it just for fun is courting a lot of headaches later.
[+] dang|3 years ago|reply
Ok, we've taken completeness out of the title above.
[+] kissgyorgy|3 years ago|reply
After 2 years of using asyncio in production, I recommend avoiding it if you can. With async programming, you take on the complexity of concurrent programming, which is way harder than you can imagine.

Also nobody mentions this for some reason, but asyncio doesn't make your programs faster, in fact it makes everything 100x SLOWER (we measured it multiple times, compared the same thing to the sync version), but makes your program concurrent (NOT parallel), where the tradeoff is you use more CPU, maybe less memory, and can saturate your bandwidth better. Never do anything CPU-bound in an async loop!

[+] number6|3 years ago|reply
Asyncio is not meant for CPU-bound tasks but for IO-bound tasks. You should have used multiprocessing. The problem you describe is exactly what asyncio is used for: saturating your bandwidth better.

Maybe the right approach would have been a thread pool? Plus you don't have to refactor the task: just let it run sync, but you can also make it async.
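A sketch of the escape hatch being described here, using loop.run_in_executor() with a ProcessPoolExecutor (the crunch function is a made-up stand-in for CPU-bound work; on spawn-based platforms the __main__ guard is required so worker processes can import the function):

```python
import asyncio
from concurrent.futures import ProcessPoolExecutor

def crunch(n: int) -> int:
    # CPU-bound work; run inline it would starve the event loop.
    return sum(i * i for i in range(n))

async def main() -> int:
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor() as pool:
        # The event loop stays responsive while a worker process crunches.
        return await loop.run_in_executor(pool, crunch, 100_000)

if __name__ == "__main__":
    print(asyncio.run(main()))
```

For blocking I/O rather than CPU work, a ThreadPoolExecutor (or asyncio.to_thread) is usually the cheaper choice, since threads are enough when the GIL is released during the blocking call.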

[+] whakim|3 years ago|reply
This just sounds like you fundamentally misunderstood how asyncio works and applied it to a use-case which it wasn't suited to. Why would you use async if you're doing a bunch of CPU-bound work?
[+] acedTrex|3 years ago|reply
This is an insane take, if you are doing async you already HAVE a need for concurrent programming. Async just makes that simpler to read and write.
[+] qbasic_forever|3 years ago|reply
This isn't a problem specific to python though. You hint at it, async programming (single thread worker loop) is not the right way to deal with CPU-bound tasks. NodeJS or any other async language would have the same problems. You want multiprocessing or multithreading, where work is distributed to different CPU cores at the same time. Python gives you the ability to use any of those three paradigms. Choose wisely.
[+] jacob019|3 years ago|reply
After using gevent for about a decade I have started using asyncio for new projects, just because it's in the standard library and has the official async blessing of the Python gods. Indeed it is way harder. I'm always coming up against little gotchas that take time to debug and fix. Part of me enjoys the challenge of learning something new and solving little puzzles. It's getting easier, especially as I build up a collection of in-house async libraries for various things. As for performance, it's not too bad for mostly io bound tasks, which is why one uses async in the first place. Some tight loop benchmarks for message passing with other processes show it to be about half the speed of gevent in my case, which is fine. It's nice to be able to deploy async microservices without installing gevent, and there's a certain value to the discipline that it imposes. I like how I am able to bring non-async code into the async world using threading. I imagine the performance would improve quite a bit with pypy, perhaps exceeding that of gevent. Gevent makes it so damn easy, I've been spoiled. I was disappointed when asyncio came out, as I would have preferred the ecosystem moved in the gevent direction instead; but I'm coming around. It's super annoying how the python ecosystem has been bifurcated with asyncio. You really have to choose one way or another at the beginning of a project and stick with it.

And yeah, async programming (in Python) isn't really for CPU bound stuff. You might benefit from multiprocessing and just use asyncio to coordinate, which is what it excels at. PyPy can really help with CPU bound stuff too, if the code is mostly pure Python.

[+] wiredfool|3 years ago|reply
My opinion is that async/evented is a (useful) performance hack.

You wouldn't do it that way for any reason other than that you can get better performance that way than by using threading or other pre-emptive multitasking.

It's not a better developer experience than writing straightforward code where threads are linear and interruptions happen transparently to the flow of the code. There are footguns everywhere, with long-running tasks that don't yield, or places where there are hidden blocking actions.

It reminds me a bit of the old System 6 Mac cooperative multitasking. It was fine, and significantly faster, because your program would only yield when you let it, so critical sections could be guaranteed not to context shift. However, you could bring the entire machine to a halt by holding down the mouse button, as eventually an event handler would get stuck waiting for mouse up.

Pre-emptive multitasking was a huge step forward -- it made things a bit slower on average, but the tail latency was greatly improved, because all the processes were guaranteed at least some slice of the machine.

[+] m3047|3 years ago|reply
I seem to use asyncio a lot, so maybe it's just good for internet plumbing. Things I've used it for:

* A Postfix TCP table.

* A milter.

* DNS request forwarding.

* Reading data from a Unix domain socket and firing off dynamic DNS updates.

* A DNS proxy for Redis.

* A netflow agent.

* A stream feeder for Redis.

https://github.com/search?q=user%3Am3047+asyncio&type=Reposi...

By the way you can't use it for disk I/O, but you can try to use it for e.g. STDOUT: https://github.com/m3047/shodohflo/blob/5a04f1df265d84e69f10...

  class UniversalWriter(object):
    """Plastering over the differences between file descriptors and network sockets."""
[+] samwillis|3 years ago|reply
There is very little in everyday Python usage that benefits from asyncio. Two cases in webdev are long-running requests (WebSockets, SSE, long polling) and processing multiple backend IO operations in parallel. However, the latter is very rare: you may think you have multiple DB requests that could use asyncio, but most of the time they are dependent on each other.

Almost all of the time a normal multithreaded Python server is perfectly sufficient and much easier to code.

My recommendation is to only use it where it is really REALLY required.

[+] jacob019|3 years ago|reply
I don't know what you use Python for every day. Sure, for some utility scripts it doesn't matter. I use it to run my ecommerce business, and for a variety of other plumbing, mostly for passing messages around between users, APIs, databases, printers, etc. Async programming is a must for just about everything. I guess the alternative would be thread pools or process pools, like back in the day, but that has a lot of downsides. It is slower and way more resource intensive, state sharing/synchronization becomes a major issue, you can only handle a thread pool's worth of tasks concurrently, and you're going to use that memory all the time; not to mention all the code needs to be thread safe. Most of our systems use gevent, but we've started using asyncio for new projects.
[+] jsmith45|3 years ago|reply
Sure. As a general rule, the style of coding used in async-await approaches is primarily about one of two things.

The first purpose is allowing more throughput at the expense of per-request latency (typically each request will take longer than with equivalent sync code).

The main scenario where an async version could complete sooner than a sync version is when the code is able to start multiple async tasks and then await them as a group. For example, if your task needs to make 10 HTTP requests and makes them sequentially, like one would in sync code, it will be slower. If you start all ten calls and then await the results, you might get a speedup on the overall request.
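The difference between awaiting sequentially and awaiting as a group can be demonstrated with asyncio.gather() and simulated latency (fake_request is a made-up stand-in for a real HTTP call):

```python
import asyncio
import time

async def fake_request(i: int) -> int:
    # Stand-in for an HTTP call with ~50 ms of latency.
    await asyncio.sleep(0.05)
    return i

async def sequential() -> list[int]:
    # Awaiting one at a time: total time is roughly 10 x 50 ms.
    return [await fake_request(i) for i in range(10)]

async def concurrent() -> list[int]:
    # Starting all ten and awaiting as a group: roughly 50 ms total.
    return await asyncio.gather(*(fake_request(i) for i in range(10)))

t0 = time.perf_counter()
asyncio.run(sequential())
t1 = time.perf_counter()
results = asyncio.run(concurrent())
t2 = time.perf_counter()
```

The requests still run on one thread; the speedup comes purely from overlapping the waits, which is why it only appears for IO-bound work.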

The other main purpose is when working with a UI framework where there is a main thread and certain operations can only occur on the main thread. Use of the async/await pattern helps avoid accidentally blocking the main thread, which can kill application responsiveness. This is why the pattern is used in JavaScript, and it was one of the headline scenarios when C# first introduced the pattern. (The alternative being other methods of asynchrony, which typically involve callbacks and can make the code harder to develop or understand.)

But basically, unless you have UI blocking problems, or are concerned about the number of requests per second you can handle, async-await patterns may be better avoided. It being even more costly in Python than it is in some other languages does not really help.

[+] nicolaslem|3 years ago|reply
Agreed, the SaaS Python application I maintain does SSE with threads. It is not the prettiest thing but it works and threads are cheaper than rewriting the whole thing to be async.
[+] usrbinbash|3 years ago|reply
Completely agree with this.

I have several python services running as glue between some of our components. They all use threading. I find it alot easier to reason about.

[+] leveraction|3 years ago|reply
I have used asyncio through aiohttp, and I have been pretty happy with it, but I also started with it from the beginning, so that probably made things a little easier.

My setup is a bunch of microservices that each run an aiohttp web server based API for calls from the browser, where communication between services is done async using RabbitMQ and a hand-rolled pub/sub setup. Almost all calls are non-blocking, except for calls to Neo4j (sadly, they block, but Neo4j is fast, so it's not really a problem).

With an async API I like the fact that I can make very fast HTTP replies to the browser while queueing the resulting long-running job and then responding back to the Vue-based SPA client over a WebSocket connection. This gives the interface a really snappy feel.

But Complex? Oh yes.

But the upside is that it is also a very flexible architecture, and I like the code isolation that you get with microservices. Nevertheless, more than once I have thought about whether I would choose it all again knowing what I know now. Maybe a monolithic flask app would have been a lot easier if less sexy. But where's the fun in that?

[+] senko|3 years ago|reply
> With an async API I like the fact that I can make very fast HTTP replies to the browser while queueing the resulting long-running job and then responding back to the Vue-based SPA client over a WebSocket connection. This gives the interface a really snappy feel.

How does this compare to doing the same with eg. Django Channels (or other ASGI-aware frameworks)?

I have yet to find a use case compelling enough to dive into async in Python (doesn't help that I also work in JS and Go so I just turn to them for in cases where I could maybe use asyncio). This is not to say it's useless, just that I'm still searching for a problem this is the best solution for.

[+] jerrygenser|3 years ago|reply
> Maybe a monolithic flask app would have been a lot easier if less sexy. But where's the fun in that?

Not to sound snarky, but the fun would be in being able to solve business problems without fighting against a complex system. Microservices can definitely make sense, but only if you need them and know you need them. If it's not a hobby and you're being paid to maintain someone else's system, then it's definitely worth going with a simpler system.
[+] meitham|3 years ago|reply
I share your sentiment; I have been using aiohttp for five years and am pretty happy with it. My current project is a web service with a blocking SQL Server backend, so I tend to do loop.run_in_executor for every DB.execute statement. But now I'm considering just running a set of light asyncio-streams subprocesses with a simple bespoke protocol that takes a SQL statement and returns the result JSON-encoded, to move away from threads.
[+] greyman|3 years ago|reply
Maybe off-topic, but my advice would be: if you need this guide, consider switching to another language if that's possible. In our company we switched to Go, and all those asyncio problems were magically solved.
[+] pantsforbirds|3 years ago|reply
I've had absolutely no problems using async programming in Python. It's extremely easy to set up a script in 100-200 lines of code to do something like:

pull the rows from Postgres that match query <x>, process the data, and push each row as an event into RabbitMQ. In <200 lines of code I was easily processing 25k rows per second, and it only took me a few minutes to figure out the script.
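Not the parent's actual script, but the general shape of such a pipeline can be sketched with an asyncio.Queue, with stand-ins for the Postgres reader (e.g. asyncpg, not shown) and the RabbitMQ publisher (e.g. aio-pika, not shown):

```python
import asyncio

async def producer(queue: asyncio.Queue, rows) -> None:
    # Stand-in for streaming rows out of Postgres.
    for row in rows:
        await queue.put(row)
    await queue.put(None)  # sentinel: no more rows

async def consumer(queue: asyncio.Queue, published: list) -> None:
    # Stand-in for publishing each row as an event to RabbitMQ.
    while (row := await queue.get()) is not None:
        published.append({"event": row})

async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue(maxsize=100)  # backpressure
    published: list = []
    await asyncio.gather(producer(queue, range(5)), consumer(queue, published))
    return published

print(asyncio.run(main()))
```

The bounded queue gives backpressure for free: the producer pauses whenever the publisher falls behind.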

[+] austinpena|3 years ago|reply
Agreed. Moving my Python data analytics to a GRPC server that I call from a Go service has been _so much_ easier to manage and debug.
[+] akdor1154|3 years ago|reply
I love Python and fully agree with you. :(
[+] brrrrrm|3 years ago|reply
Important to note:

> They are suited to non-blocking I/O with subprocesses and sockets, however, blocking I/O and CPU-bound tasks can be used in a simulated non-blocking manner using threads and processes under the covers.

If you're using it for anything besides slow async I/O, you're going to have to do some heavy lifting.

I've also found the actual asyncio implementation in CPython to be slow. Measuring purely event-loop overhead (and doing little/nothing in the async spawned tasks), it's 120x slower than JavaScript on my machine. https://twitter.com/bwasti/status/1572339846122991617
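A rough microbenchmark of pure event-loop overhead, in the spirit of the linked measurement (absolute numbers will vary widely by machine and Python version):

```python
import asyncio
import time

async def noop() -> None:
    pass

async def spawn_many(n: int) -> float:
    # Create and await n trivial tasks to estimate per-task loop overhead.
    t0 = time.perf_counter()
    await asyncio.gather(*(asyncio.create_task(noop()) for _ in range(n)))
    return time.perf_counter() - t0

elapsed = asyncio.run(spawn_many(10_000))
print(f"{elapsed / 10_000 * 1e6:.1f} microseconds per task")
```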

[+] bjt2n3904|3 years ago|reply
Ok. This is what has been a huge hangup for me.

It really seems that if you're doing asyncio, you must do EVERYTHING async, it's like asyncio takes over (infects?) the entire program.

[+] andrewstuart|3 years ago|reply
I love Python async - it’s a complete game changer for certain types of applications.

I find Python async to be fun and exciting and interesting and powerful.

BUT it is a big power tool and there’s so much in it that it’s hard to work out how to drive it right.

I have pretty good experience with Python and javascript.

I prefer Python to javascript when writing async code.

Specific example: I spent hours trying to drive some processes via stdin/stdout/stderr with JavaScript and it kept failing for reasons I couldn't determine.

Switched to Python async and it just worked.

The most frustrating thing about async Python is that it has been improving greatly. That means it's not obvious what "the right way" is, i.e. using the latest techniques. This is actually a really big problem for async Python. I'm fairly competent with it, but still have to spend ages working out if I'm doing it "the right way/the latest way".

The Python project really owes it to its users to have a short cookbook that shows the easiest, most modern recommended way to do common tasks. Somehow this cookbook must give the reader instant 100% confidence that they are reading the very latest official recommendations and thinking on simple asyncio techniques.

Without such a “latest and greatest techniques of async Python cookbook” it’s too easy to get lost in years of refinement and improvement and lower and higher level techniques.

The Python project should address this, it’s a major ease of use problem.

Ironically, Python's years of async refinement mean there are many, many ways to get the same things done, conflicting with Python's "one right way to do it" philosophy.

It can be solved with documentation that drives people to the simplest most modern approaches.

[+] arecurrence|3 years ago|reply
The way that this is formatted... I initially thought it was a Haiku :)
[+] ok_dad|3 years ago|reply
The only thing I ever use async in Python for is batch requests, like hundreds or thousands of little requests. I collect them up in a task list with context, run them with one of the “run all of these and return the results or errors” functions in the library, then process the results serially.

Anything I need to do that doesn’t use this simple IO pattern, like cpu bound workers, I prefer to use processes or multiple copies of the app synchronized with a task queue.
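The "run all of these and return the results or errors" function being described is presumably asyncio.gather(..., return_exceptions=True); a sketch with a made-up fetch coroutine:

```python
import asyncio

async def fetch(i: int) -> int:
    # Made-up stand-in for one small request; item 2 fails on purpose.
    if i == 2:
        raise ValueError(f"bad item {i}")
    await asyncio.sleep(0.01)
    return i * 10

async def main() -> list:
    tasks = [fetch(i) for i in range(4)]
    # With return_exceptions=True, failures come back as exception
    # objects in order instead of aborting the whole batch.
    return await asyncio.gather(*tasks, return_exceptions=True)

res = asyncio.run(main())
for r in res:
    if isinstance(r, Exception):
        pass  # handle/log errors serially here
```

Results stay in submission order, which makes pairing them back up with the original context trivial.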

[+] xwowsersx|3 years ago|reply
I'm not in much of a position to evaluate Python's asyncio since I really have not used it very much. However, over the last few days, I started to dig into it and tried to get some (what I think are) very basic examples working and really struggled. That alone is not fully dispositive, because I've used many async implementations in various languages and each of them has a bit of a learning curve and its own wrinkles you have to ramp up on. That said, my limited experience at least tells me that Python's async has a first-user experience that leaves a lot to be desired. I've found other languages make it a lot easier to do concurrency, parallelism, async vs sync, blocking vs non-blocking and all that. I don't really know if the issue is poor documentation, the semantics of the APIs themselves... really not sure. I do know that I'm left with the feeling that "this seems like it's going to be a pretty big PITA".
[+] matsemann|3 years ago|reply
Asyncio is annoying, and often unexpectedly slow. You think things are parallel, but then one misbehaving coroutine can hog your cpu bringing everything to a halt. GIL makes it useless for anything other than _heavily_ IO bound tasks.

And yeah, Python's documentation is useless. Never how to use stuff, only listing of everything that's possible to do / the API. Unfortunately that style is being mimicked by most other Python projects as well.

[+] andrewstuart|3 years ago|reply
It IS easy.

The Python project just don’t make it obvious how to do it easy.

[+] samsquire|3 years ago|reply
I hope to learn how to use async code more effectively. Coroutines are very interesting. Structured concurrency is very useful in defining and understanding concurrency. I wrote a multithreaded userspace scheduler in Java, C, and Rust which multiplexes lightweight threads over kernel threads. It is a 1:M:N scheduler, with 1 scheduler thread, M kernel threads and N lightweight threads. This is similar to golang, which is P:M:N.

https://GitHub.com/samsquire/preemptible-thread

I am deeply interested in parallel and asynchronous code. I write about it on my journal (link in my profile).

I am curious if anybody has any ideas on how you would build an interpreter that is multithreaded, with each interpreter running in its own thread, where sending objects between threads is done without copying or marshalling. I think Java does it but I have yet to ask how. Maybe I'll ask Stack Overflow.

I wrote a parallel imaginary assembly interpreter that is backed by an actor framework which can send and receive messages in mailboxes.

Here's some code:

   threads 25
   <start>
   mailbox numbers
   mailbox methods
   set running 1
   set current_thread 0
   set received_value 0
   set current 1
   set increment 1
   :while1
   while running :end
   receive numbers received_value :send
   receivecode methods :send :send
   :send
   add received_value current
   addv current_thread 1
   modulo current_thread 25
   send numbers current_thread increment :while1
   sendcode methods current_thread :print
   endwhile :while1
   jump :end
   :print
   println current
   return
   :end
This is 25 threads that each send integers to each other as fast as they can. The sendcode instruction can cause the other thread to run some code. It can get up to 1.7 million requests per second without the sendcode and receivecode; with method sending it gets ~600,000 requests per second.
[+] jmatthews|3 years ago|reply
Just to swim upstream. For http requests and bounded IO I've found asyncio to be straightforward and a game changer. In the context of call an endpoint with a data payload and have the endpoint process it with outbound http calls it is a 10x'er for very little complexity and no external libraries.
[+] vodou|3 years ago|reply
I've never bothered to learn Python asyncio. When Python 3.5 came out I just thought it looked overly complex. Coming from a C/C++ background on Linux I just use the select package for waiting on blocking I/O, mainly sockets. Do you think there is something to gain for me by learning asyncio?
[+] eestrada|3 years ago|reply
Personally, I don't think there is a benefit. If select is working for you, asyncio doesn't add anything performance-wise. It is just meant to look more synchronous in how you write the code. But using select and either throwing work onto a background thread or doing the work quickly (if it isn't CPU-bound) can be just as clear to read, if not clearer. Sometimes "async" and "await" calls only obfuscate the logic more.
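For reference, the select-style readiness loop being described looks roughly like this with the stdlib selectors module (a socketpair keeps the example self-contained):

```python
import selectors
import socket

# A minimal readiness loop: wait on blocking I/O with selectors
# rather than async/await.
sel = selectors.DefaultSelector()
left, right = socket.socketpair()
for s in (left, right):
    s.setblocking(False)

left.send(b"ping")
sel.register(right, selectors.EVENT_READ)

received = b""
while not received:
    # Blocks until a registered socket is readable (or the timeout hits).
    for key, _events in sel.select(timeout=1.0):
        received = key.fileobj.recv(4096)

sel.unregister(right)
left.close()
right.close()
print(received)
```

asyncio's default event loop is built on exactly this mechanism; async/await mostly changes how the callbacks read, not what the OS is doing underneath.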
[+] mpeg|3 years ago|reply
In case the author reads this, there is an error in the "How to Execute a Blocking I/O or CPU-bound Function in Asyncio?" [0]

It reads:

> The asyncio.to_thread() function creates a ThreadPoolExecutor behind the scenes to execute blocking calls.

> As such, the asyncio.to_thread() function is only appropriate for IO-bound tasks.

It should say it's only appropriate for CPU-bound tasks.

[0]: https://superfastpython.com/python-asyncio/#How_to_Execute_a...

[+] wrigby|3 years ago|reply
I think the article is correct here, actually - if you need to run a CPU-bound task, you’ll need a ProcessPoolExecutor.
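For the IO-bound case, asyncio.to_thread() (Python 3.9+) looks like this (blocking_io is a made-up stand-in for a blocking driver call):

```python
import asyncio
import time

def blocking_io() -> str:
    # Simulates a blocking call (file read, sync DB driver, etc.);
    # the GIL is released while it waits, so a thread is a good fit.
    time.sleep(0.05)
    return "done"

async def main() -> str:
    # Runs the blocking call in the default ThreadPoolExecutor
    # without stalling the event loop.
    return await asyncio.to_thread(blocking_io)

print(asyncio.run(main()))
```

For genuinely CPU-bound work, threads don't help because of the GIL; a ProcessPoolExecutor via loop.run_in_executor() is the usual fit there, which is the distinction the two comments above are making.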
[+] dekhn|3 years ago|reply
I've used python since 1995 and I can say that async is one of the worst things I've seen put into python since then. I've used a wide range of frameworks (twisted, gevent, etc) as well as threads and even if async is a good solution (I don't think it is) it broke awscli for quite some time (through aiobotocore and related package dependencies). It's too late in the game for long-term breaks like that or any backward-incompatible changes impacting users.
[+] btown|3 years ago|reply
Yep, it's 2022 and gevent is still the only solution for async & high concurrency that Just Works with the entire ecosystem of Python libraries without code changes. There's definitely some compute overhead compared to async, but we save so much developer time having effortless concurrency and never being worried that, say, using a slow third-party API over the web will slow down other requests.
[+] whalesalad|3 years ago|reply
Gevent is the only way I will do async in Python. Everything else ends up being a nightmare, and gevent is a lot more performant than I ever imagine it will be for a given situation.
[+] postultimate|3 years ago|reply
No subinterpreters in 3.11?

If we had those, all this async idiocy would go away, no more code colours, no more single-core, even better isolation and protection against context-switching tangles.

(The implementation is not so nice though. Python threads on machine threads? Whose stupid idea was that?)

[+] est|3 years ago|reply
Missing yet very important topics: redis & db drivers in async. Or even async ORM.
[+] m3047|3 years ago|reply
I do Redis in a thread pool. Did this before aioredis came out, have kept doing it because I know it and it works. Have used the same pattern for a few other things it works so well.
[+] ltbarcly3|3 years ago|reply
Python asyncio is pretty awful. The libraries are of extremely poor quality, and the slightest mistake can lead to blocking the event loop. After a few years of dealing with it I refuse to continue and am just using threads.