Lessons learnt building a real-time audio application in Python

camtarn|1 year ago

"It turns out that the round-trip time from an audio interface, through a computer (DAW) and back to the speakers takes a few hundreds of milliseconds, making direct audio processing impossible using consumer hardware." - uh, what? Real-time audio processing has been a thing for at least a couple of decades. It doesn't work by default on Windows, but you can get free drivers (ASIO4All) which make it work on pretty much any hardware. And it works out of the box on Macs.

"Latency seems to shift by a few tens of milliseconds when restarting the application." - this makes me think you are using the wrong API for your sound input/output. With modern realtime audio support, your total latency from input to output should be less than 10ms total.

"I expected that memory usage would get out of hand quite fast due to the ever growing dictionary of arrays containing audio data, but this does not happen in practice. I suspect that the good performance is caused by highly optimized memory management of Python and modern OSes." - without concrete figures it's quite hard to evaluate this, but what did you expect to happen? With a 44.1KHz stereo audio stream, you should be storing 88.2 thousand samples a second. Say you're using 64-bit floats, as a worst case. Your audio storage should be growing at about 689KB/sec, plus a bit extra for object overhead. How much is it actually growing by? Of course Python is probably doing a bunch of allocation and deallocation for temporary objects behind the scenes, but hopefully you should not need to lean too hard on 'highly optimized memory management' - ideally, you should hardly be allocating anything at all. Also, why a dict, rather than just a large array that you can occasionally make bigger?

Finally ... I'm sure you already know that Python is possibly the worst mainstream language you could pick for realtime audio processing. But that is fine. I have tried to build audio stuff in Python too! Sometimes using the wrong tool for the job is part of the fun.

PhunkyPhil|1 year ago

+1. In Ableton on Windows you can get your latency down to ~40 ms using ASIO, without a dedicated sound card. Mac drivers are even better, with sub-20 ms on my M2 Pro IIRC.

CyberDildonics|1 year ago

When you see someone using Python for something as real-time and latency-sensitive as audio, don't you expect more wacky red flags on top of the fact that Python is going to be 50x to 100x slower than a native program?

Crazy numbers on top of a dictionary of arrays? It's all there.

spmvg|1 year ago

Interesting comment! I'm going to figure out if using another driver allows me to get under 20 ms in latency. Right now I'm measuring around 300 ms in latency round-trip, which is not a problem because I can correct for it. (I'm using a Focusrite Scarlett 2i2 with default drivers.)
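How the latency is corrected for isn't stated. One common approach, sketched here under that caveat, is to play a known test signal, record it back, and take the lag that maximizes the cross-correlation between the two. The recording below is simulated with a made-up delay and noise level, and `estimate_latency` is a hypothetical helper.

```python
import numpy as np


def estimate_latency(played: np.ndarray, recorded: np.ndarray) -> int:
    """Delay of `recorded` relative to `played`, in samples.

    In np.correlate's "full" mode, output index 0 corresponds to a lag of
    -(len(played) - 1), so the peak index is shifted back by that amount.
    """
    corr = np.correlate(recorded, played, mode="full")
    return int(np.argmax(corr)) - (len(played) - 1)


SAMPLE_RATE = 44_100
rng = np.random.default_rng(0)
played = rng.standard_normal(2048)   # white-noise test burst
true_delay = 441                     # 10 ms, made up for the simulation
recorded = np.zeros(len(played) + 1000)
recorded[true_delay:true_delay + len(played)] = 0.5 * played  # delayed, quieter copy
recorded += 0.01 * rng.standard_normal(len(recorded))         # a little background noise

lag = estimate_latency(played, recorded)
print(f"{lag} samples = {1000 * lag / SAMPLE_RATE:.1f} ms")
```

`np.correlate` does direct O(N·M) work, which is fine for a short burst; for longer test signals `scipy.signal.correlate(..., method="fft")` is much faster. Since the measured latency reportedly shifts between restarts, such a measurement would need to be repeated per session.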

The reasoning behind my comment about round-trip time was as follows:

  - Right now I'm measuring around 300 ms round-trip time, without processing in between
  - In the past I've tried to do live effects in Ableton with ASIO drivers (guitar in -> Ableton effects -> out), and the delay was too noticeable. I couldn't play that way without making my ears bleed, and I've switched back to pedals since.

One follow-up: how could I achieve a total round-trip latency of around 10 ms, as you describe? If I use a buffer of 500 samples @ 44.1 kHz, then I am already spending 11 ms just filling the buffer. So the buffers would need to become really small, causing more processing overhead, right? Not sure if this is the way to go.
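The buffer arithmetic in the question can be tabulated. As a simplifying assumption, the sketch treats the round trip as one input buffer plus one output buffer; real drivers add converter and safety-buffer delays on top, so these figures are lower bounds rather than a driver specification.

```python
SAMPLE_RATE = 44_100


def buffer_ms(frames: int, sample_rate: int = SAMPLE_RATE) -> float:
    """Time needed to fill (or drain) one buffer of `frames` samples, in ms."""
    return 1000.0 * frames / sample_rate


for frames in (32, 64, 128, 256, 500, 1024):
    one_way = buffer_ms(frames)
    # Assumption: round trip ~ one input buffer + one output buffer.
    print(f"{frames:5d} frames: {one_way:6.2f} ms per buffer, "
          f"~{2 * one_way:6.2f} ms round trip (lower bound)")
```

Under this assumption, staying below 10 ms round trip means buffers of roughly 128 frames or fewer at 44.1 kHz (256 frames already gives about 11.6 ms), which is the small-buffer, more-overhead trade-off the question describes.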

pjmlp|1 year ago

The damage that Python not having a JIT has done.

At least BASIC was designed for native code compilation from day one, and after the 8 bit home computers generation passed by, getting compilers for 16 bit home computers was rather easy.

30 years later, people insist on using bytecode-interpreted languages for the wrong use cases.

nicholasjarnold|1 year ago

I was once bitten by not understanding that there is a difference between "regular" clocks and high performance clocks/timers that a developer can take advantage of. At the time I needed a sampling routine to run at precisely once per second. My inexperience led me to go with something like thread.sleep(1000), and I learned quickly that I was mistaken in thinking it'd run with little jitter. As others are pointing out, there are also similar lessons and solutions when dealing with audio processing pipelines.
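A common fix for this kind of drift (a sketch, not the commenter's code) is to schedule each tick against an absolute deadline on the high-resolution `time.perf_counter` clock instead of sleeping a fixed interval after the work. With a fixed sleep, the work time and every sleep overshoot add up; with absolute deadlines, each tick still has some jitter, but it does not accumulate. `run_periodic` and its parameters are hypothetical names.

```python
import time
from typing import Callable, List


def run_periodic(task: Callable[[], None], period_s: float, ticks: int) -> List[float]:
    """Call `task` every `period_s` seconds, `ticks` times, without cumulative drift.

    Returns the perf_counter() timestamps at which each call started.
    """
    starts: List[float] = []
    next_deadline = time.perf_counter()
    for _ in range(ticks):
        now = time.perf_counter()
        if next_deadline > now:
            time.sleep(next_deadline - now)  # sleep may overshoot slightly...
        starts.append(time.perf_counter())
        task()
        next_deadline += period_s  # ...but the next deadline is absolute,
                                   # so the overshoot does not accumulate


    return starts


starts = run_periodic(lambda: None, period_s=0.05, ticks=20)
print(f"elapsed {starts[-1] - starts[0]:.3f} s for 19 periods of 0.05 s")
```

If the task ever takes longer than one period, this loop skips the sleep and runs the late ticks back to back to catch up; dropping missed ticks instead would be the other reasonable policy.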

spmvg|1 year ago

Indeed, there is no guarantee that the "sleep" will last exactly that long. In the code I'm not "sleeping" in any sensitive places; instead I'm relying on the audio stream object's callback, which just needs to finish before the next one starts (less of a timing constraint).
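That constraint can be made concrete as a time budget: at a given block size, each callback has `blocksize / samplerate` seconds to finish before the next block is due. The sketch below measures an offline stand-in for the callback against that budget. It does not open an audio stream; `process_block`, its gain value and the 512-frame block size are all assumptions.

```python
import time
import numpy as np

SAMPLE_RATE = 44_100
BLOCKSIZE = 512  # frames per callback; an assumed value, not from the thread


def process_block(indata: np.ndarray) -> np.ndarray:
    """Hypothetical effect: fixed gain followed by hard clipping to [-1, 1]."""
    return np.clip(indata * 1.5, -1.0, 1.0)


def callback_load(n_calls: int = 1000) -> tuple[float, float]:
    """Return (average, worst) processing time as a fraction of the block budget."""
    budget_s = BLOCKSIZE / SAMPLE_RATE  # about 11.6 ms at these settings
    block = np.random.default_rng(0).uniform(-1.0, 1.0, (BLOCKSIZE, 2))
    durations = []
    for _ in range(n_calls):
        t0 = time.perf_counter()
        process_block(block)
        durations.append(time.perf_counter() - t0)
    return sum(durations) / len(durations) / budget_s, max(durations) / budget_s


avg, worst = callback_load()
print(f"average load {avg:.2%}, worst call {worst:.2%} of the budget")
```

The worst call matters more than the average: a single callback that overruns its budget (for example during a garbage-collection pause) is enough to cause an audible dropout.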

physicsguy|1 year ago

I love Python a lot and I’m the first to criticise people complaining about performance when it’s irrelevant (if you’re doing long calculations in NumPy or whatever, the Python overhead is small relative to the hot path).

But as others have said, it’s not a great tool for real-time audio, since that overhead becomes a larger share of the time spent. You might want to try compiling parts of it with Cython and using NumPy’s C API from there (you can write cimport numpy in your Cython code, and it’ll remove much of the overhead because it calls the C functions directly rather than their Python wrappers).
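A small illustration of that overhead argument, in plain Python plus NumPy rather than Cython: the same soft-clipping effect (an arbitrary example, not from the article) written as a per-sample Python loop and as one vectorized NumPy call. The function names are hypothetical, and absolute timings will vary by machine.

```python
import math
import time
import numpy as np


def soft_clip_loop(block: np.ndarray, drive: float) -> np.ndarray:
    """Per-sample Python loop: interpreter overhead on every sample."""
    out = np.empty_like(block)
    for i in range(len(block)):
        out[i] = math.tanh(drive * block[i])
    return out


def soft_clip_numpy(block: np.ndarray, drive: float) -> np.ndarray:
    """Vectorized: one call, with the per-sample work in NumPy's compiled loop."""
    return np.tanh(drive * block)


block = np.random.default_rng(0).uniform(-1.0, 1.0, 512)
for fn in (soft_clip_loop, soft_clip_numpy):
    t0 = time.perf_counter()
    for _ in range(200):
        fn(block, 3.0)
    print(f"{fn.__name__}: {(time.perf_counter() - t0) / 200 * 1e6:.1f} µs per block")
```

Vectorizing is the first thing to try; compiling with Cython, as suggested, helps mainly for loops that cannot be expressed as whole-array operations, such as a recursive filter where each output sample depends on the previous one.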

spmvg|1 year ago

Indeed, Cython and related tools will be the direction to take if performance becomes a bottleneck! Interestingly enough, the audio callback is handled fast enough on my unimpressive laptop, and CPU usage has been low on average (<1%), so Python trickery hasn't been needed yet.
