
An appeal to Apple from Anukari

405 points | humbledrone | 10 months ago | anukari.com

188 comments


humbledrone|10 months ago

Some folks may have seen my Show HN post for Anukari here: https://news.ycombinator.com/item?id=43873074

In that thread, the topic of macOS performance came up. Basically Anukari works great for most people on Apple silicon, including base-model M1 hardware. I've done all my testing on a base M1 and it works wonderfully. The hardware is incredible.

But to make it work, I had to implement an unholy abomination of a workaround to get macOS to increase the GPU clock rate for the audio processing to be fast enough. The normal heuristics that macOS uses for the GPU performance state don't understand the weird Anukari workload.

Anyway, I finally had time to write down the full situation, in terrible detail, so that I could ask for help getting in touch with the right person at Apple, probably someone who works on the Metal API.

Help! :)

bambax|10 months ago

> This is going to be a VERY LONG HIGHLY TECHNICAL post, so either buckle your seatbelt or leave while you still can.

Well, I read it all and found it not too long, extremely clear and well-written, and informative! Congrats on the writing.

I've never owned a Mac and my pc is old and without a serious GPU, so it's unlikely that I'll get to use Anukari soon, but I regret it very much, as it looks sooo incredibly cool.

Hope this gets resolved fast!

vlovich123|10 months ago

Interesting post & problem. I wonder if the reason that running the tasks on the same queue fails is the same reason you have a problem in the first place: a variable clock rate means it's impossible to schedule precisely, and you end up aliasing your spin's stop time against the ideal time, depending on how the OS decided to clock the GPU. But that suggests that maybe your spin job isn't complex enough to run the GPU at the highest clock, because if it were running at max then you should be able to reliably time the stop of the spin even without adding a software PLL (which may not be a bad idea). I didn't see a detailed explanation of how the spin is implemented, and I suspect a more thorough spin loop that consistently drives more of the GPU might be more effective at keeping the clock rate at max perf.

TheAceOfHearts|10 months ago

I missed the Show HN, but the first thing that came to mind after seeing it was that this looks like it would lend itself well to making some very creative ASMR soundscapes with immersive multidimensional audio. I selfishly hope you or one of your users will make a demo. Congrats on the project and I hope you receive help on your Apple issues.

sunshowers|10 months ago

Great post, I found the description clear and easy to understand. I've definitely run into the issue you're describing in other contexts.

Dlemo|10 months ago

[deleted]

aplummer|10 months ago

Have you filed a feedback? Seems like the right next step.

humbledrone|10 months ago

Hey everyone, it worked, I had a super productive conversation with exactly the right person on the Metal team! Thanks for helping me get Apple's attention. I didn't at all expect this amount of support.

https://anukari.com/blog/devlog/productive-conversation-appl...

krackers|10 months ago

>While I can't share any technical details... The engineer provided some suggestions and hints that I can use right now to maybe — just maybe — get things working in the short term

Great that you have a workaround now, but the fact that you can't even share what the workaround is, ironically speaks to the last line in https://news.ycombinator.com/item?id=43904921 of how Apple communicates

>there’s this trick of setting it to this but then change to that and it’ll work. Undocumented but now you know

When you do implement the workaround, maybe you could do it in an overtly-named function spottable via disassembly so that others facing similar constraints of latency-sensitive GPU have some lead as to the magic incantation to use?

mschuster91|10 months ago

Once again, HN has fulfilled its true purpose: cutting through the red tape that is placed in the front of every large corporation's customer support.

Congratulations and good luck with your project!

AJRF|10 months ago

I’ve worked in two high profile companies with very prominent apps on the Apple App Store.

The team we talked to at Apple never ever cared about our problems, but very often invited us to their office to discuss the latest feature they were going to announce at WWDC to strong arm us into supporting it. That was always the start and stop of their engagement with us. We had to burn technical support tickets to ever get any insight into why their buggy software wasn’t working.

Apple's dev relations are not serious people.

waffletower|10 months ago

I am glad that your experience is not the rule, as the OP reveals above. However, I worked for a company about 10 years ago with a fairly prominent app. An update came out that absolutely destroyed its performance. At precisely the same time, a competitor launched an app which did not have the performance difficulty. It turned out that the developer of the competing app had recently left Apple, and had left an undocumented surprise in Apple's video drivers that broke our app. It took disassembling the competitor's binary to find the undocumented change and repair our application. The developer also taunted our CEO by email. Nice world we live in.

krackers|10 months ago

>The Metal profiler has an incredibly useful feature: it allows you to choose the Metal “Performance State” while profiling the application. This is not configurable outside of the profiler.

Seems like there might be a private API for this. Maybe it's easier to go the reverse engineering route? Unless it'll end up requiring some special entitlement that you can't bypass without disabling SIP.

bambax|10 months ago

There has to be a private API for this; the post says:

> The Metal profiler has an incredibly useful feature: it allows you to choose the Metal “Performance State” while profiling the application. This is not configurable outside of the profiler.

How would the Metal profiler be able to do that if not for a private API? (Could some debugging tool find out what's going on by watching the profiler?)

LiamPowell|10 months ago

The problem with exposing an API for this is that far too many developers will force the highest performance state all the time. I don't know if there's really a good way to stop that and have the API at the same time.

grishka|10 months ago

There already is an unending number of ways for just one app to waste charge on battery-powered devices. It all already relies on developers not unnecessarily running energy-intensive tasks, either intentionally or accidentally. Adding one more API that has the potential to waste energy if not used appropriately will not change that.

JimDabell|10 months ago

The article mentions game mode, which is a feature of the latest Apple operating systems that is optimised for cases like this. Game mode pops up a notification when it’s enabled, which most applications wouldn’t want to happen. So far I haven’t seen anything abuse it.

duped|10 months ago

Developers aren't (yet) abusing audio workgroups for all their thread pools to get pcore scheduling and higher priority. So it would imply that if an audio workgroup is issuing commands to the GPU there should be some kind of timeout to the GPU downclocking based on the last time a workgroup sent data to it.

GPU audio is extremely niche these days, but with the company mentioned in TFA releasing their SDK recently it may become more popular. Although I don't buy it, because if you're doing things on the GPU you're saying you don't care about latency, so bump your I/O buffer sizes.

zamadatix|10 months ago

Abusing the API would still be more efficient than running fake busy workloads to do the same, which apps can already do without the API (or permissions the API could require).

nottorp|10 months ago

Manual permission? Maybe hidden somewhere, it's probably necessary for very niche apps.

And default deny at the OS level for Zoom, Teams and web browsers :)

Cthulhu_|10 months ago

But as the author mentions, they already do it by having a process spin indefinitely. If they want to abuse it, they will and can already.

It's better to trust: the number of people who won't abuse it far outweighs the ones that do.

threeseed|10 months ago

Best way to do this:

1. Go through WWDC videos and find the engineer who seems the most knowledgeable about the issue you're facing.

2. Email them directly with this format: mthomson@apple.com for Michael Thomson.

Hnrobert42|10 months ago

Or his brother Pichael at pthomson.

vessenes|10 months ago

Side note: Anukari should put out a Mick Gordon sound pack and share revs with him. That dude is making some crazy crazy stuff; his demo is awesome. Pairing up with artists once you have such a strong tool is good business and good for the world. If you like Mick Gordon. Which I do.

sgt|10 months ago

I have zero need for this app but it's so cool. Apps like these bring the "fun" back into computing. I don't mean there's no fun at the moment, but it reminds me of the old days with more graphical and experimental programs floating around, even the demoscene.

philsnow|10 months ago

Don't miss the link in the second-to-last paragraph to https://x.com/Mick_Gordon/status/1918146487948919222 , a demo Mick Gordon put together, to which @anukarimusic replied

> Lol on the second day it's out, you have already absolutely demolished all of the demos I've made with it and I've used it every day for two years

phkahler|10 months ago

1024 objects updating at 48 kHz seems possible on the CPU, depending on how the code is written. 48M updates per second? It seems like a possible use for OpenMP to run a few loops in parallel across cores.

humbledrone|10 months ago

1. Anukari runs up to 16 entire copies of the physics model for polyphony, so 16 * 1024 * 48K (I should update the blog post)

2. Users can arbitrarily connect objects to one another, so each object has to read connections and do processing for N other entities

3. Using the full CPU requires synchronization across cores at each physics step, which is slow

4. Processing per object is relatively large, lots of transcendentals (approx OK) but also just a lot of features, every parameter can be modulated, needs to be NaN-proof, so on

5. Users want to run multiple copies of Anukari in parallel for multiple tracks, effects, etc

Another way to look at it is: 4 GHz / (16 voice * 1024 obj * 4 connections * 48,000 sample) = 1.3 cycles per thing

The GPU eats this workload alive, it's absolutely perfect for it. All 16 voice * 1024 obj can be done fully in parallel, with trivial synchronization at each step and user-managed L1 cache.
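The budget arithmetic above can be sanity-checked in a few lines (a sketch; all constants are the ones quoted in the comment, and "cycles per thing" is the per-connection-update budget on a single hypothetical 4 GHz core):

```python
# Back-of-envelope check of the cycle budget described above.
CLOCK_HZ = 4_000_000_000   # one CPU core at 4 GHz
VOICES = 16                # polyphonic copies of the physics model
OBJECTS = 1024             # objects per copy
CONNECTIONS = 4            # connections processed per object
SAMPLE_RATE = 48_000       # audio samples per second

updates_per_second = VOICES * OBJECTS * CONNECTIONS * SAMPLE_RATE
cycles_per_update = CLOCK_HZ / updates_per_second

print(f"{updates_per_second:,} updates/s")
print(f"{cycles_per_update:.2f} CPU cycles per update")
```

That works out to roughly 1.3 cycles per update, which is why a serial CPU implementation is hopeless and a massively parallel GPU fits so well.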

cfstras|10 months ago

If my math is right, that gives you 83 clock cycles to calculate a single sample. On 16 cores, theoretically 1333 cycles. That's not a lot, considering you don't get nearly 100% of the CPU all the time.

Someone|10 months ago

One thing I don’t understand: if latency is important for this use case, why isn’t the CPU busy preparing the next GPU ‘job’ while a GPU ‘job’ is running?

Is that a limitation of the audio plug-in APIs?

humbledrone|10 months ago

I attempted to preempt your question in the section of my blog post, "Why don’t you just pipeline the GPU code so that it saturates the GPU?" It's one of the less-detailed sections though so maybe you have further questions? I think the main thing is that since Anukari processes input like MIDI and audio data in real-time, it can't work ahead of the CPU, because those inputs are not available yet.

Possibly what you describe is a bit more like double-buffering, which I also explored. The problem here is latency: any form of N-buffering introduces additional latency. This is one reason why some gamers don't like triple-buffering for graphics, because it introduces further latency between their mouse inputs and the visual change.

But furthermore, when the GPU clock rate is too low, double-buffering and pipelining don't help anyway, because fundamentally Anukari has to keep up with real time, and every block it processes depends on the previous one. With a fully-lowered GPU clock, the issue really does become one of throughput and not just latency.
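The latency cost of N-buffering described above is easy to put numbers on (a sketch; the 128-sample block size is a common choice but hypothetical here, not Anukari's actual setting):

```python
# Why N-buffering adds latency: with N blocks in flight, audio
# captured at the start of one block isn't delivered until N whole
# blocks later. Assumes a 48 kHz stream.
SAMPLE_RATE = 48_000
BLOCK_SIZE = 128  # samples per processing block (hypothetical)

block_ms = BLOCK_SIZE / SAMPLE_RATE * 1000.0  # duration of one block

for blocks_in_flight in (1, 2, 3):  # single-, double-, triple-buffered
    latency_ms = blocks_in_flight * block_ms
    print(f"{blocks_in_flight} block(s) in flight: ~{latency_ms:.2f} ms of buffering latency")
```

At these settings each extra buffer adds about 2.7 ms, which is exactly the tradeoff gamers complain about with triple-buffered graphics.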

kllrnohj|10 months ago

That's pipelining, and it's good for throughput but it sacrifices latency. Audio is not a continuous bit stream but a series of small packets. To begin working on the next one on the CPU while the previous one is on the GPU requires 2 packets in flight, which necessarily means higher latency.

grandinj|10 months ago

This might trick the heuristics in the right direction: feed the GPU a bunch of small tasks (i.e. with a small number of samples) instead of big tasks.

mort96|10 months ago

I mean the CPU can't prepare a job for samples which don't exist yet. If it takes 0.5 milliseconds to process 1 millisecond's worth of audio, you'll necessarily be stopping and starting constantly. You can't keep the GPU fed continuously.

jonas21|10 months ago

I'm having trouble understanding what the problem is -- as in, what are the actual symptoms that users are seeing? How much latency can the app tolerate and how much are you seeing in practice? It would be helpful (to me at least) in thinking about potential solutions if that information were available up front.

Perhaps there's something in this video that might help you? They made a lot of changes to scheduling and resource allocation in the M3 generation:

https://developer.apple.com/videos/play/tech-talks/111375/

humbledrone|10 months ago

It's a real-time audio app, so if it falls behind real time, no audio. You get cracks, pops, and the whole thing becomes unusable. If the user is doing audio at 48 kHz, the required latency is 1/48,000 seconds per sample, or realistically somewhat less than that to account for variance and overhead.
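The deadline humbledrone describes can be made concrete (a sketch; the 128-sample block size is a hypothetical host setting, since real-time hosts hand the plugin audio in blocks rather than single samples):

```python
# Real-time audio deadline: for each callback the plugin receives a
# block of samples and must return the processed block before the
# hardware needs it. Miss the deadline even once and you get an
# audible crack or pop.
SAMPLE_RATE = 48_000
BLOCK = 128  # samples per callback (hypothetical)

per_sample_us = 1 / SAMPLE_RATE * 1e6      # average budget per sample
deadline_us = BLOCK / SAMPLE_RATE * 1e6    # hard deadline per block

print(f"per-sample budget: {per_sample_us:.1f} microseconds")
print(f"per-block deadline ({BLOCK} samples): {deadline_us:.0f} microseconds")
```

So at 48 kHz the whole physics simulation, GPU round trip included, has to fit in about 2.7 ms per 128-sample block, every block, with margin left over for scheduling jitter.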

charcircuit|10 months ago

>Any MTLCommandQueue managed by an Audio Workgroup thread could be treated as real-time and the GPU clock could be adjusted accordingly.

>The Metal API could simply provide an option on MTLCommandQueue to indicate that it is real-time sensitive, and the clock for the GPU chiplet handling that queue could be adjusted accordingly.

Realtime scheduling on a GPU and what the GPU is clocked to are separate concepts. From the article it sounds like the issue is with the clock speeds and not how the work is being scheduled. It sounds like you need something else for providing a hint for requesting a higher GPU clock.

dgs_sgd|10 months ago

> in parallel with the audio computation on the GPU, Anukari runs a second workload on the GPU that is designed to create a high load average and trick macOS into clocking up the GPU. This workload is tuned to use as little of the GPU as possible, while still creating a big enough artificial load to trigger the clock heuristics.

That's quite the hack and I feel for the developers. As they state in the post, audio on the GPU is really new and I sadly wouldn't be holding my breath for Apple to cater to it.

PaulHoule|10 months ago

It's an interesting trade-off. For decades the answer to having a reliable Windows computer has been to turn off as many power saving features as possible. Saving power on USB plugs for instance makes your machine crash. Let your CPU state drop to the minimum and you'll find your $3000 desktop computer takes about a second to respond to keypresses. Power savings might not be real, but the crashes and poor performance are very real.

chrismorgan|9 months ago

> (An aside: chalkboards are way better than whiteboards, unless you enjoy getting high on noxious fumes, in which case whiteboards are the way to go.)

That looks to be a smoother chalkboard than I’ve ever encountered. If I had been using such chalkboards, I suspect I’d agree, but based purely on my experiences to this point, my opinion has been that chalkboards are significantly better for most art due to finer control and easier and more flexible editing, but whiteboards are better for most teaching purposes (in small or large groups), mostly due to higher contrast. But there’s a lot of variance within both, and placement angles and reflection characteristics matter a lot, as do the specific chalk, markers and ink you use.

ramesh31|10 months ago

Be careful what you wish for here. Knowing Apple, they will stonewall any API requests, and may very well shut your app out for the private API workarounds described.

mort96|10 months ago

I don't think Anukari is in the Mac App Store, nor do I think a plug-in like it will ever be appropriate for the App Store, so I don't know what exactly you're worried about.

rock_artist|10 months ago

While very different, it was already tricky in the past to make Apple silicon (on iPhones as well) perform reasonably.

Ableton engineers already evaluated this in the past: https://github.com/Ableton/AudioPerfLab

While I feel for the complaints about Apple's lack of feedback assistance, the core issue itself is very tricky. Many years ago, before becoming an audio developer, I worked in a Pro Audio PC shop...

And guess what... interrupts, abusive drivers (GPUs included), Intel's SpeedStep, sleep states, parking cores... all were tricky.

Fast forward: we got asymmetric CPUs and arm64 CPUs, and still Intel or AMD machines (especially laptops) might need BIOS tweaks to avoid dropouts/stutters.

But if there's a broken driver by CPU or GPU... good luck reporting that one :)

notnullorvoid|10 months ago

Sorry to hear about the issue; not too surprising given Apple's track record with this kind of thing, though. (You still can't even pin processes to specific CPU cores/threads.) Anukari is really cool though, wish you had a Linux build :)

thraway3837|10 months ago

This is all just too much Stockholm syndrome. Apple's DX (developer experience) has always been utterly abysmal, and these continued blog posts just go to show how bad it is.

Proprietary technologies, poor or no documentation, silent deprecations and removals of APIs, slow trickle feed of yearly WWDC releases that enable just a bit more functionality, introducing newer more entrenched ways to do stuff but still never allowing the basics that every other developer platform has made possible on day 1.

A broken UI system that is confusing and quickly becomes undebuggable once you do anything complex; it replaces Auto Layout, but over a decade of apps have to transition over. Combine framework? Is it dead? Is it alive? Networking APIs that require the use of a 3rd-party library because the native APIs don't even handle the basics easily. Core Data, a complete mess of a local storage system, still not thread safe. Xcode, the only IDE, forced on you by Apple, while possibly being the worst-rated app on the store. Every update is a nearly 1-hour process of unxipping (yes, .xip) that needs verification, and if you skip it, bad actors could potentially inject code into your application from within a bad copy of Xcode, unbeknownst to you. And it crashes all the time. Swift? Ha. Unused anywhere but Apple platforms. Swift on the server is dead: IBM pulled out over 5 years ago, and no one wants to use Swift anywhere but Apple platforms, where it's required.

The list goes on. Yet, Apple developers love to be abused by corporate. Ever talk to DTS or their 1-1 WWDC sessions? It’s some of the most condescending, out of touch experience. “You have to use our API this way, and there’s this trick of setting it to this but then change to that and it’ll work. Undocumented but now you know!”

Just leave the platform and make it work cross platform. That’s the only way Apple will ever learn that people don’t want to put up with their nonsense.

duped|10 months ago

I don't disagree with you, but there simply isn't an alternative for pro audio developers. You go where the users are and the majority of the market (by revenue) are Mac users.

Now a lot of people may reply to this that Windows isn't that bad with ASIO (third party driver framework) or modern APIs like WASAPI (which is still lacking), or how pipewire is changing things on Linux so you don't need jack anymore (but god forbid, you want to write pipewire native software in a language besides C, since the only documented API are macros). Despite these changes you have to go where the revenue is, which is on MacOS.

fxtentacle|10 months ago

The Apple DX used to be pretty great around 2010. But by now, it's laughably bad. With every additional OS update, they asked for more and more work (and expensive EV signing certificates) to keep our pro audio app working, which is why it has since been abandoned.

In fact, I'm now working on a USB hardware replacement for what used to be a macOS app, simply because Apple isn't allowing enough control anymore. Their DX has degraded to the point where delivering the features as an app has become impossible.

Also, USB gadgets are exempt from the 30% app store tax. You can even sell them with recurring subscriptions through your own payment methods. Both for the business owner and for the developer, sidestepping Apple is better than jumping through their ridiculous hoops.

galad87|10 months ago

It's surely not perfect, and so much is quite horrible, but at least try to keep the facts in check. AppKit and auto layout are still working fine, they aren't going anywhere any time soon, there is no need to rewrite all the UI code.

Core Data threading? Well, it has got its pitfalls, but those are known, and anyway, nothing is forcing you to use it.

Xcode is so slim these days: it's a ~3 GB download, it doesn't take an hour to unxip, and it can be downloaded from the developer website.

Swift? It might be needed for a bunch of new frameworks, but Objective-C isn't going anywhere anytime soon either.

pjmlp|10 months ago

On the old Mac OS, and in the early OS X days, the documentation was great; I dunno what happened to the documentation team.

Swift on the server is for Apple ecosystem developers, to share code, just like all those reasons to apparently use JavaScript on the server instead of something saner.

jmull|10 months ago

> Stockholm syndrome

I don't think that's apt. What you find to be "abuse" others might find to be the kind of obstacles/issues that every platform/ecosystem has.

It probably helps if you never put Apple on a pedestal in the first place, so there's no special disappointment when they inevitably turn out to be imperfect. E.g., just because Apple publishes a new API/framework, that doesn't mean you need to jump on board and use it.

Anyway, developers are adults who can make their own judgements about whether it's worth it to work in Apple's ecosystem or not. It sounds like you've made your decision. Now let everyone else make theirs.

HelloImSteven|10 months ago

Apple's documentation used to be quite good—many useful guides, thousands of technical notes, development books quarterly—it's really a shame that they've turned their back on that. Their old docs leaned toward being overly detailed, which some complained about at the time, but I'd much prefer that over near radio silence.

Apple's also been deleting more and more of its old documentation. Much of it can only be found on aging DVDs now, or web/FTP archives if you're lucky. Even more annoying is how some of the deleted docs are _still_ referenced by modern docs and code samples.

EasyMark|10 months ago

lol and people wonder why devs like using electron front ends for their back end code, despite the memory cost. I only have so much time in the day, so I'm doing a lot of calling of c++ backend code to display my data analysis output and configuration on Electron front end. I may look into wasm someday, when that mythical "extra time" comes around

BonoboIO|10 months ago

I'm constantly amazed how developers worship Apple while Apple couldn't care less about them. Bugs that never get fixed, documentation that's incomplete, wrong or non-existent, and their bug tracking is a complete joke.

eigenspace|10 months ago

It's honestly nuts that so many developers continue to try to make software on MacOS. I understand the appeal of their current hardware, and I used to even be a big fan of the user experience, but it really seems like attempting to build software in MacOS is like trying to build a house on a sandbar.

Apple has done nothing and continues to do nothing to engender any confidence in their platform as a development target.

favorited|10 months ago

> Networking APIs that require the use of a 3rd party library because the native APIs don’t even handle the basics easily

This is nonsense. I've been a professional Mac and iOS developer for well over a decade, and even in the days of NSURLConnection, I've never needed a 3rd party networking library. Uploading, downloading, streaming, proxying, caching, cookies, auth challenges, certificate validation, mTLS, HTTP/3, etc. – it's all available out of the box.

ryandrake|10 months ago

I think people forget how horrible embedded (including mobile) programming was before the iPhone SDK came along. I just posted about this[1], so I won't repeat myself here, but TLDR: developing for the iPhone is a breath of fresh air compared to how unnecessarily difficult embedded and mobile development was (and in many cases still is) on other platforms.

1: https://news.ycombinator.com/item?id=43657086

interactivecode|10 months ago

Oh please, every platform and programming environment has undocumented apis, workarounds and hacks.

Liftyee|10 months ago

Out of curiosity, what's the origin of the Anukari name?

SOLAR_FIELDS|10 months ago

https://xkcd.com/1172/ feels a lot like the workaround OP describes

rollcat|10 months ago

That's more like "I had to trick the OS into thinking that spacebar was held for my application to run at all".

ArthurStacks|10 months ago

[deleted]

mac9|10 months ago

Some of them probably do... This is still a funny comment though