While Apple M* chips seem to have incredible unified memory access, the available learning resources seem quite restricted and often convoluted. Has anyone been able to get past this barrier?
I have some familiarity with general-purpose software development in CUDA and C++. I want to figure out how to work with Apple's developer resources for general-purpose programming.
aleinin|1 year ago
[1] https://github.com/srush/GPU-Puzzles
[2] https://github.com/abeleinin/Metal-Puzzles
dylan604|1 year ago
singlepaynews|1 year ago
morphle|1 year ago
[1] https://github.com/AsahiLinux/gpu
[2] https://github.com/dougallj/applegpu
[3] https://github.com/antgroup-skyward/ANETools/tree/main/ANEDi...
[4] https://github.com/hollance/neural-engine
You can use a high-level API like MLX, Metal or CoreML to compute other things on the GPU and NPU.
Shadama [5] is an example of a programming language that translates (with OMeta) matrix calculations into the WebGPU or WebGL APIs (I forget which). You can do exactly the same with the MLX, Metal or CoreML APIs and pay only around 3% overhead going through the translation stages.
[5] https://github.com/yoshikiohshima/Shadama
I estimate it will cost around $22K at my hourly rate to completely reverse engineer the latest A16 and M4 CPU (ARMv9), GPU and NPU instruction sets. I think I am about halfway through the reverse engineering; the debugging part is the hardest problem. You would, however, not be able to sell software built with it on the App Store, as Apple forbids undocumented APIs and bare-metal instructions.
MuffinFlavored|1 year ago
Very interesting. A steal for $22k but I guess very niche for now...
JackYoustra|1 year ago
dgfitz|1 year ago
KeplerBoy|1 year ago
My use case would be hooking up a device which spews out sensor data at 100 gbit/s over qsfp28 ethernet as directly to a GPU as possible. The new mac mini has the GPU power, but there's no way to get the data into it.
# 2x Gen4x16 + 4x Gen3x8 = 2 * 31.508 GB/s + 4 * 7.877 GB/s ≈ 94.5 GB/s ≈ 756 Gbit/s
barkingcat|1 year ago
There is Metal development. You want to learn Apple M-series GPU and GPGPU development? Learn Metal!
https://developer.apple.com/metal/
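For a taste of what you'd be writing: Metal kernels are written in the Metal Shading Language, which is C++-based. A minimal compute kernel (the elementwise vector add that is the "hello world" of GPGPU) looks like the sketch below; the function name and buffer indices are illustrative, and the host side (creating an MTLDevice, command queue, and pipeline state) is covered in Apple's docs.

```metal
#include <metal_stdlib>
using namespace metal;

// Each GPU thread handles one element, indexed by its grid position.
kernel void vector_add(device const float* a [[buffer(0)]],
                       device const float* b [[buffer(1)]],
                       device float*       out [[buffer(2)]],
                       uint i [[thread_position_in_grid]])
{
    out[i] = a[i] + b[i];
}
```

If you know CUDA, the mapping is close: `kernel` plays the role of `__global__`, and `thread_position_in_grid` roughly corresponds to `blockIdx * blockDim + threadIdx`.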
kristianp|1 year ago
That's what GPGPU stands for. So your two sentences contradict each other.
rgovostes|1 year ago
<Insert your favorite LLM> helped me write some simple Metal-accelerated code by scaffolding the compute pipeline, which took most of the nuisance out of learning the API and let me focus on writing the kernel code.
Here's the code if it's helpful at all. https://github.com/rgov/thps-crack
nixpulvis|1 year ago
billti|1 year ago
With that base, I’ve found their docs decent enough, especially coupled with the Metal Shader Language pdf they provide (https://developer.apple.com/metal/Metal-Shading-Language-Spe...), and quite a few code samples you can download from the docs site (e.g. https://developer.apple.com/documentation/metal/performing_c...).
I’d note a lot of their stuff was still written in Objective-C, which I’m not that familiar with. But most of that is boilerplate and the rest is largely C/C++ based (including the Metal shader language).
I just ported some CPU/SIMD number crunching (complex matrices) to Metal, and the speedup has been staggering. What used to take days now takes minutes. It is the hottest my M3 MacBook has ever been, though! (See https://x.com/billticehurst/status/1871375773413876089 :-)
mkagenius|1 year ago
1. https://ml-explore.github.io/mlx/build/html/index.html
thetwentyone|1 year ago
Archit3ch|1 year ago
dylanowen|1 year ago
rudedogg|1 year ago
Metal, and Apple's docs are the place to start.
grovesNL|1 year ago
There is also a Vulkan backend, so you can run Vulkan through MoltenVK if you want.
feznyng|1 year ago
desideratum|1 year ago
rowanG077|1 year ago
TriangleEdge|1 year ago
nox101|1 year ago
If you want portability, use WebGPU, either via wgpu for Rust or Dawn for C++. They run portably on Windows, Linux, Mac, iOS, and Android.
unknown|1 year ago
[deleted]
amelius|1 year ago
saagarjha|1 year ago
codr7|1 year ago