I was baffled by the comparison to the M4 Max. Does this mean that recent AMD chips will be performing at the same level, and what does that mean for on-device LLMs? .. or am I misunderstanding this whole ordeal?
I was surprised by the previous comparison on the Omarchy website, because Apple M* chips work really well for data science work that doesn't require a GPU.
It may be explained by integer vs float performance, though I am too lazy to investigate. A weak data point, multiplying an N=6000 matrix by itself in numpy:
- SER 8 8745, linux: 280 ms -> 1.53 Tflops (single prec)
- my m2 macbook air: ~180 ms -> ~2.4 Tflops (single prec)
This is 2 mins of benchmarking on the computers I have. It is not an apples-to-apples comparison (e.g. I use the numpy default BLAS on each platform), but it is not completely irrelevant to what people will do w/o much effort. And floating point is what matters for LLMs, not integer computation (which is what the Ruby test suite is most likely bottlenecked by).
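For anyone who wants to repeat that data point, here is a rough sketch of how such a number can be produced. It assumes the usual 2*N^3 operation count for a dense N x N product; the function names are made up for illustration, and the result depends heavily on which BLAS your numpy build links against.

```python
import time


def matmul_flops(n: int) -> int:
    """Approximate floating-point operations in an n x n times n x n product."""
    return 2 * n ** 3


def tflops(n: int, seconds: float) -> float:
    """Convert the wall time of one n x n matmul into Tflop/s."""
    return matmul_flops(n) / seconds / 1e12


def time_matmul(n: int = 6000, repeats: int = 3) -> float:
    """Best-of-`repeats` wall time (seconds) for a float32 matrix times itself."""
    import numpy as np  # imported lazily so the helpers above work without numpy

    a = np.random.rand(n, n).astype(np.float32)  # single precision, as in the comment
    a @ a  # warm-up run so one-time setup is not measured
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ a
        best = min(best, time.perf_counter() - start)
    return best
```

Calling `tflops(6000, time_matmul())` gives the figure for your own machine. Plugging in the timings quoted above, 0.28 s works out to roughly 1.5 Tflops and 0.18 s to roughly 2.4 Tflops.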
Apple M chips are slower at computation than AMD chips, but they have soldered on-package fast RAM with a wide memory interface, which is very useful on workloads that handle lots of data.
Strix Halo has a 256-bit LPDDR5X interface, twice as wide as the typical desktop chip, roughly equal to the M4 Pro and half that of the M4 Max.
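To make the link between interface width and bandwidth concrete, here is a small sketch. The transfer rates are my assumption and not from the comment above (LPDDR5X-8000 for Strix Halo, LPDDR5X-8533 for the M4 Pro and M4 Max); the 256-bit and 512-bit Apple widths follow from the "roughly equal" and "half" comparison.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: int) -> float:
    """Theoretical peak bandwidth: bytes per transfer times transfers per second."""
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * transfer_rate_mt_s / 1000  # MB/s -> GB/s


# Assumed memory speeds; check the spec sheet of your exact part.
print(peak_bandwidth_gb_s(256, 8000))  # Strix Halo, 256-bit
print(peak_bandwidth_gb_s(256, 8533))  # M4 Pro, 256-bit
print(peak_bandwidth_gb_s(512, 8533))  # M4 Max, 512-bit
```

These are theoretical peaks; what a workload actually sustains will be lower.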
You're most likely bottlenecked by memory bandwidth for an LLM.
The AMD AI MAX 395+ gives you 256GB/sec. The M4 gives you 120GB/s, and the M4 Pro gives you 273GB/s. The M4 Max: 410GB/s (14‑core CPU/32‑core GPU) or 546GB/s (16‑core CPU/40‑core GPU).
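A back-of-the-envelope way to read those numbers, as a sketch only: if token generation is memory-bound, every generated token has to stream roughly the whole set of weights from RAM once, so bandwidth divided by model size gives a ceiling on tokens per second. The 40 GB model size is a hypothetical example, and real throughput will be lower (and different for mixture-of-experts models, which only read part of the weights per token).

```python
# Bandwidth figures as quoted in the comment above, in GB/s.
BANDWIDTH_GB_S = {
    "AMD AI MAX 395+": 256,
    "M4": 120,
    "M4 Pro": 273,
    "M4 Max (32-core GPU)": 410,
    "M4 Max (40-core GPU)": 546,
}


def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode speed when each token reads all weights once."""
    return bandwidth_gb_s / model_size_gb


MODEL_SIZE_GB = 40.0  # hypothetical quantized model, chosen only for illustration

for chip, bandwidth in BANDWIDTH_GB_S.items():
    bound = max_tokens_per_second(bandwidth, MODEL_SIZE_GB)
    print(f"{chip}: at most {bound:.1f} tokens/s")
```

The absolute numbers matter less than the ratios: under this model the top M4 Max has a bit more than twice the ceiling of the 395+, while the M4 Pro is only slightly ahead of it.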
I think DHH compares them because they are both the latest, top-line chips. I think DHH's benchmarks show that they have different performance characteristics. But DHH's favorite benchmark favors whatever runs native Linux and Docker.
For local LLMs the higher memory bandwidth of the M4 Max makes it much more performant.
An M4 Max has double the memory bandwidth and should run away with similarly optimized benchmarks.
An M4 Pro is the more appropriate comparison. I don't know why he's doing price comparisons to a Mac Studio when you can get a 64GB M4 Pro Mac Mini (the closest price/performance comparison point) for much less.
> don't know why he's doing price comparisons to a Mac Studio when you can get a 64GB M4 Pro Mac Mini (the closest price/performance comparison point) for much less.
Where?
An M4 Pro Mac Mini is priced higher than the Framework here in Canada...
Depends on the benchmark, I think. In this case it's probably close. Apple is cagey when it comes to power draw or clock metrics, but I believe the M4 Max has been seen drawing around 50W in loaded scenarios. Meanwhile, Phoronix clocked the 395+ as drawing an average of 91 watts during their benchmarks. If the performance is ~twice as fast, that should be a similar performance per watt. Needless to say, it's at least not a dramatic difference the way it was when the M1 came out.
edit: The M4 Max may be more power hungry than I'm giving it credit for, but it's hard to say because I can't figure out if some of these power draw metrics from random Internet posts actually isolate the M4 itself. It looks like when the GPU is loaded it goes much, much higher.
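The performance-per-watt reasoning above can be spelled out as a quick sketch. This is only one reading of the comment: it assumes the part drawing about 91 W is the one that is ~2x as fast, and it takes the 50 W and 91 W figures at face value even though the edit notes they may not be comparable. The function name is made up for illustration.

```python
def perf_per_watt_ratio(speedup: float, watts_fast: float, watts_slow: float) -> float:
    """Perf/W of the faster chip relative to the slower one.

    speedup    -- how many times faster the faster chip is
    watts_fast -- average power draw of the faster chip
    watts_slow -- average power draw of the slower chip
    """
    return speedup / (watts_fast / watts_slow)


# ~2x the performance at 91 W versus a baseline at 50 W (assumed reading).
print(round(perf_per_watt_ratio(2.0, 91, 50), 2))
```

A result near 1.0 means the two are roughly level on efficiency; with these inputs it comes out at about 1.1.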
It's not baffling once you realize TSMC is the main defining factor for all these chips. Apple Silicon is simply not that special in the grand scheme of things.
Why do you think TSMC's production being in Taiwan is basically a national security issue for the U.S. at this point?
> Apple Silicon is simply not that special in the grand scheme of things
Apple Silicon might not be that special from an architecture perspective (although treating integrated GPUs as appropriate for workloads other than low end laptops was a break with industry trends), but it’s very special from an economic perspective. The Apple Silicon unit volumes from iPhones have financed TSMC’s rise to semiconductor process dominance and, it would appear, permanently dethroned Intel.
izacus|6 months ago
That results in significantly better performance.
sidewndr46|6 months ago
schmorptron|6 months ago
cdavid|6 months ago
Tuna-Fish|6 months ago
jychang|6 months ago
biehl|6 months ago
Ars Technica has more benchmarks for non-LLM things: https://arstechnica.com/gadgets/2025/08/review-framework-des...
rr808|6 months ago
Aurornis|6 months ago
dismalaf|6 months ago
discordance|6 months ago
jchw|6 months ago
https://old.reddit.com/r/macbookpro/comments/1hkhtpp/m4_max_...
ekianjo|6 months ago
pengaru|6 months ago
toasterlovin|6 months ago
ozgrakkurt|6 months ago
I hate Apple, but there is obviously something special about it.