top | item 44853698

Kirth | 6 months ago

I was baffled by the comparison to the M4 Max. Does this mean that recent AMD chips will be performing at the same level, and what does that mean for on-device LLMs? .. or am I misunderstanding this whole ordeal?

izacus|6 months ago

Yes, AMD's Strix series uses a similar architecture to the M series, with massive memory bandwidth and big caches.

That results in significantly better performance.

sidewndr46|6 months ago

Isn't this the desktop architecture that Torvalds suggested years ago?

schmorptron|6 months ago

Will we be able to get similar bandwidth with socketed ram with CAMM / LPCAMM modules in the near future?

cdavid|6 months ago

I was surprised at the previous comparison on the omarchy website, because Apple M* chips work really well for data science work that doesn't require a GPU.

It may be explained by integer vs float performance, though I am too lazy to investigate. A weak data point, using a matrix product of an N=6000 matrix by itself in numpy:

  - SER 8 8745, linux: 280 ms -> 1.53 Tflops (single prec)
  - my m2 macbook air: ~180 ms -> ~2.4 Tflops (single prec)
This is 2 mins of benchmarking on the computers I have. It is not an apples-to-apples comparison (e.g. I use the numpy default BLAS on each platform), but it is not completely irrelevant to what people will do w/o much effort. And floating point is what matters for LLMs, not integer computation (which is most likely what bottlenecks the ruby test suite).
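
A minimal sketch of that benchmark (a smaller N is used here so it runs quickly; set N = 6000 to match the numbers above). A dense N x N matrix product costs roughly 2*N^3 floating-point operations, which is where the Tflops figure comes from:

```python
import time
import numpy as np

# Use N = 6000 to reproduce the comment's numbers; smaller here for speed.
N = 2000
a = np.random.rand(N, N).astype(np.float32)

# Warm up once so BLAS thread start-up doesn't skew the timing.
a @ a

start = time.perf_counter()
a @ a
elapsed = time.perf_counter() - start

# A dense N x N matmul performs ~2*N^3 floating-point operations.
tflops = 2 * N**3 / elapsed / 1e12
print(f"{elapsed * 1000:.0f} ms -> {tflops:.2f} Tflops (single precision)")
```

Plugging the comment's own numbers into the same formula checks out: 2 * 6000^3 / 0.280 s ≈ 1.54e12 flops/s, i.e. ~1.5 Tflops.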

Tuna-Fish|6 months ago

It's all about the memory bandwidth.

Apple M chips are slower at computation than AMD chips, but they have fast soldered on-package RAM with a wide memory interface, which is very useful for workloads that handle lots of data.

Strix Halo has a 256-bit LPDDR5X interface, twice as wide as a typical desktop chip's, roughly equal to the M4 Pro's and half that of the M4 Max.

jychang|6 months ago

You're most likely bottlenecked by memory bandwidth for an LLM.

The AMD AI MAX 395+ gives you 256GB/sec. The M4 gives you 120GB/s, and the M4 Pro gives you 273GB/s. The M4 Max: 410GB/s (14‑core CPU/32‑core GPU) or 546GB/s (16‑core CPU/40‑core GPU).

biehl|6 months ago

I think DHH compares them because they are both the latest top-line chips. I think DHH's benchmarks show that they have different performance characteristics. But DHH's favorite benchmark favors whatever runs native Linux and Docker.

For local LLMs, the higher memory bandwidth of the M4 Max makes it much more performant.

Arstechnica has more benchmarks for non-llm things https://arstechnica.com/gadgets/2025/08/review-framework-des...

rr808|6 months ago

After the App Store fight, DHH's favorite is whatever is not Apple lol. TBF it just opened his eyes to alternatives, and now he's happy off that platform.

Aurornis|6 months ago

An M4 Max has double the memory bandwidth and should run away with similarly optimized benchmarks.

An M4 Pro is the more appropriate comparison. I don't know why he's doing price comparisons to a Mac Studio when you can get a 64GB M4 Pro Mac Mini (the closest price/performance comparison point) for much less.

dismalaf|6 months ago

> don't know why he's doing price comparisons to a Mac Studio when you can get a 64GB M4 Pro Mac Mini (the closest price/performance comparison point) for much less.

Where?

An M4 Pro Mac Mini is priced higher than the Framework here in Canada...

discordance|6 months ago

Not in perf/watt, but in perf, yes.

jchw|6 months ago

Depends on the benchmark, I think. In this case it's probably close. Apple is cagey about power draw and clock metrics, but I believe the M4 Max has been seen drawing around 50W in loaded scenarios. Meanwhile, Phoronix clocked the 395+ at an average of 91 watts during their benchmarks. If the performance is ~twice as fast, that's similar performance per watt. Needless to say, it's at least not the dramatic difference it was when the M1 came out.
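
The arithmetic behind that claim, made explicit (using the comment's rough figures of ~50 W and ~91 W):

```python
# If the M4 Max draws ~50 W and the 395+ draws ~91 W, equal perf/watt
# requires the Mac to be ~1.8x faster on the same workload.
m4_max_watts = 50
strix_watts = 91

breakeven_speedup = strix_watts / m4_max_watts
print(f"break-even speedup: {breakeven_speedup:.2f}x")  # ~1.82x
```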

edit: Though the M4 Max may be more power hungry than I'm giving it credit for; it's hard to say, because I can't figure out whether some of these power draw figures from random Internet posts actually isolate the M4 itself. It looks like it goes much, much higher when the GPU is loaded.

https://old.reddit.com/r/macbookpro/comments/1hkhtpp/m4_max_...

ekianjo|6 months ago

Macs have faster memory access, so no, Macs are faster for LLMs.

pengaru|6 months ago

It's not baffling once you realize TSMC is the main defining factor for all these chips; Apple Silicon is simply not that special in the grand scheme of things.

Why do you think TSMC's production being in Taiwan is basically a national security issue for the U.S. at this point?

toasterlovin|6 months ago

> Apple Silicon is simply not that special in the grand scheme of things

Apple Silicon might not be that special from an architecture perspective (although treating integrated GPUs as appropriate for workloads other than low end laptops was a break with industry trends), but it’s very special from an economic perspective. The Apple Silicon unit volumes from iPhones have financed TSMC’s rise to semiconductor process dominance and, it would appear, permanently dethroned Intel.

ozgrakkurt|6 months ago

I don’t think there is a laptop that comes close to the battery life, or the performance on battery, of the M1 MacBook Pro.

I hate apple but there is obviously something special about it