reilly3000 | 25 days ago

Which takes a $20k Thunderbolt cluster of two 512GB-RAM Mac Studio Ultras to run at full quality…

0xbadcafebee|25 days ago

Most benchmarks show very little improvement of "full quality" over a quantized lower-bit model. You can shrink the model to a fraction of its "full" size and get 92-95% of the same performance, with less VRAM use.
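
To put rough numbers on "a fraction of its full size", here is a minimal sketch of the weights-only arithmetic. The 1-trillion-parameter count is a hypothetical figure chosen for illustration, not something stated in this thread. The estimate ignores KV cache, activations, and the per-block scale overhead that real quantization formats add, so actual usage will be somewhat higher.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# Hypothetical parameter count, for illustration only.
N_PARAMS = 1e12

for name, bits in [("fp16/bf16", 16), ("int8", 8), ("int4", 4)]:
    print(f"{name:>9}: {weight_memory_gb(N_PARAMS, bits):,.0f} GB")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts the weight footprint to a quarter, which is the kind of reduction that decides whether a model fits on one machine or needs a cluster.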

MuffinFlavored|25 days ago

> You can shrink the model to a fraction of its "full" size and get 92-95% of the same performance, with less VRAM use.

Are there a lot of options for how far you can quantize? How much VRAM does it take to get the 92-95% you are speaking of?

polynomial|25 days ago

Depending on your usage requirements, Mac Minis running UMA over RDMA are becoming a feasible option. At roughly 1/10 of the cost, you're getting much more than 1/10 of the performance. (YMMV)

https://buildai.substack.com/i/181542049/the-mac-mini-moment

danw1979|24 days ago

I did not expect this to be a limiting factor in the Mac Mini RDMA setup:

> Thermal throttling: Thunderbolt 5 cables get hot under sustained 15GB/s load. After 10 minutes, bandwidth drops to 12GB/s. After 20 minutes, 10GB/s. Your 5.36 tokens/sec becomes 4.1 tokens/sec. Active cooling on cables helps but you’re fighting physics.

Thermal throttling of network cables is a new thing to me…
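
Taking the quoted figures at face value, a quick sketch of the relative drops (the helper name `relative_drop` is just for illustration):

```python
def relative_drop(before: float, after: float) -> float:
    """Fractional decrease from `before` to `after`."""
    return (before - after) / before

# Figures from the quoted passage: link bandwidth in GB/s, throughput in tokens/sec.
bandwidth_drop = relative_drop(15.0, 10.0)
throughput_drop = relative_drop(5.36, 4.1)

print(f"bandwidth drop:  {bandwidth_drop:.1%}")
print(f"throughput drop: {throughput_drop:.1%}")
```

By these numbers the link loses about a third of its bandwidth while token throughput falls by a bit under a quarter, which would suggest the interconnect is a major bottleneck in that setup but not the only one.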

deaux|25 days ago

And that's at unusable speeds - it takes about triple that amount to run it decently fast at int4.

Now as the other replies say, you should very likely run a quantized version anyway.

bigyabai|25 days ago

"Full quality" is a relative assessment here. You're still deeply compute constrained; that machine would crawl at longer contexts.

PlatoIsADisease|25 days ago

[deleted]

zozbot234|25 days ago

70B dense models are way behind SOTA. Even the aforementioned Kimi 2.5 has fewer active parameters than that, and it is then quantized to int4. We're at a point where some near-frontier models may run out of the box on Mac Mini-grade hardware, with perhaps no real need to even upgrade to the Mac Studio.

sealeck|25 days ago

Are you an NVIDIA fanboy?

This is a _remarkably_ aggressive comment!

teaearlgraycold|25 days ago

Which, while expensive, is dirt cheap compared to a comparable NVIDIA or AMD system.

SchemaLoad|25 days ago

It's still very expensive compared to using the hosted models, which are currently massively subsidised. Have to wonder what the fair market price for these hosted models will be after the free money dries up.

blharr|25 days ago

What speed are you getting at that level of hardware though?