NiloCK | 18 days ago
I presume here you are referring to running on the device in your lap.
How about a headless linux inference box in the closet / basement?
Return of the home network!
Aurornis|18 days ago
It’s possible to build a Linux box that does the same, but you’ll be spending a lot more to get there. With Apple, a $500 Mac Mini has memory bandwidth that you just can’t get anywhere else for the price.
cmrdporcupine|18 days ago
And Apple completely overcharges for memory, so.
This is a model you use via a cheap API provider like DeepInfra, or get on their coding plan. It's nice that it will be available as open weights, but it's not practical for mere mortals to run locally.
But I can see a large corporation that wants to avoid sending code offsite setting up their own private infra to host it.
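For reference, a minimal sketch of what "use it via an API provider" looks like, assuming an OpenAI-compatible endpoint (DeepInfra exposes one, and a self-hosted vLLM server behind the same client would cover the private-infra case; the base URL and model id here are placeholders, not confirmed values):

    # Minimal sketch, assuming an OpenAI-compatible endpoint.
    # Check the provider's docs for the exact base_url; the model id is a placeholder.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://api.deepinfra.com/v1/openai",  # or your own vLLM server, e.g. http://localhost:8000/v1
        api_key="YOUR_API_KEY",
    )

    resp = client.chat.completions.create(
        model="org/model-name",  # placeholder for whatever id the provider lists the weights under
        messages=[{"role": "user", "content": "Refactor this function to be iterative."}],
    )
    print(resp.choices[0].message.content)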
ingenieroariel|18 days ago
For our code assistant use cases, local inference on Macs tends to favor workflows with a lot of generation and little reading, which is the opposite of how many of us use Claude Code.
Source: I started getting Mac Studios with max RAM as soon as the first Llama model was released.
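To put rough numbers on that: generation (decode) is mostly memory-bandwidth-bound, while reading a long prompt (prefill) is mostly compute-bound, and Apple Silicon has far more bandwidth than raw compute. A back-of-envelope sketch, where every figure is an assumption rather than a measurement of any particular machine or model:

    # Why local Mac inference favors "lots of generation, little reading".
    # All numbers below are illustrative assumptions, not benchmarks.
    mem_bandwidth_gbs = 800     # assumed unified-memory bandwidth (Ultra-class), GB/s
    gpu_tflops = 30             # assumed usable GPU throughput for prefill, TFLOPS
    active_params = 40e9        # assumed active parameters per token
    bytes_per_param = 1         # assumed 8-bit quantized weights

    weights_gb = active_params * bytes_per_param / 1e9

    # Decode: the active weights are streamed once per generated token.
    decode_tok_per_s = mem_bandwidth_gbs / weights_gb            # ~20 tok/s
    # Prefill: roughly 2 FLOPs per parameter per prompt token.
    prefill_tok_per_s = gpu_tflops * 1e12 / (2 * active_params)  # ~375 tok/s

    print(f"generation: ~{decode_tok_per_s:.0f} tok/s, prompt reading: ~{prefill_tok_per_s:.0f} tok/s")
    print(f"time to read a 100k-token prompt: ~{100_000 / prefill_tok_per_s / 60:.1f} minutes")

With assumptions like these, generation speed is comfortable but reading a 100k-token codebase prompt takes minutes, which is why read-heavy Claude Code-style workflows suffer.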
ac29|18 days ago
The cheapest new Mac Mini is $600 on Apple's US store.
And it has a 128-bit memory interface using LPDDR5X-7500, nothing exotic. The laptop I bought last year for <$500 has roughly the same memory speed, and newer machines are even faster.
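The arithmetic, for reference (same figures as the comment above; the result lines up with the ~120 GB/s Apple quotes for the base M4, if memory serves):

    # Peak memory bandwidth = bus width (bytes) x transfer rate.
    bus_bits = 128
    transfers_per_s = 7500e6                          # LPDDR5X-7500 = 7500 MT/s
    bandwidth_gbs = bus_bits / 8 * transfers_per_s / 1e9
    print(f"{bandwidth_gbs:.0f} GB/s")                # ~120 GB/s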
pja|18 days ago
You want the M4 Max (or Ultra) in the Mac Studios to get the real stuff.
jannniii|18 days ago
Strix Halo