As a heavy user of OpenAI, Anthropic, and Google AI APIs, I’m increasingly tempted to buy a Mac Studio (M3 Ultra or M4 Pro) as a contingency in case the economics of hosted inference change significantly.
Don't buy anything physical yet. First benchmark the models you could run on your prospective hardware on a (neo)cloud provider like HuggingFace. Only if the quality meets your expectations should you buy. The test itself should cost you about $100 and a few hours at most.
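One way to run that test: rent an endpoint sized like the hardware you're considering and measure generation throughput. A minimal timing sketch in Python; `fake_generate` is a hypothetical stand-in for the real endpoint call, which would return the generated text and its completion-token count:

```python
import time

def tokens_per_second(generate, prompt: str) -> tuple[str, float]:
    """Time one generation call and return (text, tokens/sec).
    `generate` is any callable returning (text, completion_token_count),
    e.g. a thin wrapper around a rented cloud endpoint."""
    start = time.perf_counter()
    text, n_tokens = generate(prompt)
    elapsed = time.perf_counter() - start
    return text, n_tokens / elapsed

# Stub standing in for a real endpoint call, purely for illustration:
def fake_generate(prompt: str) -> tuple[str, int]:
    time.sleep(0.01)           # pretend the model took 10 ms
    return "ok", 5             # pretend it produced 5 tokens

_, tps = tokens_per_second(fake_generate, "hello")
print(f"{tps:.0f} tok/s")
```

Run the same prompts through the hosted APIs you use today and through the rented stand-in, and compare both speed and output quality before spending anything on hardware.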
The thing is, GLM 4.7 easily does the work Opus was doing for me, but to run it fully you'll need much bigger hardware than a Mac Studio. $10k buys you a lot of API calls from z.ai or Anthropic. It's just not economically viable to run a good model at home.
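The break-even arithmetic is easy to sketch. The blended per-token price below is an illustrative assumption, not any provider's actual rate:

```python
# Rough break-even sketch: how many tokens does $10k of hardware buy
# at typical API rates? All figures are illustrative assumptions.

def tokens_for_budget(budget_usd: float, price_per_mtok_usd: float) -> float:
    """Tokens purchasable for a given budget at a per-million-token price."""
    return budget_usd / price_per_mtok_usd * 1_000_000

hardware_cost = 10_000.0   # assumed all-in cost of a local rig
blended_price = 3.0        # assumed blended $/M tokens (input + output)

tokens = tokens_for_budget(hardware_cost, blended_price)
print(f"${hardware_cost:,.0f} buys ~{tokens / 1e9:.1f}B tokens at ${blended_price}/M")
```

On top of that, the API side has no electricity, depreciation, or maintenance cost, which makes the comparison even less favorable for local hardware at sporadic usage levels.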
You can cluster Mac Studios over Thunderbolt connections and enable RDMA for distributed inference. This will be slower than a single node but is still the best bang for the buck with respect to running inference on very large models.
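A rough capacity check for such a cluster, assuming model weights dominate memory and adding a fudge factor for KV cache and activations (all numbers, including the overhead fraction, are illustrative assumptions):

```python
# Sketch: does a large model fit across N Thunderbolt-linked Mac Studios?

def fits_across_nodes(params_b: float, bytes_per_param: float,
                      nodes: int, ram_per_node_gb: float,
                      overhead_frac: float = 0.2) -> bool:
    """Check whether model weights, plus an assumed overhead fraction for
    KV cache and activations, fit in the pooled RAM of the cluster."""
    weights_gb = params_b * bytes_per_param      # billions of params * bytes/param -> GB
    needed_gb = weights_gb * (1 + overhead_frac)
    return needed_gb <= nodes * ram_per_node_gb

# Example: a hypothetical 355B-parameter model at 4-bit (~0.5 bytes/param)
print(fits_across_nodes(355, 0.5, 2, 192))   # two 192 GB Studios
```

Note that fitting is only half the story: the Thunderbolt interconnect is far slower than on-package memory, so per-token latency grows with every node you add.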
True — I think local inference is still far more expensive for my use case due to batching effects and my relatively sporadic, hourly usage. That said, I also didn’t expect hardware prices (RTX 5090, RAM) to rise this quickly.
FWIW the M5 appears to be an actual large leap for LLM inference with the new GPU and Neural Accelerator. So I'd wait for the Pro/Max before jumping on an M3 Ultra.
You'd want to get something like an RTX Pro 6000 (~$8,500-$10,000) or at least an RTX 5090 (~$3,000). That's the easiest route; otherwise, a cluster of some lower-end GPUs. Or a DGX Spark (~$3,000; there are also better options from manufacturers other than Nvidia).
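Comparing the options on price per GB of memory makes the trade-off concrete. The prices below are the rough figures from this thread (midpoint assumed for the Pro 6000); the memory sizes are the published specs:

```python
# Quick $/GB-of-memory comparison for the cards mentioned in the thread.
# Prices are rough thread figures, not current market quotes.

cards = {
    "RTX Pro 6000": (9_250.0, 96),   # ~$8.5-10k midpoint, 96 GB VRAM
    "RTX 5090":     (3_000.0, 32),   # ~$3k, 32 GB VRAM
    "DGX Spark":    (3_000.0, 128),  # ~$3k, 128 GB unified memory
}

for name, (price, mem_gb) in cards.items():
    print(f"{name:>12}: ${price / mem_gb:,.0f} per GB")
```

By this crude metric the DGX Spark looks cheapest per GB, but memory bandwidth and compute differ widely between these options, so $/GB alone shouldn't decide the purchase.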
Yes, I also considered the RTX 6000 Pro Max-Q, but it's quite expensive and probably only makes sense if I can use it for other workloads as well. Interestingly, its price hasn't gone up since last summer here in Germany.
An M3 Ultra plus a DGX Spark today is roughly what an M5 Ultra will be whenever it ships. You can buy those two, connect them using Exo, and get M5 Ultra-class performance and memory right away. And who knows what an M5 Ultra will cost given the RAM/SSD price explosion?
I have researched a bit more and think your recommendations are spot on. The 256 GB M3 Ultra is probably the best value right now even though it's 2k EUR more expensive than the 96 GB version.
Yes, I'm using smaller models on a Mac M2 Ultra 32GB and they work well, but larger models and coding use might not be a good fit for the architecture after all.
Just look at what people are actually using. Don't rely on a few people who tested a few short prompts with short completions.