Self-hosted might be the way to go soon. I'm getting 2x Olares One boxes, each with an RTX 5090 GPU (NVIDIA, 24GB VRAM). They come with a built-in ecosystem of AI apps, many of which should be useful, and Kubernetes + Docker will let me deploy whatever else I want. Presumably I'll manage to host a good coding model and use Claude Code (or some other agent framework) on top of it. There will be many good options out there soon.
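Wiring that up is mostly just pointing the client at whatever the box serves. A minimal sketch, assuming the box exposes an Anthropic-compatible endpoint (e.g., via a local proxy in front of the model server); the base_url, model name, and token below are placeholders, not real defaults:

    # Quick connectivity check against a self-hosted, Anthropic-compatible endpoint.
    # The base_url, model name, and token are placeholders for illustration only.
    import anthropic

    client = anthropic.Anthropic(
        base_url="http://olares-box-1.local:8080",  # hypothetical local gateway/proxy
        api_key="local-token",                      # whatever the local proxy expects
    )

    reply = client.messages.create(
        model="local-coding-model",  # placeholder for whichever model the box serves
        max_tokens=256,
        messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
    )
    print(reply.content[0].text)

Claude Code itself can be pointed at a local endpoint the same way (via ANTHROPIC_BASE_URL), assuming whatever serves the model speaks the Anthropic Messages API.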
behnamoh|1 month ago
As someone with 2x RTX Pro 6000 and a 512GB M3 Ultra, I have yet to find these machines usable for "agentic" tasks. Sure, they make great chat bots, but agentic work involves huge contexts being sent to the model. That already rules out the Mac Studio: it lacks tensor cores, so it's painfully slow at processing even a relatively large CLAUDE.md file, let alone a big project.
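To put rough numbers on the prompt-processing problem, a back-of-the-envelope sketch (the throughput figures are illustrative assumptions, not benchmarks):

    # Back-of-the-envelope prefill time: how long until the model even starts answering.
    # Throughput numbers are illustrative assumptions, not measured results.
    def prefill_seconds(context_tokens: int, prompt_tps: float) -> float:
        """Seconds spent processing the prompt at a given prompt-processing rate."""
        return context_tokens / prompt_tps

    context = 100_000  # a big project dumped into context
    for label, tps in [("GPU with tensor cores (~5,000 tok/s, assumed)", 5_000.0),
                       ("Mac Studio class (~300 tok/s, assumed)", 300.0)]:
        print(f"{label}: {prefill_seconds(context, tps):.0f} s per turn")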
The RTX setup is much faster but can only hold models of ≤192GB, which severely limits your options: low-quant GLM 4.7, GLM 4.7 Flash/Air, GPT-OSS 120B, etc.
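For what actually fits in 192GB, here's a rough check; the model shape, quantization, and context length are assumptions, not exact figures for any particular model:

    # Does a given model fit in a 192 GB (2x 96 GB) budget? A rough sketch.
    def weights_gb(params_b: float, bits_per_weight: float) -> float:
        return params_b * bits_per_weight / 8.0  # ~1 GB per billion params at 8-bit

    def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                    context_tokens: int, bytes_per_elem: int = 2) -> float:
        return 2 * layers * kv_heads * head_dim * context_tokens * bytes_per_elem / 1e9  # keys + values

    BUDGET_GB = 192.0
    # Hypothetical ~120B model at 4-bit with a 128k-token context:
    need = weights_gb(120, 4) + kv_cache_gb(layers=60, kv_heads=8, head_dim=128,
                                            context_tokens=128_000)
    print(f"~{need:.0f} GB needed vs {BUDGET_GB:.0f} GB available -> fits: {need < BUDGET_GB}")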
NitpickLawyer|1 month ago
The best you can get today with consumer hardware is something like devstral2-small (24B), qwen-coder 30B (underwhelming), or glm-4.7-flash (promising but buggy atm). And you'll still need a beefy workstation in the ~$5-10k range.
If you want open-weights SotA you have to get hardware worth $80-100k to run the big boys (dsv3.2, glm4.7, minimax2.1, devstral2-123b, etc.). That's fine for small office setups, but out of range for most local deployments, especially since the workstation needs a lot of power if you go 8x GPUs, even with something like 8x 6000 Pro at 300W each.
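Roughly, here's why the big models push you into that price and power bracket (parameter count, quantization, and the overhead factor are assumptions):

    # How many 96 GB cards does a big open model need, and what do they draw?
    import math

    def gpus_needed(params_b: float, bits_per_weight: float,
                    vram_per_gpu_gb: float = 96.0, overhead: float = 1.3) -> int:
        weights_gb = params_b * bits_per_weight / 8.0              # ~1 GB per billion params at 8-bit
        return math.ceil(weights_gb * overhead / vram_per_gpu_gb)  # overhead ~ KV cache + activations

    for bits in (4, 8):
        n = gpus_needed(params_b=670, bits_per_weight=bits)  # hypothetical ~670B model
        print(f"~670B at {bits}-bit: {n} GPUs, ~{n * 300} W for the cards alone at 300 W each")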
asno3030|1 month ago
Yes, I like my crack at 20 bucks a month, but the tech is going to need to improve quickly to keep that up.
zen4ttitude|1 month ago