top | item 44760919

meaydinli | 7 months ago

Take a look at: https://www.nvidia.com/en-us/products/workstations/dgx-spark... . IIRC, it was about ~$4K.

PeterStuer | 7 months ago

Given that a non-quantized 700B monolithic model with, let's say, a 1M-token context would need around 20 TB of memory, I doubt your Spark or M4 will get very far.

I'm not saying those machines can't be useful or fun, but they're not in the range of the 'fantasy' thing you're responding to.
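The ~20 TB figure above can be sanity-checked with some back-of-the-envelope arithmetic. A sketch, assuming fp16 weights and a hypothetical architecture (128 layers, 16384 hidden size, full multi-head attention with no GQA/MQA cache compression) roughly in the ballpark of models this size; the real total depends heavily on these choices:

```python
def model_memory_tb(params_b=700, bytes_per_param=2):
    # Weight memory: parameter count times bytes per parameter (fp16 = 2 bytes).
    return params_b * 1e9 * bytes_per_param / 1e12

def kv_cache_tb(tokens=1_000_000, n_layers=128, d_model=16384,
                bytes_per_elem=2):
    # K and V each store d_model values per layer per token,
    # assuming no grouped-query attention to shrink the cache.
    per_token = 2 * n_layers * d_model * bytes_per_elem
    return tokens * per_token / 1e12

weights = model_memory_tb()  # ~1.4 TB of weights
cache = kv_cache_tb()        # ~8.4 TB of KV cache at 1M tokens
print(f"weights ~ {weights:.1f} TB, KV cache ~ {cache:.1f} TB")
```

Even under these conservative assumptions you land in the multiple-terabyte range for the cache alone; with a wider architecture, higher-precision cache, or activation overhead, the ~20 TB ballpark is plausible.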

daft_pink | 7 months ago

I regularly use Gemini CLI and Claude Code, and I'm convinced that Gemini's enormous context window isn't that helpful in many situations. I think the more you put into context, the more likely the model is to go off on a tangent, and you end up with "context rot", or it gets confused and starts working on an older, no-longer-relevant context. You definitely need to manage and clear your context window; the only time I would want such a large context window is when the source data really is that large.

phonon | 7 months ago

An M4 Max has twice the memory bandwidth (which is typically the limiting factor).
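Why bandwidth is the limiting factor: autoregressive decode streams the full weight set once per generated token, so tokens/sec is roughly bandwidth divided by model size in bytes. A sketch using published spec figures (~546 GB/s for an M4 Max, ~273 GB/s for the DGX Spark's GB10); treat both numbers and the 70 GB model size as illustrative assumptions, not benchmarks:

```python
def decode_tokens_per_sec(bandwidth_gb_s, model_size_gb):
    # Memory-bandwidth-bound decode: each new token requires reading
    # every weight once, so throughput is bandwidth / bytes per token.
    return bandwidth_gb_s / model_size_gb

m4_max = decode_tokens_per_sec(546, 70)  # ~70B params at 8-bit -> ~7.8 tok/s
spark = decode_tokens_per_sec(273, 70)   # half the bandwidth -> ~3.9 tok/s
print(f"M4 Max ~ {m4_max:.1f} tok/s, Spark ~ {spark:.1f} tok/s")
```

Compute throughput rarely matters here: at batch size 1 the chip spends most of its time waiting on memory, so doubling bandwidth roughly doubles generation speed.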

BoorishBears | 7 months ago

I'll say that neither of them will do anything for you if you're currently using SOTA closed models in anger and expect that performance to hold.

I'm on a 128GB M4 Max, and running models locally is a curiosity at best given the relative performance.