tcdent|2 months ago

We're not yet at the point where a single PCIe device gets you anything meaningful; IMO 128 GB of RAM available to the GPU is essential.

So while you don't need a ton of compute on the CPU, you do need the ability to address multiple PCIe lanes. A relatively low-spec AMD EPYC processor is fine as long as the motherboard exposes enough lanes.
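A rough sketch of the arithmetic behind that 128 GB figure. The quantization widths and the 1.2x overhead factor are assumptions for illustration, not measured numbers:

```python
# Back-of-envelope VRAM math for local LLM serving. Bytes-per-weight and
# the overhead factor are assumptions, not vendor specs.

def vram_needed_gib(params_billions: float, bits_per_weight: float,
                    overhead: float = 1.2) -> float:
    """Weights-only footprint, padded ~20% for KV cache and runtime buffers."""
    weight_bytes = params_billions * 1e9 * bits_per_weight / 8
    return weight_bytes * overhead / 2**30

for params, bits in [(70, 16), (70, 4), (120, 4), (405, 4)]:
    print(f"{params:>4}B @ {bits:>2}-bit: ~{vram_needed_gib(params, bits):.0f} GiB")
```

On those assumptions, a 4-bit model much past ~200B parameters already spills over a single 96 GB card, which is where the extra PCIe lanes for multiple GPUs come in.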

skhameneh|2 months ago

There is plenty that can run within 32/64/96 GB of VRAM. IMO models like Phi-4 are underrated for many simple tasks. Some quantized Gemma 3 variants are quite good as well.

There are larger/better models as well, but those tend to really push the limits of 96 GB.

FWIW, once you start pushing past 128 GB, the ~500 GB models really start to become attractive, because at that point you're probably wanting just a bit more out of everything.
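For scale, a quick sketch of what the models named above weigh at a typical 4-bit quantization. The parameter counts are the published sizes; the ~4.5 bits/weight average approximates a Q4_K_M-style GGUF and is an assumption, as actual files vary:

```python
# Approximate 4-bit quantized weight sizes for the models mentioned above.
# ~4.5 bits/weight is an assumed average, so treat the output as ballpark.

MODELS_PARAMS = {"Phi-4": 14e9, "Gemma 3 12B": 12e9, "Gemma 3 27B": 27e9}
BITS_PER_WEIGHT = 4.5

for name, params in MODELS_PARAMS.items():
    gib = params * BITS_PER_WEIGHT / 8 / 2**30
    print(f"{name:<12} ~{gib:4.1f} GiB of weights")
```

All of these fit on a 32 GB card with room left for context, which is the point being made above.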

tcdent|2 months ago

IDK, all of my personal and professional projects involve pushing the SOTA to the absolute limit. Using anything other than the latest OpenAI or Anthropic model is out of the question.

Smaller open source models are a bit like 3D printing in the early days: fun to experiment with, but not really that valuable for anything beyond making toys.

Text summarization, maybe? But even then I want a model that understands the complete context and does a good job. Even for things like "generate one sentence about the action we're performing", I usually find I can just incorporate it into the output schema of a larger request instead of making a separate request to a smaller model.
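A sketch of that pattern: instead of a second call to a small model for the one-liner, the summary rides along as an extra field in the structured output of the main request. The `complete_json` helper here is hypothetical; only the schema shape is the point:

```python
# Folding the "one sentence about the action" into the main request's
# output schema instead of making a second call to a smaller model.
from pydantic import BaseModel

class RefactorPlan(BaseModel):
    files_to_edit: list[str]
    diff: str
    # Extra field rides along on the big request "for free":
    one_line_summary: str  # e.g. shown in the UI while the edit applies

def complete_json(prompt: str, schema: type[BaseModel]) -> BaseModel:
    """Placeholder for your structured-output client call (assumed)."""
    raise NotImplementedError

# plan = complete_json("Refactor the auth module...", RefactorPlan)
# status_bar.show(plan.one_line_summary)
```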

p1necone|2 months ago

I'm holding out for someone to ship a GPU with DIMM slots on it.

tymscar|2 months ago

DDR5 has roughly one to two orders of magnitude less bandwidth than really good VRAM, and memory bandwidth is what bounds token generation. That's one big reason.
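Rough numbers behind that, treating decode as memory-bound (tokens/sec ≈ bandwidth / bytes read per token). The bandwidth figures are ballpark public specs and the 40 GB working set is an assumption:

```python
# Upper-bound decode speed if every token must stream the full weights once.
# Bandwidth numbers are rough public figures (assumptions, check your parts).

BANDWIDTH_GB_S = {
    "DDR5-5600, dual channel": 90,
    "RTX 4090 GDDR6X": 1008,
    "H100 HBM3": 3350,
}
MODEL_WEIGHTS_GB = 40  # e.g. a ~70B model at ~4-bit (assumed)

for name, bw in BANDWIDTH_GB_S.items():
    print(f"{name:<24} ~{bw / MODEL_WEIGHTS_GB:6.1f} tok/s ceiling")
```

That 10-40x gap is why DIMM-fed memory wouldn't buy much for token generation.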

anon25783|2 months ago

Would that be worth anything, though? What about the overhead of the clock cycles needed for loading from and storing to that RAM? It might not amount to a net performance benefit, and I bet it could also complicate heat management.

kristianp|2 months ago

A single CAMM module might be a better fit.