(no title)
tetraodonpuffer|6 months ago
As an aside, I'm not sure why the tooling for splitting LLMs across multiple cards is quite mature, while for image models, despite also using GGUFs, it hasn't been. Maybe as image models get bigger there will be more of a push to implement it.
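On the LLM side it's mostly a one-liner at the framework level. A minimal sketch of what that looks like with Hugging Face transformers + accelerate (the model ID is just a placeholder, not something from this thread):

    # Sketch: shard an LLM across whatever GPUs are visible.
    # Needs transformers + accelerate installed; model ID is a placeholder.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "Qwen/Qwen2.5-7B-Instruct"  # placeholder
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        device_map="auto",  # accelerate spreads the layers across available GPUs
    )
    inputs = tok("Hello", return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=32)
    print(tok.decode(out[0], skip_special_tokens=True))

Image pipelines generally don't have an equivalent one-liner, which is the gap I mean.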
reissbaker|6 months ago
Also, for a 20B model you only really need 20GB of VRAM: FP8 is near-identical to FP16, and it's only below FP8 that you start to see dramatic drop-offs in quality. So literally any Mac Studio available for purchase will do, and even a fairly low-end MacBook Pro would work. A 5090 should also handle it with room to spare.
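Back-of-the-envelope, weights only (KV cache and activations add a bit on top):

    # Rough weight-only VRAM estimate for a 20B-parameter model.
    # Real usage is somewhat higher (KV cache, activations, framework overhead).
    params = 20e9
    for name, bytes_per_param in [("FP16", 2), ("FP8", 1), ("4-bit", 0.5)]:
        print(f"{name}: ~{params * bytes_per_param / 1e9:.0f} GB")
    # FP16: ~40 GB, FP8: ~20 GB, 4-bit: ~10 GB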
AuryGlenz|6 months ago
Training it will also be out of reach for most. I’m sure I’ll be able to handle it on my own 5090 at some point but it’ll be slow going.
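Rough math on why, assuming a plain full finetune with Adam (activations not even counted):

    # Rough memory math for full finetuning a 20B model vs. just running it.
    # Adam keeps two FP32 moments per parameter; activations are ignored here.
    params = 20e9
    weights_bf16 = params * 2 / 1e9   # ~40 GB
    grads_bf16   = params * 2 / 1e9   # ~40 GB
    adam_fp32    = params * 8 / 1e9   # ~160 GB (two moments, 4 bytes each)
    print(f"Full finetune: ~{weights_bf16 + grads_bf16 + adam_fp32:.0f} GB")
    # ~240 GB total -- so on a 32 GB 5090 it's LoRA/QLoRA-style training or nothing.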
TacticalCoder|6 months ago
40 GB of VRAM? So two GPUs with 24 GB each? That's pretty reasonable compared to the kind of machine you need to run the latest Qwen coder models (which btw are close to SOTA: they even beat proprietary models on several benchmarks).
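With a GGUF stack the two-card split is usually just a parameter. A sketch with llama-cpp-python (the file path and split are placeholders; needs a CUDA build):

    # Sketch: split a GGUF model roughly evenly across two 24 GB cards.
    # Model path and split proportions are placeholders.
    from llama_cpp import Llama

    llm = Llama(
        model_path="qwen2.5-coder-32b-instruct-q4_k_m.gguf",  # placeholder path
        n_gpu_layers=-1,          # offload every layer to GPU
        tensor_split=[0.5, 0.5],  # fraction of the model on each card
        n_ctx=8192,
    )
    out = llm("Write a hello world in C:", max_tokens=64)
    print(out["choices"][0]["text"])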