2xEPYC Genoa w/768GB of DDR5-4800 and an A5000 24GB card.
I built it in January 2024 for about $6k and have thoroughly enjoyed running every new model as it gets released. Some of the best money I’ve ever spent.
I've seen some mentions of pure-CPU setups being successful for large models using old EPYC/Xeon workstations off eBay with 40+ CPUs. Interesting approach!
testaburger|6 months ago
smartbit|6 months ago
fouc|6 months ago
wkat4242|6 months ago
How many tokens/s do you get for DeepSeek-R1?
DrPhish|6 months ago
R1 starts at about 10t/s on an empty context but quickly falls off. I'd say the majority of my tokens are generating around 6t/s.
Some of the other big MoE models can be quite a bit faster.
I'm mostly using QwenCoder 480b at Q8 these days, averaging 9t/s. I've found I get better real-world results out of it than K2, R1 or GLM4.5.
ekianjo|6 months ago