biddit | 2 months ago
Yes: as someone who spent several thousand dollars on a multi-GPU setup, I'd say the only reasons to run local codegen inference right now are privacy or deep integration with the model itself.
It's decidedly more cost-efficient to use frontier model APIs. Frontier models trained to work with their tightly coupled harnesses are worlds ahead of quantized models running in generic harnesses.
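One way to sanity-check the cost claim is to amortize the rig's purchase price and power draw over the tokens it generates and compare that with per-token API pricing. The sketch below is a minimal back-of-the-envelope in Python; every constant (hardware cost, power draw, electricity rate, API price, local throughput) is an assumed placeholder for illustration, not a figure from the thread.

```python
# Back-of-the-envelope: local multi-GPU rig vs. frontier API for codegen.
# Every number here is a hypothetical placeholder, not a measured figure;
# substitute your own hardware cost, power draw, and API pricing.

HARDWARE_COST_USD = 4_000         # upfront multi-GPU rig (assumed)
POWER_DRAW_KW = 0.8               # rig draw under inference load (assumed)
ELECTRICITY_USD_PER_KWH = 0.15    # assumed electricity rate
API_USD_PER_MTOK = 10.0           # blended per-million-token API price (assumed)
LOCAL_TOKENS_PER_SEC = 40         # assumed quantized local model throughput


def local_cost_usd(tokens: float) -> float:
    """Amortized hardware plus electricity to generate `tokens` locally."""
    hours = tokens / LOCAL_TOKENS_PER_SEC / 3600
    return HARDWARE_COST_USD + hours * POWER_DRAW_KW * ELECTRICITY_USD_PER_KWH


def api_cost_usd(tokens: float) -> float:
    """Pay-per-token API cost for the same volume."""
    return tokens / 1e6 * API_USD_PER_MTOK


for millions in (10, 100, 400, 1_000):
    t = millions * 1e6
    print(f"{millions:>6}M tokens: local ${local_cost_usd(t):>8,.0f}"
          f" vs API ${api_cost_usd(t):>8,.0f}")
```

Under these assumed numbers the rig only breaks even somewhere past a few hundred million generated tokens, and that comparison still ignores the quality gap between frontier and quantized models noted above.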
cmrdporcupine | 2 months ago
Especially with RAM prices now spiking.