namanski | 1 year ago
You'll need GPUs for inference, you'll have to quantize the model, and you'll need it hosted on the cloud. The platform I've built automates that same workflow end to end (including autoscaling), and you get an API endpoint; you only pay for the compute you host on.
Generally, which GPU(s) you choose will depend on how big the model is and how many tokens/sec you want out of it.
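As a rough illustration of the sizing step, here's a back-of-the-envelope sketch of how model size and quantization level translate into the VRAM a GPU needs just to hold the weights. The `overhead_factor` and the example numbers are my own assumptions, not figures from the platform; real usage also adds KV cache, activations, and framework overhead.

```python
def estimate_vram_gb(num_params_billions: float, bits_per_weight: int,
                     overhead_factor: float = 1.2) -> float:
    """Rough GPU memory (GB) needed to serve a quantized model.

    overhead_factor is an illustrative pad for KV cache and
    activations -- an assumption, not a measured value.
    """
    bytes_per_weight = bits_per_weight / 8
    weights_gb = num_params_billions * 1e9 * bytes_per_weight / 1e9
    return weights_gb * overhead_factor

# e.g. a 7B-parameter model quantized to 4-bit weights:
print(round(estimate_vram_gb(7, 4), 1))  # ~4.2 GB -> fits a single mid-range GPU
```

Tokens/sec then becomes the second constraint: once the weights fit, throughput is mostly bound by memory bandwidth, which is why bigger targets push you toward faster (or multiple) GPUs.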