Isn't the inference cost of running these models at scale challenging? Currently it feels like small LLMs (1B-4B parameters) are able to perform well on simpler agentic workflows. There are definitely some constraints, but surely that is much easier than paying for big cloud clusters to run these tasks. I believe it also distributes the cost more uniformly.
bigyabai|9 months ago
vkkhare|9 months ago