top | item 46630164

sognetic | 1 month ago

Everything is currently pointing towards inference being the main cost driver for LLMs in the future. Test-time compute requires huge numbers of tokens at inference and can make providing frontier models as a service unprofitable.

Anyone not under some kind of export restrictions can scrounge together enough GPUs to train a frontier model (hell, even DeepSeek, which is under those restrictions, could), but providing a service that can compete with OpenAI et al. will prove quite costly. 3x improvements in inference are therefore nothing to sneeze at IMO.
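To make the argument concrete, here is a back-of-envelope sketch of why reasoning tokens dominate serving cost and what a 3x inference improvement buys. All numbers (token price, token counts) are illustrative assumptions, not real provider figures:

```python
# Back-of-envelope inference cost sketch.
# Every number here is an illustrative assumption.

PRICE_PER_OUTPUT_TOKEN = 8.00 / 1_000_000  # assumed $8 per 1M output tokens
REASONING_TOKENS = 20_000  # assumed hidden chain-of-thought tokens per query
ANSWER_TOKENS = 1_000      # assumed visible answer tokens per query

def cost_per_query(reasoning_tokens, answer_tokens, price):
    """Serving cost for one query: every generated token costs compute,
    including the reasoning tokens the user never sees."""
    return (reasoning_tokens + answer_tokens) * price

base = cost_per_query(REASONING_TOKENS, ANSWER_TOKENS, PRICE_PER_OUTPUT_TOKEN)
# A 3x inference efficiency gain cuts the effective per-token cost to a third.
improved = cost_per_query(REASONING_TOKENS, ANSWER_TOKENS,
                          PRICE_PER_OUTPUT_TOKEN / 3)

print(f"per query: ${base:.3f} -> ${improved:.3f}")
# -> per query: $0.168 -> $0.056
```

Under these assumptions, the hidden reasoning tokens account for ~95% of the generated tokens, so the per-query cost scales almost entirely with test-time compute rather than with answer length.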
