(no title)
lompad | 1 month ago
Please prove this statement; so far there is no indication that it's actually true - the opposite seems to be the case. Here are some actual numbers [0] (and whether you like Ed or not, his sources have so far been extremely reliable).
There is a reason the AI companies never talk about their inference costs. They boast about every number they can find, but inference... not.
patresh | 1 month ago
Those are not contradictory: a company's total inference costs can rise because it deploys more models (Sora), deploys larger models, does more reasoning, and serves more demand.
However, if we look purely at how much it costs to run inference on a fixed number of requests at a fixed model quality, I am quite convinced that inference costs are decreasing dramatically. Here's a late-2025 model [1] whose Model performance section benchmarks a 72B-parameter model from early 2025 (Qwen2.5) against the late-2025 8B Qwen3 model.
The 9x smaller model outperforms the larger one from earlier the same year on 27 of the 40 benchmarks they were evaluated on, which is just astounding.
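A back-of-the-envelope sketch of what that shrinkage implies for serving cost, using the common ~2·N FLOPs-per-output-token approximation for a dense transformer (an assumption on my part; it ignores attention, KV-cache, and batching effects, so treat the ratio as a rough upper bound on the compute saved, not a price quote):

```python
# Rough inference-compute comparison between the two model sizes
# mentioned above. Uses the standard ~2 * N_params FLOPs-per-token
# approximation for a dense transformer forward pass.

def flops_per_token(n_params: float) -> float:
    """Approximate forward-pass FLOPs to generate one output token."""
    return 2.0 * n_params

older_72b = flops_per_token(72e9)  # early-2025 72B model
newer_8b = flops_per_token(8e9)    # late-2025 8B model

ratio = older_72b / newer_8b
print(f"compute ratio: {ratio:.1f}x")  # prints "compute ratio: 9.0x"
```

If quality per request is held constant, roughly 9x less compute per token translates fairly directly into lower serving cost on the same hardware.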
[1] https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct
academia_hack | 1 month ago
Anecdotally, I find you can tell if someone worked at a big AI provider or a small AI startup by proposing an AI project like this:
" First we'll train a custom trillion parameter LLM for HTML generation. Then we'll use it to render our homepage to our 10 million daily visitors. "
The startup people will be like "this is a bad idea because you don't have enough GPUs for training that LLM" and the AI lab folks will be like "How do you intend to scale inference if you're not Google?"
changbai | 1 month ago
https://a16z.com/llmflation-llm-inference-cost/, for example, shows this to be true.
The report from OpenRouter https://openrouter.ai/state-of-ai also makes the same observation.