top | item 45843299

lettergram | 3 months ago

There are a lot of indications that we’re currently brute-forcing these models. There’s honestly no reason they have to be 1T parameters and cost an insane amount to train and to run inference on.

What we’re going to see is that, as energy becomes a problem, they’ll simply shift to more effective and efficient architectures, in both physical hardware and model design. I suspect they can also simply charge more for the service, which reduces usage for senseless applications.


yanhangyhy|3 months ago

There are also elements of stock price hype and geopolitical competition involved. The major U.S. tech giants are all tied to the same bandwagon — they have to maintain this cycle: buy chips → build data centers → release new models → buy more chips.

It might only stop once the electricity problem becomes truly unsustainable. Of course, I don’t fully understand the specific situation in the U.S., but I even feel that one day they might flee the U.S. altogether and move to the Middle East to secure resources.

simpsond|3 months ago

Sundar is talking about fleeing earth to secure photons and cooling in space.

simonw|3 months ago

> There’s honestly not a reason they have to be 1T parameters and cost an insane amount to train and run on inference.

Kimi K2 Thinking is rumored to have cost $4.6m to train - according to "a source familiar with the matter": https://www.cnbc.com/2025/11/06/alibaba-backed-moonshot-rele...

I think the most interesting recent Chinese model may be MiniMax M2, which is just 200B parameters but benchmarks close to Sonnet 4, at least for coding. That's small enough to run well on ~$5,000 of hardware, as opposed to the 1T models which require vastly more expensive machines.

Der_Einzige|3 months ago

That number is about as real as the $5.5 million to train DeepSeek. Maybe it's real if you're only counting the literal final training run, but with total costs accounted for, including the huge number of failed runs and everything else, it's several hundred million to train a model that's usually still worse than Claude, Gemini, or ChatGPT. It took $1B+ (500 billion on energy and chips ALONE) for Grok to get into the "big 4".

oxcidized|3 months ago

> That's small enough to run well on ~$5,000 of hardware...

Honestly curious where you got this number. Unless you're talking about extremely small quants. Even just a Q4 quant gguf is ~130GB. Am I missing a relatively cheap way to run models this large well?

I suppose you might be referring to a Mac Studio, but (while I don't have one, so I'm not a primary source) it seems like there's some argument to be made about whether they run models that size "well".
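The ~130GB figure is consistent with simple arithmetic. A back-of-envelope sketch in Python (the bits-per-weight values are rough assumptions, since real gguf quants mix precisions across layers):

```python
# Back-of-envelope estimate of quantized model weight size.
# Assumption: a Q4-style quant averages roughly 4.5-5.2 bits per
# weight once scales and mixed-precision layers are included; this
# ignores KV cache and activation memory, which add more on top.

def quantized_size_gb(params_billion: float, bits_per_weight: float = 4.5) -> float:
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# A 200B-parameter model:
print(quantized_size_gb(200))       # ~112 GB at 4.5 bits/weight
print(quantized_size_gb(200, 5.2))  # ~130 GB, matching the gguf size above
```

Either way, the weights alone exceed the VRAM of any single consumer GPU, which is why the "$5,000 hardware" claim usually implies unified-memory machines or heavy offloading.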

electroglyph|3 months ago

i assume that $4.6 mil is just the cost of the electricity?

nl|3 months ago

Can confirm MiniMax M2 is very impressive!

MallocVoidstar|3 months ago

> What we’re going to see is as energy becomes a problem

This is much more likely to be an issue in the US than in China. https://fortune.com/2025/08/14/data-centers-china-grid-us-in...

thesmtsolver|3 months ago

Disagree. Part of the reason China produces more power (and pollution) is that China manufactures for the US.

https://www.brookings.edu/articles/how-do-china-and-america-...

The source for China's energy is more fragile than that of the US.

> Coal is by far China’s largest energy source, while the United States has a more balanced energy system, running on roughly one-third oil, one-third natural gas, and one-third other sources, including coal, nuclear, hydroelectricity, and other renewables.

Also, China's economy is less efficient in terms of power used per unit of GDP, and China relies on coal and imports.

> However, China uses roughly 20% more energy per unit of GDP than the United States.

Remember, China still suffers from blackouts when manufacturing demand outstrips supply. The Fortune article reads like a fluff piece.

https://www.npr.org/2021/10/01/1042209223/why-covid-is-affec...

https://www.bbc.com/news/business-58733193

Leynos|3 months ago

Having larger models is nice because they have a much wider sphere of knowledge to draw on. Not in the sense of using them as encyclopedias, but more in the sense that I want a model that can cross-reference multiple domains I might not have considered when trying to solve a problem.