top | item 47127045

LZ_Khan | 6 days ago

If you care about improvement of models, you would support the US labs here.

It costs hundreds of millions of dollars to train a frontier model. It's not just "scraping the web."

Distillation allows labs to replicate these results at 1/100th of the cost. This creates a prisoner's dilemma which incentivizes labs to withhold their models from the public.
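(For readers unfamiliar with the term: "distillation" here means training a cheaper student model to imitate a frontier model's output distribution instead of learning from raw data, which is where the cost savings come from. A minimal sketch of the classic soft-label objective; the function names, temperature, and logits below are illustrative, not any lab's actual pipeline:)

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax with temperature scaling; higher T softens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    The student is trained to match the teacher's soft targets rather
    than the original training data."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's predictions
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# A student whose logits are closer to the teacher's incurs a lower loss,
# so minimizing this loss pulls the student toward the teacher's behavior.
teacher = [3.0, 1.0, 0.2]
close_student = [2.8, 1.1, 0.3]
far_student = [0.1, 0.2, 3.0]
assert distillation_loss(teacher, close_student) < distillation_loss(teacher, far_student)
```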

ElevenLathe|6 days ago

How much did it cost to produce all the data on the internet and every book ever published? Surely even the most conservative calculations put it at multiple years of planetary GDP. The same argument can be made to say that letting the big labs get away with pirating it will disincentivize people from publishing anything.

ceroxylon|6 days ago

I personally have stopped publishing publicly, since my research is still on the fuzzy boundary of AI's current knowledge, my website gets scraped daily, and I don't want to contribute to paid models for zero acknowledgement or compensation.

piva00|6 days ago

Not only publishing; it has already disincentivised a huge part of what made Web 2.0 work: public APIs for data access to platforms.

It used to be amazing to be able to create toy projects using data from big platforms. Now they're all afraid LLM trainers will scrape their content and create a competitor to their moat, the data.

It just sucks at many different levels.

bigyabai|6 days ago

This reads a bit like over-moralizing to me. US labs will continue improving their models because they have to make money in a competitive market. Chinese distillations have arguably improved the status quo, with Qwen and R1 forcing GPT-OSS to be released to the public. American businesses are competing, and American customers are getting better products because of the competitive pressure on them.

Your purported "prisoner's dilemma" hasn't happened yet to my knowledge, instead we seem to see the opposite. The high-speed development velocity has forced US labs to release more often with less nebulous results. Supporting either side will contribute to healthier competition in the long run.

contravariant|6 days ago

If 'we' really cared about the improvement of models all of them would be public.

Anything else just proves someone prefers making money to improving the models.

falcor84|6 days ago

> incentivizes labs to withhold their models from the public.

Does it really? How would they get revenue if they withhold their models? And doesn't economics generally say that if it's easier for your competitor to catch up, you have a higher incentive to maintain your lead?

falcor84|6 days ago

I think that the bigger conversation to be had here is about the environmental damage: if by using distillation we can really train new models at 1% of the energy cost, then it is ethically imperative that we do so.

wpm|6 days ago

> If you care about improvement of models, you would support the US labs here.

I guess I don't care then.

hermanzegerman|5 days ago

Tell me: how did they obtain that data?

Nobody feels sorry for big multinationals who skirt copyright for their own good, but then cry about it when their competition ignores it too.

You can't have your cake and eat it too.

YetAnotherNick|6 days ago

> incentivizes labs to withhold their models from the public.

This is the only way they make money.