LZ_Khan | 6 days ago
It costs hundreds of millions of dollars to train a frontier model. It's not just "scraping the web."
Distillation allows labs to replicate these results at 1/100th of the cost. This creates a prisoner's dilemma that incentivizes labs to withhold their models from the public.
piva00|6 days ago
It was amazing to be able to create toy projects using data from big platforms. Now they're all afraid LLM trainers will scrape their content and build a competitor to their moat, the data.
It just sucks on many different levels.
bigyabai|6 days ago
Your purported "prisoner's dilemma" hasn't happened yet to my knowledge; instead we seem to see the opposite. The high development velocity has forced US labs to release more often, with less nebulous results. Supporting either side will contribute to healthier competition in the long run.
contravariant|6 days ago
Anything else just proves someone prefers making money to improving the models.
falcor84|6 days ago
Does it really? How would they get revenue if they withhold their models? And doesn't economics generally say that if it's easier for your competitor to catch up, you have a higher incentive to maintain your lead?
wpm|6 days ago
I guess I don't care then.
hermanzegerman|5 days ago
Nobody feels sorry for big multinationals trying to skirt copyright for their own good, who then cry about it when their competition ignores it too.
You can't have your cake and eat it too.
YetAnotherNick|6 days ago
This is the only way they make money.