item 47126781


impulser_ | 6 days ago

Why would anyone care about this at all?

MiniMax, DeepSeek, and Moonshot are all releasing models for the public to use for free.

Anthropic, OpenAI, Google, etc. have been scraping information they had no right to scrape in order to train their models, yet when other companies pay them for data, we're supposed to be worried?

Labs like Anthropic always preach that they're "building AI for everyone" while releasing expensive, closed-source models.

The only reason AI is affordable at all is because of these Chinese AI labs.


lumost|6 days ago

Also - how can this be prevented? The AI labs can't seriously expect that each lab will filter LLM-generated content from their training sets based on the source model. Leakage of AI behavior into public datasets is inevitable.

reactordev|6 days ago

Turn the lens the other way around. By publicly posting that these models violate IP and anyone can run them, they are painting a specific political picture here…

NitpickLawyer|6 days ago

> Why would anyone care about this at all?

Anthropic have been the loudest in pushing for regulatory capture, often citing "muh security" as FUD. People should care what they write on this topic, because they're not writing for us, they're writing for "the regulators". Member when the usgov placed a dude in solitary confinement because they thought he could launch nukes with a whistle? Yeah... Let's hope they don't do some cray cray stuff with open LLMs.

Anthropic make amazing coding models, kudos for that. But they should be mocked for any communication like the one linked. Boo-hoo. Deal with it, or don't, I don't care. No one will feel for you. What goes around, comes around. Etc.

bigyabai|6 days ago

Administratively, Anthropic seems to misunderstand politics. You don't get to wear the "people's champion" and "government sweetheart" hats at the same time; when push comes to shove you'll be forced to pick a lane. We saw it with Microsoft, we saw it with Apple and Google, and now we're seeing it with OpenAI too. You can't drive down both paths at the same time.

As a member of the target audience for Claude, their messaging just leaves me confused. Are you a renegade success, or do you need the government's help? Are you a populist juggernaut, or do you hide from competition? OpenAI, for all their myriad issues, understood this from the start and stuck to the blithely profitable federal ass-kisser route.

nashadelic|5 days ago

Why? Imagine the frontier labs lobbying for a ban on using "Chinese AI" for commercial use in America.

PlatoIsADisease|6 days ago

Go free stuff! But... no one is running 400B models on their computers.

You are just giving them data instead. It's not like China is known to protect IP. Your data is going to be used against you, and we can't use Western laws to keep it safe.

SlavikCA|6 days ago

So, only Americans can use data against others?

By the way, I'm running a 400B model on my computer with 72GB VRAM: Qwen3.5-397B-A17B-GGUF/UD-Q4_K_XL, getting 13 t/s. Subjectively, I feel it runs at the level of Anthropic's Claude, just slower.

bigyabai|6 days ago

> we cant use western laws to keep it safe

Western laws didn't stop OpenAI from leaking PII, or Nest from getting hacked. I'll take my chances with the CCP.

selfhoster11|6 days ago

It doesn't take much hardware. I have run larger models.

LZ_Khan|6 days ago

If you care about improvement of models, you would support the US labs here.

It costs hundreds of millions of dollars to train a frontier model. It's not just "scraping the web."

Distillation allows labs to replicate these results at 1/100th of the cost. This creates a prisoner's dilemma which incentivizes labs to withhold their models from the public.
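Mechanically, distillation means training a smaller "student" model to match a "teacher" model's output distributions, rather than learning from raw data directly. A minimal sketch of the standard soft-label distillation loss (the function names and temperature value are illustrative, not any lab's actual pipeline):

```python
import numpy as np

def softmax(logits, T=1.0):
    """Temperature-scaled softmax over the last axis."""
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence between the teacher's and student's softened
    distributions. A higher temperature T exposes more of the teacher's
    ranking over "wrong" answers, which is where much of the signal lives."""
    p = softmax(teacher_logits, T)  # teacher's "soft labels"
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))))
```

The student minimizes this loss over the teacher's outputs, which is why API access alone (no training corpus, no weights) can be enough to approximate a frontier model's behavior.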

ElevenLathe|6 days ago

How much did it cost to produce all the data on the internet and every book ever published? Surely even the most conservative calculations put it at multiple years of planetary GDP. The same argument can be made that letting the big labs get away with pirating it will disincentivize people from publishing anything.

bigyabai|6 days ago

This reads a bit like over-moralizing to me. US labs will continue improving their models because they have to make money in a competitive market. Chinese distillations have arguably improved the status-quo, with Qwen and R1 forcing GPT-OSS to be released to the public. American businesses are competing, and American customers are getting better products because of the competitive pressure on them.

Your purported "prisoner's dilemma" hasn't happened yet to my knowledge, instead we seem to see the opposite. The high-speed development velocity has forced US labs to release more often with less nebulous results. Supporting either side will contribute to healthier competition in the long run.

contravariant|6 days ago

If 'we' really cared about the improvement of models all of them would be public.

Anything else just proves someone prefers making money to improving the models.

falcor84|6 days ago

> incentivizes labs to withhold their models from the public.

Does it really? How would they get revenue if they withhold their models? And doesn't economics generally say that if it's easier for your competitor to catch up, you have a higher incentive to maintain your lead?

falcor84|6 days ago

I think that the bigger conversation to be had here is about the environmental damage - if by using distillation we can really train new models at 1% of the cost in energy, it is ethically imperative that we do this.

wpm|6 days ago

> If you care about improvement of models, you would support the US labs here.

I guess I don't care then.

hermanzegerman|5 days ago

Tell me, how did they obtain that data?

Nobody feels sorry for big multinationals trying to skirt copyright for their own good, only to cry about it when their competition ignores it too.

You can't have your cake and eat it too.

YetAnotherNick|6 days ago

> incentivizes labs to withhold their models from the public.

This is the only way they make money.