

1gn15 | 2 months ago

This article commits several common and disappointing fallacies:

1. Open weight models exist, guys.

2. It assumes that copyright is stripped when doing essentially Img2Img on code. That's not true. (Also, copyright != attribution.)

3. It assumes that AI is "just rearranging code". That's not true. Speaking about provenance in learning is as nonsensical as asking one to credit the creators of the English alphabet. There's a reason why literally every single copyright-based lawsuit against machine learning has failed so far, around the world.

4. It assumes that the reduction in posts on StackOverflow is due to people no longer wanting to contribute. That's likely not true. It's just that most questions were "homework questions" that didn't really warrant a volunteer's time.


bicepjai | 2 months ago

I love the LLM tech and use them every day for coding, though I don't like calling them AI. We can definitely argue LLMs are not just rearranging code, but let's look at some evidence that shows otherwise. Last year's NYT lawsuit showed that LLMs have memorized large amounts of news text; you should see those examples. A recent, not-yet peer-reviewed academic paper, "Language Models are Injective and Hence Invertible", argues that LLMs effectively memorize their training data. Also, this recent DEF CON 33 talk (https://youtu.be/O7BI4jfEFwA?si=rjAi5KStXfURl65q) shows many ways you can get training data back out. Given all this, it's hard to believe they are intelligently generating code.

p0w3n3d | 2 months ago

Re: point 3, AI is indeed a lossy compression of text. I recommend searching YouTube for "karpathy deep dive LLM" (/7xTGNNLPyMI): he shows that open texts used in training are regurgitated unchanged when you talk to the raw base model. That means if you give the model "oh say can you", it will answer "see by the dawn's early light", or something similar like "by the morning's sun" or whatever. So it's very lossy, but it is still compression, and the output would be something else if that text hadn't been in the training data.
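The regurgitation effect described above can be sketched in miniature. This is a deliberately toy analogy, not a real LLM: a word-level bigram counter "trained" on one line of text. Because each context has only one observed continuation, greedy decoding reproduces the training text verbatim, which is the extreme end of the "lossy compression" spectrum. All names here (`greedy_continue`, the training line) are illustrative inventions.

```python
from collections import defaultdict

# One "training document" -- the line the commenter uses as an example.
training_text = "oh say can you see by the dawn's early light"

# "Train": count which word follows each word (a bigram table).
counts = defaultdict(lambda: defaultdict(int))
words = training_text.split()
for prev, nxt in zip(words, words[1:]):
    counts[prev][nxt] += 1

def greedy_continue(prompt, n_words=6):
    """Extend the prompt by always picking the most frequent next word."""
    out = prompt.split()
    for _ in range(n_words):
        nxt_counts = counts.get(out[-1])  # no continuation seen -> stop
        if not nxt_counts:
            break
        out.append(max(nxt_counts, key=nxt_counts.get))
    return " ".join(out)

print(greedy_continue("oh say can you"))
# With a single training line, the continuation is the memorized text.
```

A real LLM differs in that it generalizes across billions of contexts, so most outputs are interpolations rather than verbatim copies; the point of the toy is only that when the training signal for a context is dominated by one source, greedy generation collapses back to that source.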