bayindirh|6 months ago
Also there's the ethics of scraping the whole internet and claiming that it's all fair use, because the other scenario is a little too inconvenient for all the companies involved.
P.S.: I expect a small thread telling me that it's indeed fair use, because models "learn and understand just like humans", and "models are hugely transformative" (even though some licenses say "no derivatives whatsoever"), "they are doing something amazing so they need no permission", and I'm just being naive.
BeFlatXIII|6 months ago
I'm a radicalized intellectual property abolitionist. The ethical issue with scraping is the DDoS-like effect it has on smaller sites and the way it runs up the bandwidth bill for medium hosts. There's no individual company at fault for the flood. Rather, it's an emergent result of each startup attempting to train on data that's ever so slightly more up-to-date or broad than its competitors'. If they shared a common corpus that updated once per month, scraping traffic would be buried in organic human visitors instead of the other way around. Let them compete on training methodology, not in a race for scraping.
dale_glass|6 months ago
Worrying about that stuff is just a waste of time. Not because of what you said, but because it's all ultimately pointless.
Unless you believe this will kill AI, all it does is create a bunch of data brokers.
Once fees are paid, data is exchanged, and models are trained, if the AI takes your job of programming/drawing/music, then it still does. We arrived at the same destination, only with more lawyers in the mix. You get to enjoy unemployment knowing only that lawyers made sure that at least they didn't touch your cat photos.
jeppester|6 months ago
It all depends on what is most convenient for avoiding any accountability.
JackFr|6 months ago
As such fair use is whatever the courts say it is.
i_dont_know_|6 months ago
grues-dinner|6 months ago
Yes, all this highly public hand-wringing about "alignment" framed in terms of "but if our AI becomes God, will it be nice to us" is annoying. It feels like it's mostly a combination of things. Firstly, by play-acting that your model could become God, you instill FOMO in investors who see themselves not being on the hyper-lucrative "we literally own God and ascend to become its archangels" boat. Secondly, you look like you're taking ethics seriously, and that deflects regulatory and media interest. And it's a bit of fun sci-fi self-pleasure for the true believers.
What the deflection is away from is that the actual business plan here is the same one tech has been doing for a decade: welding every flow and store of data in the world to their pipelines, mining every scrap of information that passes through, giving themselves the ability to shape the global information landscape, and then selling that ability to the highest bidders.
The difference with "AI" is that they finally have a way to convince people to hand over all the data.
Levitz|6 months ago
It's interesting how completely our experiences differ. For example, regarding people's concerns about AI ethics, you write:
>People are far more concerned with the real-world implications of ethics: governance structures, accountability, how their data is used, jobs being lost, etc. In other words, they’re not so worried about whether their models will swear or philosophically handle the trolley problem so much as, you know, reality. What happens with the humans running the models? Their influx of power and resources? How will they hurt or harm society?
This is just not my experience at all. People do worry about how models act because they infer that eventually they will be used as a source of truth and because they already get used as a source of action. People worry about racial makeup in certain historical contexts[1], and people worry when Grok starts spouting Nazi stuff (hopefully I don't need a citation for that one), because they take it as a sign of bias in a system with real-world impact: if ChatGPT happens to doubt the Holocaust tomorrow, then when little Jimmy asks it for help with an essay he will find a whole lot of white supremacist propaganda. I don't think any of this is fictional.
I find the same issue with the privacy section. Yes, concerns about privacy are primarily about sharing that data, precisely because controlling how that data is shared is a first, necessary step towards being able to control what is done with the data. In a world in which my data is taken and shared freely, I don't have any control over what is done with that data because I have no control over who has it in the first place.
[1] https://www.theguardian.com/technology/2024/mar/08/we-defini...
i_dont_know_|6 months ago
Thanks for the perspective. For me I think it's a matter of degree (I guess I was a bit "one or the other" when I wrote it).
These things are also concerns and definitely shouldn't be dismissed entirely (especially things like AI telling you when it's unsure, or the worse cases of propaganda), but I'm worried about the other stuff I mention being defined away entirely, the same way I think it has been with privacy. Tons more to say on the difference between "how you use" vs "how you share", but good perspective, and interesting that you see the emphasis differently in your experiences.