
tantalic | 3 years ago

At what point do we ask if training from “datasets crawled from the internet” is itself the greater poison?


lifeisstillgood|3 years ago

The internet is the representation of the human "meta-mind".

Organisations can be seen as a slow form of AI. Their decision-making differs from what any individual member would choose, so an organisation represents a different form of "mind".

All humanity (to some definition of all) is also a "mind" - it's currently trying to decide on problems like "climate change"

The workings of that mind, a brain scan if you like, are the internet. It's a map of the state of each neuron (my twitter history?), and the interconnections between those neurons are how the brain thinks. And we can see into the workings of that mind, and indeed alter it.

AI trained from that "brain scan" is simply a model of the human meta-mind we can play with faster.

Any problems with ChatGPT are therefore problems with humanity.

Maybe

8note|3 years ago

It's a representation, not the representation.

By looking at the internet, especially web 2 content, you're getting what the engagement algorithms have decided is good for advertisers.

There's plenty of stuff that humanity does that the internet does not incentivize and thus has no representation for

imranq|3 years ago

About 67% of the world's population are internet users, which leaves roughly 2-3bn people not represented. And among those who are online, only about 0.001% actually post and contribute content that is available on the open web, and I'm willing to bet that population of contributors does not represent the world's demographics.
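A quick back-of-envelope check of those figures, assuming a world population of roughly 8 billion (the population figure is my assumption, not stated in the comment; the 67% and 0.001% rates are the commenter's approximations):

```python
# Rough sanity check of the comment's figures.
# Assumptions: ~8bn world population; 67% online; 0.001% of
# online users post content on the open web (commenter's guesses).
world_population = 8_000_000_000
online_share = 0.67
contributor_rate = 0.00001  # 0.001% expressed as a fraction

online = world_population * online_share
offline = world_population - online
contributors = online * contributor_rate

print(f"online:          {online / 1e9:.2f} bn")   # ~5.36 bn
print(f"not represented: {offline / 1e9:.2f} bn")  # ~2.64 bn
print(f"contributors:    {contributors:,.0f}")     # ~53,600 people
```

Under these assumptions the "2-3bn not represented" claim holds up, and the open-web contributor pool would number only in the tens of thousands.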

RobotToaster|3 years ago

The other day I read that models like Stable Diffusion can be windows into the human collective subconscious. Not sure if I agree, but it's an interesting theory.

isodev|3 years ago

I wonder the same. Also, scraping for training data feels like something that should be opt-in. I really have a problem with the stance that just because a piece of data is technically accessible, it's fair game. It also undermines the lineage and trustworthiness of the final model, e.g. how does one verify that a model's predictive outcomes are in line with expectations?

Timwi|3 years ago

Conversely, an opt-in dataset would surely consist of 99.99% spam.

pjc50|3 years ago

Legally, it seems to me that this is as poisonous as "take a shot of everything in your kitchen cupboard and mix it up". It relies on handwaving away both copyright and GDPR concerns.