top | item 36579122

abramN | 2 years ago

>> Large models are trained on public data scraped via API. Content-heavy sites are most likely to be disrupted (why post on StackOverflow?) by models trained on their own data. Naturally, they want to restrict access and either (1) sell the data or (2) train their own models. This restriction prevents (or complicates) Google's automatic scraping of the data for Search (and probably for training models, too).

This is going to be the interesting part to me - one HUGE usage scenario for AI would be doing automated searches and distilling results from the web. If sites start preventing that, then AI innovation could be stifled - OR we'll see a scenario where the average person is very limited in what their AI subscription can access. We could have a situation where e.g. Fandango prevents AI from searching its site to help people plan their movie outing, and instead offers its own model that it charges for access to. We could have models talking to each other, deals made between the owners of the models for access, while the average citizen uses a search engine optimized for monetization - one whose data may be months out of date unless they pay extra for the model that provides current data.
