Except part of their 'value prop' is: "We have this giant trove of human-created content, and AI companies need to start paying us to use it when training their models."
Public data dumps of Reddit comments are available all the way up to December 2022, and they're only about 2TB altogether.
There's nothing stopping AI companies from just using those instead of paying Reddit $50 million to scrape all of them using the API. It would also be 10x-100x quicker than hammering the API for the comments (the API sucks for mass data retrieval).
Sure, but companies doing that also wouldn’t be paying Reddit for that data.
The point of shredding comments isn’t to hurt the companies scraping the data (although that might be a nice side effect). Ultimately it’s to hurt Reddit.
Ah, good point. If that's the case, yeah, shred away. Still, it's too bad that this greed will make it harder for humans to find useful old discussions.
Do you think Reddit actually deletes comments when the user presses delete? My assumption would be that it just sets a "do not display" flag in the database. I'm sure GDPR has some influence, though.
The plugin I use (I think nuke Reddit) overwrites comments with random, realistic-sounding blarg, then deletes them.
I’m sure Reddit keeps all versions as well. But I think it would be impractical to restore to the correct version at scale unless they want to manually review to find the “right” version to restore.
I think if they got a specific subpoena for me, they could find my comments with a manual investigation, but I expect that will never happen as there’s no reason for anyone to do that.
I just want to remove my content from Reddit.com and make it harder for them to undelete it or otherwise disregard my decision.
I’m surprised Reddit still allows editing and undeleting, and I expect them to remove that functionality soon.
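To make the speculation above concrete, here is a minimal in-memory sketch. It assumes, as the comments guess but nobody confirms, that "delete" only sets a hidden flag and that every edit is kept as a separate version. `Comment`, `shred`, and `filler` are hypothetical names, and no Reddit API is involved.

```python
import random
from dataclasses import dataclass, field

# Hypothetical vocabulary for generating plausible-looking filler text.
FILLER_WORDS = ["honestly", "the", "update", "thread", "pretty",
                "works", "yesterday", "agree", "point", "fair"]


@dataclass
class Comment:
    # Assumption: every edit is appended, never overwritten in place.
    versions: list = field(default_factory=list)
    # Assumption: delete is a soft "do not display" flag.
    deleted: bool = False

    def edit(self, text: str) -> None:
        self.versions.append(text)

    def delete(self) -> None:
        # Soft delete: hide the comment but keep all stored versions.
        self.deleted = True

    def undelete(self) -> None:
        # A naive restore just clears the flag and shows the latest version.
        self.deleted = False

    def public_text(self) -> str:
        return "[deleted]" if self.deleted else self.versions[-1]


def filler(rng: random.Random, n_words: int = 8) -> str:
    """Build a sentence of random filler words."""
    words = " ".join(rng.choice(FILLER_WORDS) for _ in range(n_words))
    return words.capitalize() + "."


def shred(comment: Comment, rng: random.Random) -> None:
    """Overwrite the comment with filler first, then (soft) delete it."""
    comment.edit(filler(rng))
    comment.delete()
```

Under these assumptions the original text survives in `versions[0]`, but a bulk `undelete()` surfaces the filler rather than the original, so recovering the "right" version means choosing among stored versions, which is the scaling problem described above.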
This is probably true, but it's at least part of why the program in the title edits your existing comments to say something else before marking them as deleted: at some point (this is probably no longer true) Reddit did not store your entire comment history.
There are obviously ways to defeat this in analysis, but it does make Reddit's job slightly harder if they want to leverage that data. It might also be interesting to just edit some comments without deleting them, chosen at random, which would make it even harder to reliably tease good comments out from the noise.
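The randomized variant suggested above could be sketched as a planning step: every comment is overwritten, but only a random fraction is also deleted, so the survivors are edited-but-visible noise. `Action`, `plan_shredding`, and the `edit_only_ratio` parameter are hypothetical; this is an illustration of the idea, not how the tool in the title works.

```python
import random
from dataclasses import dataclass


@dataclass
class Action:
    comment_id: str
    overwrite: bool
    delete: bool


def plan_shredding(comment_ids, rng: random.Random, edit_only_ratio: float = 0.3):
    """Overwrite every comment; leave a random fraction undeleted.

    edit_only_ratio is the probability that a comment is only edited
    (kept visible with filler text) rather than edited and deleted.
    """
    plan = []
    for cid in comment_ids:
        keep_visible = rng.random() < edit_only_ratio
        plan.append(Action(cid, overwrite=True, delete=not keep_visible))
    return plan
```

Seeding the `random.Random` instance makes a plan reproducible for testing; in practice you would presumably leave it unseeded so the pattern is harder to predict.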
Sometime last year I tried to make a similar tool. The API requests still returned comments that had been deleted, so I suspect there is a "display flag" of sorts that gets checked.
GDPR only applies to people in the EU, though. If the data is truly valuable, I could imagine some work-arounds as well. E.g., maybe each Reddit post is automatically a copyrighted work to which you immediately grant Reddit Inc. a perpetual license. You could also automatically transfer copyright ownership to Reddit Inc., and they license back your ability to share your comment.
commandlinefan|2 years ago
mynameisvlad|2 years ago
That could have changed, sure, but nothing indicates that is the case.
You could also go the GDPR route and request that all your data be deleted, if you are covered by it. They would be required to comply with that request.
yawnr|2 years ago
AlecSchueler|2 years ago
antisthenes|2 years ago
mynameisvlad|2 years ago
realce|2 years ago
monitron|2 years ago
coldpie|2 years ago
SilverBirch|2 years ago
prepend|2 years ago
SOLAR_FIELDS|2 years ago
fmdragon|2 years ago
unsupp0rted|2 years ago
dimitrios1|2 years ago
ed312|2 years ago
dontupvoteme|2 years ago
amne|2 years ago
willcipriano|2 years ago