Since the blackouts last year and the recent IPO, it feels like astroturfing and spam have increased, while quality contributions have decreased. All usage metrics are up according to Reddit's IPO filings, but it feels like engagement is actually down, or at least lower quality. Many niche subs feel like ghost towns now.
Is this just my subjective impression or do you feel the same?
Spam increased exponentially after the 3rd party kill switch. There used to be plenty of bots you could add to your mod list to help combat spam, repost accounts, and more.
They existed before the whole 3rd party fallout, but it does not even come close to how bad it is now.
IME the niche subs are still doing fine; it's the popular ones that are getting astroturfed and Eternal September'ed into oblivion. I do expect the LLM bots to eventually render the platform unusable. They would need to implement a very aggressive personhood verification policy to prevent that.
Browsing /r/all is a cesspool now. It feels like the same 30 topics with random OnlyFans wannabes and seemingly outrageous relationship stories/advice. It used to be such a great way to find and explore new topics.
Niche sub-reddits largely seem to be either (1) growing so large they become bland or (2) dying and moving to other platforms.
I basically only enjoy the live sporting game threads now. Even then, it's a pretty shallow level of enjoyment.
Twitter feels the same way. They are claiming to have more active users than ever, but that probably includes the horde of LLM bots and MY PUSSY IN BIO spam.
Subjectively, I feel the same. That said I don't use it much any more -- when Apollo closed I stopped participating on the site.
I saw lots of comments shortly after the blackouts that 'felt' AI-generated, and when I rarely go there now from search results, using the awful new site, I see little content of value.
The incentive to contribute is based on the potential return of social currency (prestige, togetherness, etc). If it's evident that you won't generate enough currency to outweigh enriching Reddit, why bother?
The API rugpull was a real setback for content. If they had followed through on their claims at the time to allow paid access, that could have worked, but they never rolled anything out. It was just a ruse.
Maybe it is just me, but when reddit pops up in my search results (and since I am using Kagi, it quite often ranks at the top), the threads are mostly useless for solving an issue or extracting information. They are often outdated and littered with personal opinions and anecdata. Compared to Q/A sites like StackExchange, the quality of information - at least for me - is very poor. Which is fine, since reddit claims to be a social network, too.
My experience has been the opposite in the last few years. I've found myself filtering Google/DuckDuckGo results specifically for reddit, because I was finding better answers to technical questions there. Anecdotal, of course, and it does seem to be getting worse (less successful for me) over the last 6 months.
50/50 - I’ve been surprised by the number of Reddit threads that have been more useful than other results, even when it’s a discussion that doesn’t give me a solution but helps me shape what I’m trying to find.
Although it would be nice if unanswered posts didn’t rank so highly.
Sometimes I just want the opinion of an actual human being. It's hard to find that online anymore, without affiliate links and/or $COMPANY_NAME deleting negative remarks.
Reddit didn't reach this point because they're good, but because it's the least shitty option right now. (God I wish that wasn't the case.) I'm not saying there's no astroturfing going around there -- there absolutely is -- but it's still the only "mainstream" website where I'm confident I can find some dissenting opinions about a product that are written by actual human beings.
Yeah, I had exactly this problem with Google the other day, I googled something and saw the first result was a reddit post and the short summary under the link was like "Yes I've seen this problem BUT WHAT YOU REALLY SHOULD BE DOING" and I was optimistic that someone had some good guidance. Guess what I should be doing? Uninstalling windows and running Linux... That answer somehow had made it into the Google summary despite being downvoted on actual reddit.
I don't think reddit lets you do this in any more than a superficial way. I think reddit keeps the old edits internally, so it won't harm the LLM. There were reports after the last protest of reddit reverting mass edits.
> Repost deleted/removed information. Remember that comment someone just deleted because it had personal information in it or was a picture of gore? Resist the urge to repost it. It doesn't matter what the content was. If it was deleted/removed, it should stay deleted/removed.
As I understand it, reddit as it has been has never not lost money. What, exactly, makes switching from a burn pit business model to one that actually makes money qualify as "a bit shortsighted"? They've been doing this for two decades already. How does going from X-ten(?) billion cat photo comments to Y-ten billion open opportunities worth more than the cost of waiting yet more decades to actually make money?
It's happening to the entire internet. A lot of content generated in the last few months is AI: some pretty good, but not great, all kind of on the 'crappy' side. The 'crappy' feedback loop into training data is going to be a real problem.
Wonder if the internet will migrate back to each person having their own blog that they can control.
This is the end of Web 2.0. There will be a blip on signal/noise ratio (which wasn't that great to begin with, 99.9% of UGC is trash anyway) as procedurally-generated content floods sites with even more nonsense – and then once they become unusable (reddit already is), the next crop will pop up.
I'm long on people with great taste, trendsetters and commentators, editors, and curators. They'll be the vanguard of this next iteration of the internet.
Just so we're clear, this is using reverse psychology, right? They do want you to replace your comments with copyrighted text.
I assume the wording is because of legal ramifications. I wonder if such a defense works in court.
Personally, I think doing this is pointless. LLMs already use copyrighted works, so this isn't helping at all. The only way to tank Reddit is to add meaningless text which would make LLMs worse.
I know, right. The last thing I'd expect a self-proclaimed anti-capitalist to be defending is intellectual property, especially one of a corporation (NYT).
If Reddit keeps a copy of the data edits, this move also just serves to hamper open source models who can only train on scraped data, while those with enough money can buy the full dataset with history.
What I mean is, I agree and I think this plugin will do the opposite of what the authors expect.
If every conversation about any topic has responses copying unrelated New York Times articles, what are the chances LLMs trained on that data will hallucinate even worse than before?
Slightly off topic, but since the HN site and API are open to all, it'd be silly to assume our comments aren't also part of several datasets used to train LLMs.
Are these people even sure the comment is even deleted on the backend where I assume the data will be taken from? I feel like they'll be pissing upwind and en-shit-ifying the site that will only harm users and not the data harvesting. If anything you want the public facing stuff there and free to scrape by any average Joe.
Reddit's caches are set up to only ever return the last 1,000 of anything. So for example - you can't scroll past 1k items on /new, and if you save more than 1k posts then you'll have to unsave some to retrieve the others.
If this extension only edits comments, it'll only touch the most recent 1k. You would need to retrieve the older ones with a Pushshift replacement like this: https://pullpush.io/. But that also shows how ineffective this is. We still have public reddit archives (like Pullpush and https://github.com/ArthurHeitmann/arctic_shift) which contain comments as they were originally posted. This isn't gonna be a problem for Google.
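To make the 1k limit concrete, here is a rough simulation of what a comment-editing tool can reach if it only walks a paginated listing. `LISTING_CAP`, `fetch_page`, and `reachable_comments` are made-up names for illustration, not Reddit's actual API, and the cap value is just the figure quoted above.

```python
LISTING_CAP = 1000  # the cap described above, taken as an assumption


def fetch_page(all_comments, offset, page_size=100):
    """Hypothetical stand-in for a paginated listing endpoint (newest first).

    Anything beyond the newest LISTING_CAP items is simply never returned.
    """
    visible = all_comments[:LISTING_CAP]
    return visible[offset:offset + page_size]


def reachable_comments(all_comments, page_size=100):
    """Walk the listing page by page, the way an editing extension would."""
    reached, offset = [], 0
    while True:
        page = fetch_page(all_comments, offset, page_size)
        if not page:
            break  # the listing ran dry, even if older comments still exist
        reached.extend(page)
        offset += len(page)
    return reached
```

With 2,500 comments in the account, this walk stops after the newest 1,000, so the other 1,500 would need IDs from an outside archive before they could be edited at all.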
I may make a plugin that hooks this into my local 11B LLM so that I could have it summarise my comments in the third person, in a David Attenborough documentary style. I love the idea of 60k plus DA summarisations and attributions of naturalistic motivations for my comments.
I stopped using Reddit when they banned 3rd party apps, after 16 years and nearly 6000 hours on the platform, including over 2800 hours writing content on their site.
More than happy to burn it all down, for the simple fact that their app sucks so bad that it’s unusable and they banned the app that I was comfortable with.
So, I will be replacing roughly $100k of written value (at half the rate I am normally paid for my writing work) with at least that much in negative value AI generated stupidity. F@$k those guys.
I intend to be an object lesson in abusing your top performing users.
I hate the fact they allow bots and trolls to make tons of accounts and tons of spam/troll posts daily. It would be trivial to fix this partially by putting limits on what a user can post per minute and per day, and by making it harder to create new accounts and start spamming.
I suspect they make more money from allowing bots and trolls than from doing the work of fixing this problem.
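For what it's worth, the per-minute and per-day posting limits suggested above could look something like this. It is only a sketch: `PostRateLimiter` and its default limits are invented for illustration, and nothing here reflects how Reddit actually throttles accounts.

```python
import time
from collections import defaultdict, deque

DAY = 86400.0
MINUTE = 60.0


class PostRateLimiter:
    """Hypothetical sliding-window limiter: N posts per minute, M per day."""

    def __init__(self, per_minute=5, per_day=100):
        self.limits = ((MINUTE, per_minute), (DAY, per_day))
        self.history = defaultdict(deque)  # user -> timestamps of accepted posts

    def allow(self, user, now=None):
        now = time.time() if now is None else now
        stamps = self.history[user]
        # Forget posts older than the longest window.
        while stamps and now - stamps[0] >= DAY:
            stamps.popleft()
        # Reject if any window is already full.
        for window, limit in self.limits:
            recent = sum(1 for t in stamps if now - t < window)
            if recent >= limit:
                return False
        stamps.append(now)
        return True
```

Rejected attempts are not recorded, so a bot hammering the endpoint does not extend its own lockout. This only slows down a single account, of course, which is why the account-creation side matters just as much.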
What's wrong with people? Reddit has great content. I often use Google to search it for info. Why ruin it? LLMs are also very useful and we all benefit from them.
Reddit is a company with expenses, so they need to make money somehow. You don't have to use it if you don't want your content in LLM training data.
I think the best way to sabotage LLMs trained on Reddit data would be to post something on topic, but straightup wrong, in some other way misleading or with subtle inaccuracies that would cause LLMs to produce bad results in ways that are hard to detect.
Why do so many people, even web developers, think anyone lets you do `UPDATE` or `DELETE` in their databases?! They let you do `INSERT`. That's it. You can insert a new edit and you can insert a delete. They don't actually delete or overwrite anything.
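A toy version of that insert-only pattern, to show why overwriting a comment does not remove the original. This is a generic sketch of an append-only log; `AppendOnlyComments` and `Revision` are hypothetical names, and it says nothing about Reddit's real schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Revision:
    comment_id: str
    body: Optional[str]  # None marks a "delete" record (a tombstone)


@dataclass
class AppendOnlyComments:
    """Hypothetical store: every change is a new row, nothing is overwritten."""

    log: List[Revision] = field(default_factory=list)

    def post(self, comment_id: str, body: str) -> None:
        self.log.append(Revision(comment_id, body))

    def edit(self, comment_id: str, new_body: str) -> None:
        # An "edit" is just another insert.
        self.log.append(Revision(comment_id, new_body))

    def delete(self, comment_id: str) -> None:
        # A "delete" is an insert too.
        self.log.append(Revision(comment_id, None))

    def visible(self, comment_id: str) -> Optional[str]:
        """What the public site shows: the latest revision only."""
        for rev in reversed(self.log):
            if rev.comment_id == comment_id:
                return rev.body
        return None

    def history(self, comment_id: str) -> List[Optional[str]]:
        """What the operator can still read: every revision, oldest first."""
        return [rev.body for rev in self.log if rev.comment_id == comment_id]
```

After a post, an edit, and a delete, `visible` returns nothing while `history` still starts with the original text, which is exactly the copy anyone with backend access would sell or train on.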
Another flavor of this would let the user submit their comment and it'd suggest a semantically similar excerpt from "non-"copyrighted text. That'd address the edit reversion dilemma.
Luddite opposition to tech was very specific, they weren't just generic technophobes or "debloated based internet minimalists", they opposed labor automating machinery that shifted power to capital. Javascript is just a web scripting language.
The reason I started leaving more comments on Reddit is precisely because it is going to be LLM training data. My wit is going to be part of our AI overlords
[+] [-] hubraumhugo|1 year ago|reply
[+] [-] theyeenzbeanz|1 year ago|reply
[+] [-] herculity275|1 year ago|reply
[+] [-] nabla9|1 year ago|reply
It's more valuable to astroturf and spam on reddit than ever before.
[+] [-] SkyPuncher|1 year ago|reply
[+] [-] jsheard|1 year ago|reply
[+] [-] FrustratedMonky|1 year ago|reply
I just read an article about marketing companies using bots that post 'pretty good comments, that slightly agree with you but mention the product'.
[+] [-] vintagedave|1 year ago|reply
[+] [-] SleepilyLimping|1 year ago|reply
[+] [-] jimmySixDOF|1 year ago|reply
[+] [-] littlecranky67|1 year ago|reply
[+] [-] cacois|1 year ago|reply
[+] [-] spongeb00b|1 year ago|reply
[+] [-] ses1984|1 year ago|reply
[+] [-] ryukoposting|1 year ago|reply
[+] [-] Zambyte|1 year ago|reply
[+] [-] input_sh|1 year ago|reply
[+] [-] unknown|1 year ago|reply
[deleted]
[+] [-] addandsubtract|1 year ago|reply
[+] [-] SilverBirch|1 year ago|reply
[+] [-] z_open|1 year ago|reply
[+] [-] ziml77|1 year ago|reply
[+] [-] Havoc|1 year ago|reply
They already restore user comments against their will (and, hilariously, that’s also against their own reddiquette; see the extract below).
https://www.reddit.com/r/privacy/comments/14dcxy4/reddit_res...
> Repost deleted/removed information. Remember that comment someone just deleted because it had personal information in it or was a picture of gore? Resist the urge to repost it. It doesn't matter what the content was. If it was deleted/removed, it should stay deleted/removed.
[+] [-] mkl|1 year ago|reply
[+] [-] Phiwise_|1 year ago|reply
[+] [-] FrustratedMonky|1 year ago|reply
[+] [-] Havoc|1 year ago|reply
[+] [-] 23B1|1 year ago|reply
[+] [-] xdennis|1 year ago|reply
[+] [-] CaptainFever|1 year ago|reply
[+] [-] gorbachev|1 year ago|reply
[+] [-] helpfulContrib|1 year ago|reply
[deleted]
[+] [-] batch12|1 year ago|reply
[+] [-] floor_|1 year ago|reply
[+] [-] float-trip|1 year ago|reply
[+] [-] K0balt|1 year ago|reply
[+] [-] GaggiX|1 year ago|reply
[+] [-] WithinReason|1 year ago|reply
[+] [-] simion314|1 year ago|reply
[+] [-] donatj|1 year ago|reply
[+] [-] tinyhouse|1 year ago|reply
[+] [-] batch12|1 year ago|reply
[+] [-] hoseja|1 year ago|reply
[+] [-] gorbachev|1 year ago|reply
Use proven information warfare tactics.
[+] [-] globular-toast|1 year ago|reply
[+] [-] batch12|1 year ago|reply
[+] [-] upget_tiding|1 year ago|reply
It seems somewhat ironic that a website called the luddite would require me to enable javascript on their site in order to read it.
[+] [-] Barrin92|1 year ago|reply
[+] [-] arbol|1 year ago|reply
[+] [-] anArbitraryOne|1 year ago|reply