To play devil's advocate: if you were running a large public forum, and you knew that many companies had started to scrape all data off your site, and were going to cumulatively make billions off that data, and that some of those billions would come from polluting your forum with crap content, would you continue running your site in the open?
What is the game theory here? Twitter cooperates and OpenAI defects, and we call that a win?
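One way to make the game theory question concrete is a two-player payoff table. This is only a sketch: the move names and every payoff number below are hypothetical, chosen to encode an ordering (being scraped while open is the platform's worst case, scraping an open site is the scraper's best case), not to measure anything real.

```python
from itertools import product

PLATFORM_MOVES = ("open", "closed")      # "open" plays the role of cooperating
SCRAPER_MOVES = ("respect", "scrape")    # "scrape" plays the role of defecting

# Hypothetical payoffs as (platform, scraper); only the ordering matters.
PAYOFFS = {
    ("open", "respect"): (3, 3),
    ("open", "scrape"): (0, 5),
    ("closed", "respect"): (1, 1),
    ("closed", "scrape"): (1, 1),
}


def pure_nash_equilibria(payoffs):
    """Return the move pairs where neither player gains by switching alone."""
    equilibria = []
    for p, s in product(PLATFORM_MOVES, SCRAPER_MOVES):
        platform_payoff, scraper_payoff = payoffs[(p, s)]
        # The platform stays put if no alternative move pays strictly more.
        platform_ok = all(payoffs[(alt, s)][0] <= platform_payoff for alt in PLATFORM_MOVES)
        # Same check for the scraper, holding the platform's move fixed.
        scraper_ok = all(payoffs[(p, alt)][1] <= scraper_payoff for alt in SCRAPER_MOVES)
        if platform_ok and scraper_ok:
            equilibria.append((p, s))
    return equilibria


print(pure_nash_equilibria(PAYOFFS))
```

Under these assumed numbers the only stable outcome is a closed platform, even though both sides would do better with open access and restrained use. Different payoffs would give a different answer, which is really the point of asking the question.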
Alternatively, build a private invite only dataset of specific communities. Scale horizontally instead of having one single central dataset! That's still a huge win for user ownership.
It doesn't have to be one company controlling who gets access, the users can decide this
Practically speaking, 'some' of the money is the same or just as good as 'all' of the money in terms of a functioning economy and not pure capitalism.
People pirate software, yet people still develop and sell it.
Yes, it's not the most profitable thing done alone. Only for those who find the nice combination or feedback loop of 'fit', demand, improvement, and expansion.
If you can make billions off the thing, you can presumably handle some GeoIP/rate limiting... or simply, not caring. Anything that falls through the cracks is categorically insignificant to your grand nature.
To justify it, if one must, consider it a trial. As your friendly neighborhood dealer would say: "The first taste is free".
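For what "some rate limiting" could look like, here is a minimal token bucket sketch. The class, the function name, and the limits (1 request per second, bursts of 5) are all made up for illustration; keying on an IP address is an assumption, and a GeoIP region could be used as the key in the same way.

```python
import time
from typing import Dict, Optional


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float, now: Optional[float] = None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full so a new client can burst immediately
        self.updated = time.monotonic() if now is None else now

    def allow(self, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        elapsed = max(0.0, now - self.updated)
        # Refill for the time that has passed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# One bucket per client key (hypothetical: keyed by IP address).
buckets: Dict[str, TokenBucket] = {}


def allow_request(client_key: str, now: Optional[float] = None) -> bool:
    """Return True if this client may make a request right now."""
    if client_key not in buckets:
        buckets[client_key] = TokenBucket(rate=1.0, capacity=5.0, now=now)
    return buckets[client_key].allow(now)
```

The `now` parameter exists only so the behaviour can be checked without sleeping; in normal use it is left out and the monotonic clock is used.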
It's tragic that the LLM craze OpenAI kicked off is threatening to ruin one of the greatest common goods ever invented in the open internet. But hey, at least a handful of giant corporations and investors are making money, so I guess that counts as a win.
Here's a thought: someone "trustworthy" should maintain a Chrome extension or Tampermonkey script that automatically scrapes data from various social media sites in a fully anonymized fashion. As people browse Twitter, Reddit, or XYZ, the posts/comments are sent to some aggregation system. It might be against TOS, but certainly far less than scraping, and you couldn't tell, as it's the user driving what gets scraped.
I don't use Twitter often, but I'd run something like that if there were strong anonymity guarantees. Seems like a win-win for everyone.
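A rough sketch of the client-side half of that idea, assuming the extension hands over a dict per observed post. The field names are hypothetical, and this only covers what goes into the payload: it does nothing about network-level anonymity, since the aggregator would still see the sender's IP address unless submissions were relayed somehow.

```python
import hashlib
import json

# Hypothetical field names; a real extension would map each site's markup to these.
PUBLIC_FIELDS = ("site", "post_id", "author", "text", "posted_at")


def build_submission(observed: dict) -> dict:
    """Keep only fields describing the public post, dropping anything about the viewer."""
    record = {key: observed[key] for key in PUBLIC_FIELDS if key in observed}
    # A content hash lets the aggregator deduplicate posts seen by many users
    # without needing to know who submitted each copy.
    canonical = json.dumps(record, sort_keys=True, ensure_ascii=False)
    record["dedupe_key"] = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return record
```

Because the hash is computed only over the public fields, two people who see the same post produce identical submissions, so nothing in the payload distinguishes one contributor from another.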
To me, Twitter is by far the most toxic social network. It has the power to turn the most interesting and smartest people into bitter trolls. Maybe some people get value from it, but it requires serious self discipline to not get dragged into stupid arguments.
A lot of users/communities are stuck because of network effects & the long history of posts (they still reference them, like living wikis)
These pipeline tools can help people migrate if they want without losing that history & network (my prediction is once people see what's possible with an open API, that'll further motivate user migration, or for twitter to open up again)
People should do a lot of things. They shouldn't smoke and they should work out, but here we are in 2024, where Philip Morris stock has outperformed IBM over the last 5 years and the obesity crisis shows no signs of letting up. Knowing something is bad is the first step of course, but clearly not enough to drive a real behavioral change, and we aren't even fully in agreement as a society that social media is harmful in the way cigarettes or obesity are known to be harmful.
Social media used to be about content sharing, but it's clear that it's far more profitable to keep your users on your site at all times and keep the content they create in house rather than linked somewhere else.
HN is probably the last true aggregator site left that I know of.
It's partly why the web is so much worse. There's really no reason to create content outside of a walled platform, as it is getting increasingly difficult to find an audience for it. Even blogging is an uphill battle since more and more social media sites penalize link sharing (they want you to create the content on their platform and leave it there).
That's why it's not surprising that these APIs are disappearing since the fundamental model has changed.
But that's too sensible and user-friendly. It would be obviously weird and wrong if they inserted ads and Musk's tweets into someone's RSS feed. Yet showing those unwanted insertions is the whole purpose of the company now.
> What if I could ask patio’s archive: “what are some good books to read about [topic]” or “what advice would you give to someone trying to get a job at Stripe”
Or what if I could ask: "Given Omar Shehata's Twitter history, formulate a phishing scam that he would be likely vulnerable to".
The problem I see here is that there are far more bad-actor use cases for identifiable user data than good ones. In my opinion the main reason most social networks have stopped doing public by default and now do private by default is that not doing so opens them up to Cambridge Analytica type scandals where people don't realise what they're signing up for.
Personally, if you do this, I would be very clear with your users that by submitting their data it will be made available publicly in an identifiable form, and that even if they revoke their data from your service it's possible their data will continue to be archived by others, possibly for malicious reasons.
Cognitive security vulnerabilities like this are the thing I'm most concerned about. I think it's right to be very upfront about risks like these, and I'm even considering if we want to walk back the fully public thing and make it private / invite-only instead.
The whole point of the Internet up until about 2019 was that it was so cheap to host information it was basically free. If your site scaled up, you covered the hosting costs with ads or donations or something. The expensive part was finding content, so "user generated content" sites had to entice users to post stuff. That resulted in an implicit social contract with the users; these sites lived in fear of the users taking their content elsewhere.
Now the Web is expensive for some reason, and users have become so dependent on a small number of sites as a communications medium that the megacorporations running them feel like they have infinite leverage, and the social contract of user-generated content is completely forgotten.
I honestly don't know what to think of this time. On one hand it's sad in principle to see Reddit and Twitter lock down to the point they have.
But on the other hand they'd both already become cesspools by that time, and I was still visiting them daily. And now I've quit them both which is a good thing.
It was difficult to rescue data from Twitter even before the purchase. In our case it mattered because this is online digital history for the people in my country.
The only thing we can do is motivate more people to use open platforms like NOSTR, where API and data/identity handling is completely different.
Funny, just now I've been playing around with the various tweet deleters and trying to get something working; presently I think I'm about to settle on something involving a basic screen macro recorder thing, like one of the iterations of AHK.
I'm somewhat surprised that this space feels relatively dormant compared to the more complex stuff out there.
Twitter was originally a microblogging service that had RSS feeds for syndicating things or monitoring the microblogs of people/companies that were interesting. It has since gone way far off into the fields.
Same journey reddit is making, starting after it prepared to go public.
The only useful thing on Twitter that I ever saw was the lovely and tender Dog Rates https://x.com/dog_rates. You could read anonymously and be all aww and shucks about all those good dogs. They've thankfully stopped engaging with the cesspool that it became and moved somewhere else, Instagram perhaps? Somewhere where I can't read without an account, so I don't read it anymore.
While I have noticed an issue with certain voices seemingly being amplified a bit more, it is hard to deny that Twitter is still where many of these conversations are happening. People already have their followers there and unless you have a big name that can easily get people to move over to something, it is going to be pretty hard to move.
Personally I have tried to rebuild on other platforms but the engagement keeps happening on twitter even when I make the same posts on the new alternatives.
Twitter is excellent if you're looking for a social network where you can consume content and interact with domain experts in various fields.
In my experience it's really only those who go there for politics that get annoyed by it – previously it was the right-wingers who were angry because it was too left-wing, and now it's left-wingers who think it's too right-wing.
[+] [-] abdullahkhalids|1 year ago|reply
[+] [-] OmarShehata|1 year ago|reply
[+] [-] numpad0|1 year ago|reply
If companies making billions isn't the part that's problematic, the billions part can be left out, and the real problem can be discussed instead.
[+] [-] bravetraveler|1 year ago|reply
[+] [-] rurp|1 year ago|reply
[+] [-] KerrAvon|1 year ago|reply
[+] [-] bangaladore|1 year ago|reply
I don't use Twitter often, but I'd run something like that if there were strong anonymity guarantees. Seems like a win-win for everyone.
Does anything like this exist today?
[+] [-] criticalfault|1 year ago|reply
[+] [-] yodsanklai|1 year ago|reply
[+] [-] OmarShehata|1 year ago|reply
[+] [-] asdff|1 year ago|reply
[+] [-] throw0101a|1 year ago|reply
[+] [-] crystal_revenge|1 year ago|reply
[+] [-] pavlov|1 year ago|reply
[+] [-] spondylosaurus|1 year ago|reply
[+] [-] kypro|1 year ago|reply
[+] [-] theexgenesis|1 year ago|reply
[+] [-] toomuchtodo|1 year ago|reply
https://github.com/TheExGenesis/community-archive
[+] [-] rasengan|1 year ago|reply
[+] [-] gary_0|1 year ago|reply
[+] [-] flykespice|1 year ago|reply
[+] [-] add-sub-mul-div|1 year ago|reply
[+] [-] OmarShehata|1 year ago|reply
they've got data from 800 users so far, with watch data on 55 million videos
[+] [-] nunobrito|1 year ago|reply
[+] [-] jrm4|1 year ago|reply
I'm somewhat surprised that this space feels relatively dormant compared to the more complex stuff out there.
(APIs suck)
[+] [-] molticrystal|1 year ago|reply
[+] [-] danielodievich|1 year ago|reply
[+] [-] vlod|1 year ago|reply
- https://x.com/contextdogs (out of context dogs)
- https://x.com/_B___S (B&S)
- https://x.com/buitengebieden (Buitengebieden)
[+] [-] KomoD|1 year ago|reply
No, they haven't. They're still actively posting on Twitter.
It's just that if you are viewing a profile logged out you get the most popular tweets instead of the most recent ones.
[+] [-] fb03|1 year ago|reply
[+] [-] qingcharles|1 year ago|reply
[+] [-] pornel|1 year ago|reply
The "independently" part is hard, could require some kind of web of trust.
[+] [-] bravetraveler|1 year ago|reply
[+] [-] electrondood|1 year ago|reply
[+] [-] xyst|1 year ago|reply
[+] [-] urda|1 year ago|reply
[+] [-] mrkramer|1 year ago|reply
[+] [-] ranger_danger|1 year ago|reply
yikes.
[+] [-] evv|1 year ago|reply
Many companies offer "object storage" which is compatible with the S3 API.
And there are some open source projects such as MinIO which offer the same (I forget what the others are).
[+] [-] OmarShehata|1 year ago|reply
[+] [-] allkindsof|1 year ago|reply
[deleted]
[+] [-] lenerdenator|1 year ago|reply
[deleted]
[+] [-] nerdjon|1 year ago|reply
[+] [-] llamaimperative|1 year ago|reply
[+] [-] kypro|1 year ago|reply
In my experience it's really only those who go there for politics that get annoyed by it – previously it was the right-wingers who were angry because it was too left-wing, and now it's left-wingers who think it's too right-wing.
Personally I've noticed practically no change.