Wikipedia deprecates Archive.today, starts removing archive links
615 points| nobody9999 | 9 days ago |arstechnica.com
Archive.today is directing a DDoS attack against my blog - https://news.ycombinator.com/item?id=46843805 - Feb 2026 (168 comments)
Ask HN: Weird archive.today behavior? - https://news.ycombinator.com/item?id=46624740 - Jan 2026 (69 comments)
wuschel|9 days ago
How does the tech behind archive.today work in detail? Is there any information out there that goes beyond the Google AI search reply or this HN thread [2]?
[1] https://algustionesa.com/the-takedown-campaign-against-archi... [2] https://news.ycombinator.com/item?id=42816427
leonidasv|9 days ago
8cvor6j844qw_d6|9 days ago
archive.org also complies with takedown requests, so it's worth asking: could the organised campaign against archive.today have something to do with it preserving content that someone wants removed?
iamnothere|9 days ago
pyuser583|8 days ago
unknown|9 days ago
[deleted]
robotnikman|8 days ago
There is so much archived there; to lose it all would be a tragedy.
ouhamouch|9 days ago
owner-archive-today . blogspot . com
2 years old, like J.P's first post on AT
bdhcuidbebe|9 days ago
unknown|9 days ago
[deleted]
celsoazevedo|9 days ago
With this said, I also disagree with turning everyone that uses archive[.]today into a botnet that DDoS sites. Changing the content of archived pages also raises questions about the authenticity of what we're reading.
The site behaves as if it was infected by some malware and the archived pages can't be trusted. I can see why Wikipedia made this decision.
fluoridation|9 days ago
It's very silly to talk about doxing when all someone has done is gather information anyone else can equally easily obtain, just given enough patience and time, especially when it's information the person in question put out there themselves. If it doesn't take any special skills or connections to obtain the information, but only the inclination to actually perform the research on publicly available data, I don't see what has been done that is unethical.
jsheard|9 days ago
jMyles|9 days ago
This is absolutely the buried lede of this whole saga, and needs to be the focus of conversation in the coming age.
Sophira|9 days ago
So it doesn't necessarily raise questions about whether the content has been changed or not. The difference is in whether that change is there to make the archive usable - and of course, for archive.today, that's not the case.
ddtaylor|9 days ago
cardanome|9 days ago
I don't think the DDOSing is a very good method for fighting back but I can't blame anyone for trying to survive. They are definitely the victim here.
If that blog really doxxed them out of idle curiosity they are an absolute piece of shit. Though I think this is more of a targeted campaign.
luxuryballs|8 days ago
nosamu|9 days ago
[1] https://archive.today/20240714173022/https://x.com/archiveis...
[2] https://x.com/advancedhosters
[3] https://x.com/advancedhosters/status/1731129170091004412
[4] https://lj.rossia.org/users/mopaiv/257.html
[5] https://x.com/advancedhosters/status/1501971277099286539
jeroenhd|8 days ago
If archive.whatever wasn't so useful to the general public, it'd be hard to distinguish from a criminal operation given the way it operates, unlike say the Internet Archive who goes through all of the proper legal paperwork to be a real nonprofit.
snigsnog|8 days ago
ChocMontePy|9 days ago
Every Reddit archived page used to have a Reddit username in the top right, but then it disappeared. "Fair enough," I thought. "They want to hide their Reddit username now."
The problem is, they did it retroactively too, removing the username from past captures.
You can see on old Reddit captures where the normal archived page has no username, but when you switch the tab to the Screenshot of the archive it is still there. The screenshot is the original capture and the username has now been removed for the normal webpage version.
When I noticed it, it seemed like such a minor change, but with these latest revelations, it doesn't seem so minor anymore.
palmotea|9 days ago
That doesn't seem nefarious, though. It makes sense they wouldn't want to reveal whatever accounts they use to bypass blocks, and the logged-in account isn't really meaningful content to an archive consumer.
Now, if they were changing the content of a reddit post or comment, that would be an entirely different matter.
unknown|8 days ago
[deleted]
basch|9 days ago
more: https://en.wikipedia.org/wiki/Perma.cc
ronsor|9 days ago
Computer0|8 days ago
https://www.in.gov/nircc/planning/highway/traffic-data/inter...
and reddit blocks their agent seemingly. It is open source though.
jsheard|9 days ago
unknown|9 days ago
[deleted]
ouhamouch|9 days ago
[deleted]
karel-3d|9 days ago
https://archive-is.tumblr.com/post/806832066465497088/ladies...
https://archive-is.tumblr.com/post/807584470961111040/it-see...
ricardobeat|9 days ago
dmix|9 days ago
frenchtoast8|9 days ago
I think ArchiveBox[1] is the most popular. I will give it a shot, but it's a shame they don't support URL rewriting[2], which would be annoying for me. I read a lot of blog and news articles that are split across multiple pages, and it would be nice if that article's "next page" link was a link to the next archived page instead of the original URL.
1: https://archivebox.io/
2: https://github.com/ArchiveBox/ArchiveBox/discussions/1395
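To make the "next page" wish concrete, here is a rough sketch of what URL rewriting could look like in a personal archive. It is not ArchiveBox's behaviour or API; the `ARCHIVED` mapping, the paths in it, and the `rewrite_links` helper are all made up for illustration, and a real tool would want a proper HTML parser rather than a regex.

```python
import re

# Hypothetical mapping from original URLs to locally archived copies.
ARCHIVED = {
    "https://example.com/article?page=2": "/archive/1700000000/page2.html",
}

# Naive matcher for double-quoted href attributes (illustration only).
HREF_RE = re.compile(r'href="([^"]*)"')


def rewrite_links(html: str, archived: dict[str, str]) -> str:
    """Point href attributes at archived copies when one exists."""

    def swap(match: re.Match) -> str:
        url = match.group(1)
        # Fall back to the original URL if that page was never archived.
        return f'href="{archived.get(url, url)}"'

    return HREF_RE.sub(swap, html)


page = '<a href="https://example.com/article?page=2">Next page</a>'
print(rewrite_links(page, ARCHIVED))
```

Links without an archived copy are left pointing at the live site, so a partially archived article still works.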
quinncom|9 days ago
Open source. Self hosted or managed. Native iOS and Android apps.
Its Content Scripts feature allows custom JS scripts that transform saved content, which could be used to do URL rewriting.
kseistrup|8 days ago
* https://omnom.zone/
* https://github.com/asciimoo/omnom
tonymet|9 days ago
https://en.wikipedia.org/wiki/Wikipedia:Archive.today_guidan...
cnst|8 days ago
They're basically recommending changing verifiable references that can easily be cross-checked and verified, to "printed on paper" sources that could likely never be verified by any other Wikipedian, and can easily be used to provide a falsification and bias that could go unnoticed for extended periods of time.
Honestly, that's all you need to know about Wikipedia.
The "altered" allegation is also disingenuous. The reason archive.org never works is precisely because it doesn't alter the pages enough. There's no evidence that archive.today has altered any actual main content they've archived; altering the hidden fields, usernames and paywalls, as well as random presentation elements to make the page display properly, doesn't really count as "altered" in my book, yet that's precisely what the allegation amounts to.
1vuio0pswjnm7|9 days ago
archive.today is very popular on HN; the opaque, shortened URLs are promoted on HN every day
I can't use archive.today. I tried but gave up. Too many hassles. I might be in the minority but I know I'm not the only one. As it happens, I have not found any site that I cannot access without it
The most important issue with archive.today though is the person running it, their past and present behaviour. It speaks for itself
Whoever it is, they have a lot of info about HN users' reading habits given that archive.today URLs are so heavily promoted by HN submitters, commenters and moderators
1vuio0pswjnm7|8 days ago
"Geolocation" as a justification is ambiguous
Why a need for geolocation
Geolocation can be used for multiple purposes
"DNS performance" is only one purpose
Other purposes might offer the user no benefit, and might even be undesirable for users
As a result, some users don't send EDNS subnet. It's always been optional to send it
Even public resolvers, third party DNS services, like Cloudflare, recognise the tradeoffs for users and allow users to avoid sending it. Popular DNS software makes compiling support for EDNS subnet optional
Archive.today wants/needs EDNS subnet so badly that it tries to gather it using a tracking pixel, or it tries to block users who don't send it, e.g., Cloudflare users
Thus, before one even considers all the other behaviour of this website operator, some of which is mentioned in this thread, there is a huge red flag for anyone who pays attention to EDNS subnet
As with almost all websites repeated DNS lookups are not an absolute requirement for successful HTTP requests
There are some IP addresses for archive.{today,is,md,ph,li,...} that have continued to work for years
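For anyone wondering what "sending EDNS subnet" actually puts on the wire, here is a minimal sketch of the EDNS Client Subnet option as laid out in RFC 7871. The `build_ecs_option` helper and the example address are my own for illustration; this says nothing about how archive.today's DNS servers process the option.

```python
import ipaddress
import struct

ECS_OPTION_CODE = 8  # EDNS Client Subnet option code (RFC 7871)


def build_ecs_option(client_ip: str, prefix_len: int) -> bytes:
    """Encode an ECS option carrying only the first prefix_len bits of client_ip."""
    # strict=False masks off the host bits, so only the prefix is revealed.
    network = ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)
    family = 1 if network.version == 4 else 2  # 1 = IPv4, 2 = IPv6
    # The address is truncated to the minimum number of whole bytes.
    addr_bytes = network.network_address.packed[: (prefix_len + 7) // 8]
    # FAMILY, SOURCE PREFIX-LENGTH, SCOPE PREFIX-LENGTH (0 in queries), ADDRESS
    payload = struct.pack("!HBB", family, prefix_len, 0) + addr_bytes
    return struct.pack("!HH", ECS_OPTION_CODE, len(payload)) + payload


# Documentation-range address used as a stand-in for a real client.
print(build_ecs_option("203.0.113.77", 24).hex())
```

With a /24 the authoritative server learns the client's network but not the last octet; with a prefix length of 0 no address bytes are included at all.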
belviewreview|9 days ago
wolvoleo|8 days ago
Anyone interested in the reading habits of HN users can just take a look at news.ycombinator.com ;)
diath|9 days ago
It's not promoted, it's just used as a paywall bypass so everyone can read the linked article.
fouc|9 days ago
bawolff|9 days ago
1vuio0pswjnm7|8 days ago
"promoted" as used here means placing an archive.tld URL at the top of an HN thread so that many HN readers will follow it, or placing these URLs elsewhere in threads
nobody9999|7 days ago
What hassles have you experienced?
I use the Archive Page[0] extension which is really easy to use.
The only thing that annoys me about it is the repeated requests (starting about eight or nine months ago) to complete CAPTCHAs.
[0] https://addons.mozilla.org/en-US/firefox/addon/archive-page/
rawling|8 days ago
Could you not in theory record the whole TLS transaction? Can it not be replayed later and re-verified?
Up until an old certificate leaks or is broken and you can fake anything "from back when it was valid", I guess.
arboles|8 days ago
justincormack|8 days ago
krick|9 days ago
Ultimately, what we all use it for is pretty straightforward, and it seems like by now we should've arrived at having approximately one best implementation, which could be used both for personal archiving and for internet-facing instances (perhaps even distributed). But I don't know if we have.
robotnikman|9 days ago
seanhly|8 days ago
On that occasion, the target of the attack was a site named northcountrygazette.org, whose owner seems to have never become aware of the attack. The HN commenter noted when they went to the site manually it was incredibly slow, which would suggest the DDoS attempt was effective.
I tried to see if there was anything North Country Gazette had published that the webmaster of archive.today might have taken issue with, and I couldn't find anything in particular. However, the "Gazette" had previously threatened readers with IP logging to prosecute paywall bypassers (https://news.slashdot.org/story/10/10/27/2134236/pay-or-else...), and also blocks archivers in its robots.txt file, indicating it is hostile towards archiving in general.
I can no longer access North Country Gazette, so perhaps it has since gone out of business. I found a few archived posts from its dead website complaining of high server fees. Like the target of this most recent DDoS, June Maxam, the lady behind North Country Gazette, also appears/appeared to be a sleuth.
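For readers unfamiliar with how a robots.txt file "blocks archivers", here is a small sketch using Python's standard `urllib.robotparser`. The robots.txt content and the `ExampleArchiveBot` user agent are invented for illustration; this is not the Gazette's actual file, which I can no longer retrieve.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that shuts out one archiving crawler only.
ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(may_fetch(ROBOTS_TXT, "ExampleArchiveBot", "https://example.org/story"))
print(may_fetch(ROBOTS_TXT, "SomeBrowser", "https://example.org/story"))
```

Note that robots.txt is purely advisory: it only keeps out crawlers that choose to honour it.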
ouhamouch|8 days ago
[deleted]
andai|9 days ago
This would have sounded Very Normal in the 2000s... I wonder if we can go back :)
bawolff|9 days ago
PhilipRoman|8 days ago
xurukefi|9 days ago
jsheard|9 days ago
If a site (or the WAF in front of it) knows what it's doing then you'll never be able to pass as Googlebot, period, because the canonical verification method is a DNS lookup dance which can only succeed if the request came from one of Googlebot's dedicated IP addresses. Bingbot is the same.
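The "DNS lookup dance" is usually a forward-confirmed reverse DNS check: reverse-resolve the client IP, check the hostname's domain, then forward-resolve that hostname and confirm it maps back to the same IP. A minimal sketch follows; the suffix list is an assumption for illustration (check the crawler operator's own documentation), the function names are mine, and `socket.gethostbyname_ex` only covers IPv4.

```python
import socket

# Assumed hostname suffixes, for illustration only.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """PTR lookup: IP address -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> list[str]:
    """A lookup: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def is_verified_crawler(ip, suffixes=TRUSTED_SUFFIXES,
                        reverse=reverse_lookup, forward=forward_lookup) -> bool:
    """Forward-confirmed reverse DNS check for a claimed crawler IP."""
    try:
        hostname = reverse(ip)
    except OSError:  # no PTR record or resolver failure
        return False
    if not hostname.lower().rstrip(".").endswith(tuple(suffixes)):
        return False
    try:
        # Whoever controls the IP's PTR record can claim any hostname,
        # so the hostname must also resolve back to the same IP.
        return ip in forward(hostname)
    except OSError:
        return False
```

The resolver functions are passed in as parameters so the logic can be exercised without network access. Spoofing the User-Agent header does nothing against this check, since the scraper controls neither the trusted zone's forward records nor the crawler's IP range.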
Aurornis|9 days ago
The curious part is that they allow web scraping arbitrary pages on demand. So a publisher could put in a lot of arbitrary requests to archive their own pages and see them all coming from a single account or small subset of accounts.
I hope they haven't been stealing cookies from actual users through a botnet or something.
tonymet|9 days ago
elzbardico|9 days ago
Why? In the world of web scraping this is pretty common.
cnst|8 days ago
They bypass the rendering issues by "altering" the webpages. It's not uncommon to archive a page, and see nothing because of the paywalls; but then later on, the same page is silently fixed. They have a Tumblr where you can ask them questions; at one point, it's been quite common for everyone to ask them to fix random specific pages, which they did promptly.
Honestly, you cannot archive a modern page, unless you alter it. Yet they're now being attacked under the pretence of "altering" webpages, but that's never been a secret, and it's technologically impossible to archive without altering.
Cider9986|8 days ago
layer8|9 days ago
comeonbro|9 days ago
Anything on twitter post-login-wall for one. A million only-semi-paywalled news articles for others. But mainly an unfathomably long tail.
It was extremely distressing when the admin started(?) behaving badly for this reason. That others are starting to react this way to it is understandable. What a stupid tragedy.
_el1s7|8 days ago
croes|9 days ago
Hardly possible for Wikimedia to provide a service like archive.today given the legal trouble of the latter.
Strangely naive.
anilakar|9 days ago
From hero to a Kremlin troll in five seconds.
alfiedotwtf|8 days ago
nubinetwork|9 days ago
stephen_g|9 days ago
Not sure why it would only be on archive.is and not the others but ‘is’ loads for me.
jl6|8 days ago
Gander5739|8 days ago
cnst|8 days ago
BTW, they also alter paywalls and other elements, because otherwise, many websites won't show the main content these days.
It kind of seems like "altered" is the new "hacker" today?
mrguyorama|9 days ago
Oh good. That's definitely a reasonable thing to do or think.
The raw sociopathy of some people. Getting doxxed isn't good, but this response is unhinged.
jMyles|9 days ago
We live at a moment where it's trivially easy to frame possession of an unsavory (or even illegal) number on another person's storage media, without that person even realizing (and possibly, with some WebRTC craftiness and social engineering, even get them to pass on the taboo payload to others).
oytis|9 days ago
ouhamouch|9 days ago
In response to J.P's blog already framed AT as project grown from a carding forum + pushed his speculations onto ArsTechnica, whose parent company just destroyed 12ft and is on to a new victim. The story is full of untold conflicts of interests covered with soap opera around DDoS.
tetris11|9 days ago
ValentineC|8 days ago
bjourne|9 days ago
super256|9 days ago
Unfortunately this happens more often than one would expect.
I found this out when I preserved my very first homepage I made as a child on a free hosting service. I archived it on archive.org, and thought it would stay there forever. Then, in 2017 the free host changed the robots.txt, closed all services, and my treasured memory was forever gone from the internet. ;(
unknown|9 days ago
[deleted]
unknown|8 days ago
[deleted]
anovikov|8 days ago
rdiddly|9 days ago
That effort appears to have gone nowhere, so now suddenly archive.today commits reputational suicide? I don't suppose someone could look deeper into this please?
ndiddy|9 days ago
> Regarding the FBI’s request, my understanding is that they were seeking some form of offline action from us — anything from a witness statement (“Yes, this page was saved at such-and-such a time, and no one has accessed or modified it since”) to operational work involving a specific group of users. These users are not necessarily associates of Epstein; among our users who are particularly wary of the FBI, there are also less frequently mentioned groups, such as environmental activists or right-to-repair advocates.
> Since no one was physically present in the United States at that time, however, the matter did not progress further.
> You already know who turned this request into a full-blown panic about “the FBI accusing the archive and preparing to confiscate everything.”
Not sure who he's talking about there.
chrisjj|9 days ago
Oh? Do tell!
nobody9999|9 days ago
>Oh? Do tell!
They do. In the very next paragraph in fact:
that_lurker|9 days ago
eviks|8 days ago
You're part of the community! Prove him right!
RupertSalt|9 days ago
dakolli|8 days ago
unknown|9 days ago
[deleted]
realaaa|7 days ago
---------
Here’s the chronology that the HN thread id=47092006 is about, based on the linked Ars Technica article and related sources.
---
## 1. What “started the argument”?
The core dispute starts from a 2023 blog post by engineer Jani Patokallio on his site Gyrovague, investigating who is behind archive.today. That post, plus later FBI interest, led to:
1. A *GDPR/takedown campaign* against the blog post.
2. An *apparent DDoS* launched from archive.today’s CAPTCHA page against his blog.
3. *Threats* from the archive.today operator (“Nora”) to associate Patokallio’s name with AI porn and other harassment.
4. *Discovery that archive.today had altered archived pages* to insert Patokallio’s name.
5. A *Wikipedia RfC* and decision to deprecate and blacklist archive.today links.
The Hacker News thread you referenced is about the final step: Wikipedia’s decision to remove ~695,000 archive.today links.
---
## 2. Timeline of the situation
```mermaid
timeline
    title archive.today – Wikipedia controversy chronology
```

So, in the terms of your question:

- *What started the argument* was Patokallio’s 2023 investigation into archive.today’s ownership, which later coverage of the FBI subpoena amplified.
- The *direct trigger for Wikipedia’s action* was the combination of:
  - The *DDoS* launched from archive.today against his blog.
  - The *threats* (AI porn, harassment) against him.
  - Evidence that the *archive’s content had been tampered with*, violating Wikipedia’s trust in it as a citation source.
ValveFan6969|8 days ago
[deleted]
ValveFan6969|9 days ago
[deleted]
Keekgette|8 days ago
[deleted]
attila-lendvai|9 days ago
[deleted]
Permit|9 days ago
What possible value could a comment from someone who has no knowledge of the site or conflict add to this discussion?
ChrisArchitect|9 days ago
[deleted]
input_sh|9 days ago
casey2|9 days ago
oytis|8 days ago
snigsnog|8 days ago
TZubiri|9 days ago
Onavo|9 days ago
tl2do|9 days ago
bawolff|9 days ago
AgentME|9 days ago
ranger207|9 days ago
shevy-java|9 days ago
zeroonetwothree|9 days ago
alsetmusic|9 days ago
jraph|9 days ago
Larrikin|9 days ago
unknown|9 days ago
[deleted]
selridge|9 days ago
[deleted]
kmeisthax|9 days ago
[deleted]
paganel|9 days ago