Wikipedia deprecates Archive.today, starts removing archive links

wuschel|9 days ago

There is a post describing the possibility of an organised campaign against archive.today [1].

How does the tech behind archive.today work in detail? Is there any information out there that goes beyond the Google AI search reply or this HN thread [2]?

[1] https://algustionesa.com/the-takedown-campaign-against-archi...

[2] https://news.ycombinator.com/item?id=42816427

leonidasv|9 days ago

If they're under an organised defamation campaign, they're not helping themselves by DDoSing someone else's blog and editing archived pages.

8cvor6j844qw_d6|9 days ago

archive.today works surprisingly well for me, often succeeding where archive.org fails.

archive.org also complies with takedown requests, so it's worth asking: could the organised campaign against archive.today have something to do with it preserving content that someone wants removed?

iamnothere|9 days ago

There was also the recent news about sites beginning to block the Internet Archive. Feels like we are gearing up for the next phase of the information war.

pyuser583|8 days ago

Was that written by AI? It sounds like AI, spends lots of time summarizing other posts, and has no listed author. My AI alarm is going off.

unknown|9 days ago

[deleted]

robotnikman|8 days ago

A big fear of mine is something happening to archive.is

There is so much archived there; to lose it all would be a tragedy.

ouhamouch|9 days ago

There are a number of blog posts like

owner-archive-today . blogspot . com

2 years old, like J.P's first post on AT

bdhcuidbebe|9 days ago

They are able to scrape paywalled sites at random, so I'm guessing a residential botnet is used.

unknown|9 days ago

[deleted]

celsoazevedo|9 days ago

I don't see the point in doxing anyone, especially those providing a useful service for the average internet user. Just because you can put some info together, it doesn't mean you should.

With this said, I also disagree with turning everyone that uses archive[.]today into a botnet that DDoS sites. Changing the content of archived pages also raises questions about the authenticity of what we're reading.

The site behaves as if it was infected by some malware and the archived pages can't be trusted. I can see why Wikipedia made this decision.

fluoridation|9 days ago

For a very brief time, "doxing" (that is, dropping dox, that is, dropping docs, or documents) used to mean something useful. You gathered information that was not out in public, for example by talking to people or by stealing it, and put it out in the open.

It's very silly to talk about doxing when all someone has done is gather information anyone else can equally easily obtain, just given enough patience and time, especially when it's information the person in question put out there themselves. If it doesn't take any special skills or connections to obtain the information, but only the inclination to actually perform the research on publicly available data, I don't see what has been done that is unethical.

jsheard|9 days ago

It's also kind of ironic that a site whose whole premise is to preserve pages forever, whether the people involved like it or not, is seeking to take down another site because they are involved and don't like it. Live by the sword, etc.

jMyles|9 days ago

> Changing the content of archived pages also raises questions about the authenticity of what we're reading.

This is absolutely the buried lede of this whole saga, and needs to be the focus of conversation in the coming age.

Sophira|9 days ago

Sites that exist to archive other websites will almost always need to dynamically change the content of the HTML that they're serving in some way or another. (For example, a link that points to the root of the website may need to be changed in order to point to the right location.)

So it doesn't necessarily raise questions about whether the content has been changed or not. The difference is in whether that change is there to make the archive usable - and of course, for archive.today, that's not the case.
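
A rough Python sketch of the benign kind of rewriting described above, where links are changed only so they keep pointing into the archive. The `ARCHIVE_PREFIX` URL layout and the `rewrite_links` name are invented for illustration; this is not how archive.today or any particular archiver is known to work.

```python
import re
from urllib.parse import urljoin

# Hypothetical archive URL layout, made up for this example.
ARCHIVE_PREFIX = "https://archive.example/20240101000000/"

def rewrite_links(html: str, page_url: str) -> str:
    """Point every double-quoted href at the archived copy instead of the live site."""
    def repl(match: re.Match) -> str:
        # Resolve "/", "next", "../x" etc. against the page that was captured.
        absolute = urljoin(page_url, match.group(2))
        return f"{match.group(1)}{ARCHIVE_PREFIX}{absolute}{match.group(3)}"
    # A regex keeps the sketch short; a real tool would use an HTML parser.
    return re.sub(r'(href=")([^"]*)(")', repl, html)

print(rewrite_links('<a href="/">Home</a>', "https://example.com/blog/post"))
```

The visible text of the page is untouched here; only the link targets change, which is the distinction being drawn.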

ddtaylor|9 days ago

Did they actually run the DDoS via a script or was this a case of inserting a link and many users clicked it? They are substantially different IMO

cardanome|9 days ago

As far as I understand the person behind archive.today might face jail time if they are found out. You shouldn't be surprised that people lash out when you threaten their life.

I don't think the DDOSing is a very good method for fighting back but I can't blame anyone for trying to survive. They are definitely the victim here.

If that blog really doxxed them out of idle curiosity they are an absolute piece of shit. Though I think this is more of a targeted campaign.

luxuryballs|8 days ago

This seems like the type of thing that should be on a blockchain, with decentralized nodes validating authenticity. It could support revisions without losing originals.

nosamu|9 days ago

Has anyone else noticed that some of Archive.today's X/Twitter captures [1] are logged in with an account called "advancedhosters" [2], which is associated with a web hosting company apparently located in Cyprus? The latest post [3] from the account links to a blog post [4] including private communications between the webmaster of Archive.today (using their previously-known "Volth" alias) and a site owner requesting a takedown. Also note that the previous post [5] from the "advancedhosters" account was a link to a pro-Russia, anti-Ukraine article, archived via Archive.today of course. Seems like an interesting lead to untangle.

[1] https://archive.today/20240714173022/https://x.com/archiveis...

[2] https://x.com/advancedhosters

[3] https://x.com/advancedhosters/status/1731129170091004412

[4] https://lj.rossia.org/users/mopaiv/257.html

[5] https://x.com/advancedhosters/status/1501971277099286539

jeroenhd|8 days ago

It could be a donated account. I've noticed archive.whatever also bypasses some paywalls by using legitimate account logins but I doubt there's one person going around subscribing to every news outlet that gets any coverage.

If archive.whatever wasn't so useful to the general public, it'd be hard to distinguish from a criminal operation given the way it operates, unlike, say, the Internet Archive, which goes through all of the proper legal paperwork to be a real nonprofit.

snigsnog|8 days ago

Lead to what?

ChocMontePy|9 days ago

I noticed last year that some archived pages are getting altered.

Every Reddit archived page used to have a Reddit username in the top right, but then it disappeared. "Fair enough," I thought. "They want to hide their Reddit username now."

The problem is, they did it retroactively too, removing the username from past captures.

You can see on old Reddit captures where the normal archived page has no username, but when you switch the tab to the Screenshot of the archive it is still there. The screenshot is the original capture and the username has now been removed for the normal webpage version.

When I noticed it, it seemed like such a minor change, but with these latest revelations, it doesn't seem so minor anymore.

palmotea|9 days ago

> When I noticed it, it seemed like such a minor change, but with these latest revelations, it doesn't seem so minor anymore.

That doesn't seem nefarious, though. It makes sense they wouldn't want to reveal whatever accounts they use to bypass blocks, and the logged-in account isn't really meaningful content to an archive consumer.

Now, if they were changing the content of a reddit post or comment, that would be an entirely different matter.

unknown|8 days ago

[deleted]

basch|9 days ago

It seems a lot of people haven't heard of it, but I think it's worth plugging https://perma.cc/ which is really the appropriate tool for something like Wikipedia to be using to archive pages.

More: https://en.wikipedia.org/wiki/Perma.cc

ronsor|9 days ago

It costs money beyond 10 links, which means either a paid subscription or institutional affiliation. This is problematic for an encyclopedia anyone can edit, like Wikipedia.

Computer0|8 days ago

I switched to Perma.cc earlier this week and have had a mixed experience, to say the least. I think image-heavy pages just error out completely while still charging me, such as:

https://www.in.gov/nircc/planning/highway/traffic-data/inter...

And Reddit seemingly blocks their agent. It is open source though.

jsheard|9 days ago

Does Wikipedia really need to outsource this? They already do basically everything else in-house, even running their own CDN on bare metal, I'm sure they could spin up an archiver which could be implicitly trusted. Bypassing paywalls would be playing with fire though.

unknown|9 days ago

[deleted]

ouhamouch|9 days ago

[deleted]

karel-3d|9 days ago

Archive.is is now publishing really weird posts on their Tumblr blog, related to the whole thing

https://archive-is.tumblr.com/post/806832066465497088/ladies...

https://archive-is.tumblr.com/post/807584470961111040/it-see...

ricardobeat|9 days ago

The word salad with Ukraine, arms trade, Nazis, and Hunter Biden leaves no doubt the operator is from Russia.

dmix|9 days ago

He’s probably being purposefully vague which makes for difficult reading.

frenchtoast8|9 days ago

A bit off topic, but are there any self hosted open source archiving servers people are using for personal usage?

I think ArchiveBox[1] is the most popular. I will give it a shot, but it's a shame they don't support URL rewriting[2], which would be annoying for me. I read a lot of blog and news articles that are split across multiple pages, and it would be nice if that article's "next page" link was a link to the next archived page instead of the original URL.

1: https://archivebox.io/

2: https://github.com/ArchiveBox/ArchiveBox/discussions/1395

quinncom|9 days ago

I like Readeck – https://codeberg.org/readeck/readeck

Open source. Self hosted or managed. Native iOS and Android apps.

Its Content Scripts feature allows custom JS scripts that transform saved content, which could be used to do URL rewriting.

kseistrup|8 days ago

Omnom comes to mind:

* https://omnom.zone/

* https://github.com/asciimoo/omnom

tonymet|9 days ago

Wikipedia's own page on this topic is much more succinct about the context and change in policy

https://en.wikipedia.org/wiki/Wikipedia:Archive.today_guidan...

cnst|8 days ago

> Change the original source to something that doesn't need an archive (e.g., a source that was printed on paper), or for which a link to an archive is only a matter of convenience.

They're basically recommending changing verifiable references that can easily be cross-checked and verified, to "printed on paper" sources that could likely never be verified by any other Wikipedian, and can easily be used to provide a falsification and bias that could go unnoticed for extended periods of time.

Honestly, that's all you need to know about Wikipedia.

The "altered" allegation is also disingenuous. The reason archive.org never works is precisely because it doesn't alter the pages enough. There's no evidence that archive.today has altered any actual main content they've archived; altering the hidden fields, usernames and paywalls, as well as random presentation elements to make the page display properly, doesn't really count as "altered" in my book, yet that's precisely what the allegation amounts to.

1vuio0pswjnm7|9 days ago

https://web.archive.org/web/20260220191245if_/https://arstec...

archive.today is very popular on HN; the opaque, shortened URLs are promoted on HN every day

I can't use archive.today. I tried but gave up. Too many hassles. I might be in the minority but I know I'm not the only one. As it happens, I have not found any site that I cannot access without it

The most important issue with archive.today though is the person running it, their past and present behaviour. It speaks for itself

Whoever it is, they have a lot of info about HN users' reading habits given that archive.today URLs are so heavily promoted by HN submitters, commenters and moderators

1vuio0pswjnm7|8 days ago

Archive.today wants/needs EDNS subnet

"Geolocation" as a justification is ambiguous

Why a need for geolocation

Geolocation can be used for multiple purposes

"DNS performance" is only one purpose

Other purposes might offer the user no benefit, and might even be undesirable for users

As a result, some users don't send EDNS subnet. It's always been optional to send it

Even public resolvers, third party DNS services, like Cloudflare, recognise the tradeoffs for users and allow users to avoid sending it. Popular DNS software makes compiling support for EDNS subnet optional

Archive.today wants/needs EDNS subnet so badly that it tries to gather it using a tracking pixel, or it tries to block users who don't send it, e.g., Cloudflare users

Thus, before one even considers all the other behaviour of this website operator, some of which is mentioned in this thread, there is a huge red flag for anyone who pays attention to EDNS subnet

As with almost all websites, repeated DNS lookups are not an absolute requirement for successful HTTP requests

There are some IP addresses for archive.{today,is,md,ph,li,...} that have continued to work for years

belviewreview|9 days ago

I use archive.today all the time. How do you access pages, like for instance on the economist, without it?

wolvoleo|8 days ago

> Whoever it is, they have a lot of info about HN users' reading habits given that archive.today URLs are so heavily promoted by HN submitters, commenters and moderators

Anyone interested in the reading habits of HN users can just take a look at news.ycombinator.com ;)

diath|9 days ago

> Whoever it is, they have a lot of info about HN users' reading habits given that archive.today URLs are so heavily promoted by HN submitters, commenters and moderators

It's not promoted, it's just used as a paywall bypass so everyone can read the linked article.

fouc|9 days ago

You can change the TLD of any archive.today link if .today doesn't work, for example archive.ph, archive.is, archive.md, etc.
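
A small Python sketch of that swap, for anyone who wants to script it. The TLD list is just the ones named in this thread and may be incomplete or out of date, and the snapshot ID `abc12` in the usage line is made up.

```python
from urllib.parse import urlsplit, urlunsplit

# TLDs mentioned in this thread; treat the list as an assumption.
ARCHIVE_TLDS = ("today", "ph", "is", "md", "li")

def alternate_links(url: str) -> list[str]:
    """Return the same archive link on every other known archive.<tld> domain."""
    parts = urlsplit(url)
    name, _, tld = parts.netloc.rpartition(".")
    if name != "archive" or tld not in ARCHIVE_TLDS:
        raise ValueError(f"not a recognised archive.<tld> link: {url}")
    return [
        urlunsplit(parts._replace(netloc=f"archive.{other}"))
        for other in ARCHIVE_TLDS
        if other != tld
    ]

for link in alternate_links("https://archive.today/abc12"):
    print(link)
```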

bawolff|9 days ago

The fact is I can't have a discussion about a paywalled article without reading it. Archive.today is popular as a paywall bypass because nobody wants HN to devolve into debate based on a headline where nobody has rtfa.

1vuio0pswjnm7|8 days ago

"archive.today" as used here means the collection of archive.tld domains, where .tld could be ".is", ".md", ".ph", etc.

"promoted" as used here means placing an archive.tld URL at the top of an HN thread so that many HN readers will follow it, or placing these URLs elsewhere in threads

nobody9999|7 days ago

>I can't use archive.today. I tried but gave up. Too many hassles.

What hassles have you experienced?

I use the Archive Page[0] extension which is really easy to use.

The only thing that annoys me about it is the repeated requests (starting about eight or nine months ago) to complete CAPTCHAs.

[0] https://addons.mozilla.org/en-US/firefox/addon/archive-page/

rawling|8 days ago

Is it not possible to create a non-repudiable archive of what a website served, when, entirely locally i.e. not relying on some third party site who might disappear or turn out to be unreliable?

Could you not in theory record the whole TLS transaction? Can it not be replayed later and re-verified?

Up until an old certificate leaks or is broken and you can fake anything "from back when it was valid", I guess.

arboles|8 days ago

I don't know, but archive sites could at least publish hashes of the content at archive time. This could be used to prove an archive wasn't tampered with later. I'm pretty underwhelmed by the Wayback Machine (archive.org), it's no better technically than archive.today.
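
A minimal Python sketch of the hash idea, assuming the archive publishes one small record per capture somewhere it cannot quietly rewrite later. The record fields and function names are invented for illustration. Note the limit: a matching hash shows the stored copy has not changed since the record was published, not that the origin site really served those bytes.

```python
import hashlib

def make_record(url: str, body: bytes, captured_at: str) -> dict:
    """Build the record an archive could publish at capture time."""
    return {
        "url": url,
        "captured_at": captured_at,
        "sha256": hashlib.sha256(body).hexdigest(),
    }

def matches_record(record: dict, body: bytes) -> bool:
    """Check a copy served later against the published record."""
    return hashlib.sha256(body).hexdigest() == record["sha256"]

record = make_record("https://example.com/", b"<html>original</html>", "2026-02-20T19:12:45Z")
print(matches_record(record, b"<html>original</html>"))  # unchanged copy
print(matches_record(record, b"<html>edited</html>"))    # altered copy
```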

justincormack|8 days ago

Unfortunately you can't usefully replay TLS and be able to validate it, so no that does not work. Best strategy would probably be a public transparency log, but websites are pretty variable and dynamic so this would be unlikely to work for many.

krick|9 days ago

I believe there are multiple options with different degree of "half-baked"-ness, but can anyone name the best self-hosted version of this service?

Ultimately, what we all use it for is pretty straightforward, and it seems like by now we should've arrived at having approximately one best implementation, which could be used both for personal archiving and for internet-facing instances (perhaps even distributed). But I don't know if we have.

robotnikman|9 days ago

I'm wondering the same thing, would be great to have something similar for personal use

seanhly|8 days ago

Curiously, this isn't the first time archive.today was implicated in a DDoS. An HN post from three years back shows some pasted snippets of similar XMLHttpRequest code running on archive.ph (an archive.today synonym site). Post link: https://news.ycombinator.com/item?id=38233062

On that occasion, the target of the attack was a site named northcountrygazette.org, whose owner seems to have never become aware of the attack. The HN commenter noted when they went to the site manually it was incredibly slow, which would suggest the DDoS attempt was effective.

I tried to see if there was anything North Country Gazette had published that the webmaster of archive.today might have taken issue with, and I couldn't find anything in particular. However, the "Gazette" had previously threatened readers with IP logging to prosecute paywall bypassers (https://news.slashdot.org/story/10/10/27/2134236/pay-or-else...), and also blocks archivers in its robots.txt file, indicating it is hostile towards archiving in general.

I can no longer access North Country Gazette, so perhaps it has since gone out of business. I found a few archived posts from its dead website complaining of high server fees. Like the target of this most recent DDoS, June Maxam, the lady behind North Country Gazette, also appears/appeared to be a sleuth.

ouhamouch|8 days ago

[deleted]

andai|9 days ago

Sounds like there's a gap in the market for a "commons" archive... maybe powered by something p2p like BitTorrent protocol?

This would have sounded Very Normal in the 2000s... I wonder if we can go back :)

bawolff|9 days ago

P2P is generally bad for this use case. P2P generally only works for keeping popular content around (content gets dropped when the last peer that cares disconnects). If the content was popular it wouldn't need to be archived in the first place.

PhilipRoman|8 days ago

IMO there is actually a very low-hanging fruit here: even without P2P or DHTs we could have a URI scheme that consists of a domain and document hash. It is then up to the user to add alternate mirrors for domains. Aside from privacy, it doesn't really matter who answers these requests since the documents are self-signing.
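
A Python sketch of how such a scheme could be consumed. The `doc://<domain>/<sha256-hex>` syntax is made up for this example, and mirrors are modelled as plain callables so nothing here depends on a real network or an existing standard.

```python
import hashlib
from typing import Callable, Iterable, Optional

# A mirror takes (domain, digest) and returns the bytes it has, or None.
Mirror = Callable[[str, str], Optional[bytes]]

def parse_doc_uri(uri: str) -> tuple[str, str]:
    """Split a hypothetical 'doc://<domain>/<sha256-hex>' URI into (domain, digest)."""
    scheme, _, rest = uri.partition("://")
    if scheme != "doc" or "/" not in rest:
        raise ValueError(f"not a doc:// URI: {uri}")
    domain, digest = rest.split("/", 1)
    return domain, digest.lower()

def fetch_verified(uri: str, mirrors: Iterable[Mirror]) -> bytes:
    """Ask each mirror in turn and accept the first answer whose hash matches."""
    domain, digest = parse_doc_uri(uri)
    for mirror in mirrors:
        body = mirror(domain, digest)
        if body is not None and hashlib.sha256(body).hexdigest() == digest:
            return body
    raise LookupError("no mirror returned a document matching the hash")
```

Because the hash is checked on the client, an untrusted or tampering mirror is simply skipped, which is why it does not matter who answers the request.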

xurukefi|9 days ago

Kinda off-topic, but has anyone figured out how archive.today manages to bypass paywalls so reliably? I've seen people claiming that they have a bunch of paid accounts that they use to fetch the pages, which is, of course, ridiculous. I figured that they have found an (automated) way to imitate Googlebot really well.

jsheard|9 days ago

> I figured that they have found an (automated) way to imitate Googlebot really well.

If a site (or the WAF in front of it) knows what it's doing then you'll never be able to pass as Googlebot, period, because the canonical verification method is a DNS lookup dance which can only succeed if the request came from one of Googlebot's dedicated IP addresses. Bingbot is the same.
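
For reference, the "DNS lookup dance" is a reverse lookup on the client IP, a check that the hostname is under a Google crawler domain, then a forward lookup on that hostname that must return the same IP. A minimal Python sketch follows; the function names are illustrative, the suffix list is a simplification, and the resolvers are passed in as parameters only so the logic can be exercised without network access. The default forward lookup here is IPv4-only.

```python
import socket
from typing import Callable, List

# Simplified suffix list; check Google's own documentation for the current one.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")

def _reverse(ip: str) -> str:
    # PTR lookup: IP address -> primary hostname.
    return socket.gethostbyaddr(ip)[0]

def _forward(host: str) -> List[str]:
    # A lookup: hostname -> list of IPv4 addresses.
    return socket.gethostbyname_ex(host)[2]

def is_real_googlebot(
    ip: str,
    reverse: Callable[[str], str] = _reverse,
    forward: Callable[[str], List[str]] = _forward,
) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-resolve back to the same IP."""
    try:
        host = reverse(ip)
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        return ip in forward(host)
    except OSError:  # covers socket.herror and socket.gaierror
        return False
```

A spoofed User-Agent fails at the first step because the attacker's IP has no PTR record under those domains, and a forged PTR record fails the forward lookup.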

Aurornis|9 days ago

> I've seen people claiming that they have a bunch of paid accounts that they use to fetch the pages, which is, of course, ridiculous.

The curious part is that they allow web scraping arbitrary pages on demand. So a publisher could put in a lot of arbitrary requests to archive their own pages and see them all coming from a single account or small subset of accounts.

I hope they haven't been stealing cookies from actual users through a botnet or something.

tonymet|9 days ago

I’m an outsider with experience building crawlers. You can get pretty far with residential proxies and browser fingerprint optimization. Most of the b-tier publishers use RBC and heuristics that can be “worked around” with moderate effort.

elzbardico|9 days ago

> which is, of course, ridiculous.

Why? In the world of web scraping this is pretty common.

cnst|8 days ago

It's because it's actively maintained, and bypassing the paywalls is its whole selling point, thus, they do have to be good at it.

They bypass the rendering issues by "altering" the webpages. It's not uncommon to archive a page, and see nothing because of the paywalls; but then later on, the same page is silently fixed. They have a Tumblr where you can ask them questions; at one point, it's been quite common for everyone to ask them to fix random specific pages, which they did promptly.

Honestly, you cannot archive a modern page, unless you alter it. Yet they're now being attacked under the pretence of "altering" webpages, but that's never been a secret, and it's technologically impossible to archive without altering.

Cider9986|8 days ago

I imagine accounts are the only way that archive.today works on sites like 404media.co that seem to have server-side paywalls. Similarly, Twitter has a completely server-side paywall.

layer8|9 days ago

It’s not reliable, in the sense that there are many paywalled sites that it’s unable to archive.

comeonbro|9 days ago

There is an enormous amount of stuff that is only on archive.today, including stuff that is otherwise gone forever. A mix of stuff that somebody only ever did archive.today on and not archive.org, and stuff that could only be archived on archive.today because archive.org fails on it.

Anything on twitter post-login-wall for one. A million only-semi-paywalled news articles for others. But mainly an unfathomably long tail.

It was extremely distressing when the admin started(?) behaving badly for this reason. That others are starting to react this way to it is understandable. What a stupid tragedy.

_el1s7|8 days ago

Just went into a rabbit hole looking into this, wow, can't tell if this is just another drama on the weird wide web or something else.

croes|9 days ago

> “I’m glad the Wikipedia community has come to a clear consensus, and I hope this inspires the Wikimedia Foundation to look into creating its own archival service,” he told us.

Hardly possible for Wikimedia to provide a service like archive.today given the legal trouble of the latter.

Strangely naive.

anilakar|9 days ago

> If you want to pretend this never happened – delete your old article and post the new one you have promised. And I will not write “an OSINT investigation” on your Nazi grandfather

From hero to a Kremlin troll in five seconds.

alfiedotwtf|8 days ago

It would be nice if there was a non-dynamic snapshot archive as well as the page itself. That way, if the loaded JavaScript causes it to stop rendering, at least there’ll be a static fallback

nubinetwork|9 days ago

I noticed I've started being redirected to a blank nginx server for archive.is... but only the .is domain, .ph and .today work just fine. I wonder if they ended up on an adblocker or two.

stephen_g|9 days ago

There was some beef the site owner had with Cloudflare where if you were using Cloudflare DNS it wouldn’t serve anything to you? Is that still happening?

Not sure why it would only be on archive.is and not the others but ‘is’ loads for me.

jl6|8 days ago

Am I reading this right… they tampered with an archived page and then changed it back? How do we know? Is there another archive site that has before and after proof?

Gander5739|8 days ago

See https://en.wikipedia.org/wiki/Wikipedia%3ARequests_for_comme...

cnst|8 days ago

They've changed usernames they use to post under. That's the only "altered" allegation they've been accused of.

BTW, they also alter paywalls and other elements, because otherwise, many websites won't show the main content these days.

It kind of seems like "altered" is the new "hacker" today?

mrguyorama|9 days ago

>In emails sent to Patokallio after the DDoS began, “Nora” from Archive.today threatened to create a public association between Patokallio’s name and AI porn and to create a gay dating app with Patokallio’s name.

Oh good. That's definitely a reasonable thing to do or think.

The raw sociopathy of some people. Getting doxxed isn't good, but this response is unhinged.

jMyles|9 days ago

It's a reminder how fragile and tenuous are the connections between our browser/client outlays, our societal perceptions of online norms, and our laws.

We live at a moment where it's trivially easy to frame possession of an unsavory (or even illegal) number on another person's storage media, without that person even realizing (and possibly, with some WebRTC craftiness and social engineering, even get them to pass on the taboo payload to others).

oytis|9 days ago

I mean, the admin of archive.today might face jail time if deanonymised, kind of understandable he's nervous. Meanwhile for Patokallio it's just curiosity and clicks

ouhamouch|9 days ago

That was private negotiations, btw, not public statements.

They were in response to J.P's blog, which had already framed AT as a project grown from a carding forum and pushed his speculations onto Ars Technica, whose parent company just destroyed 12ft and is on to a new victim. The story is full of untold conflicts of interest, covered with a soap opera around the DDoS.

tetris11|9 days ago

Archive.today's domain registrar is Tucows for anyone wondering

ValentineC|8 days ago

Just curious: is this of any significance?

bjourne|9 days ago

FYI, archive.today is NOT the Internet Archive/Wayback Machine.

super256|9 days ago

I prefer archive.today because the Internet Archive’s Wayback Machine allows retrospective removals of archived pages. If a URL has already been crawled and archived, the site owner can later add that URL to robots.txt and request a re-crawl. Once the crawler detects the updated robots.txt, previously stored snapshots of that page can become inaccessible, even if they were captured before the rule was added.

Unfortunately this happens more often than one would expect.

I found this out when I preserved my very first homepage I made as a child on a free hosting service. I archived it on archive.org, and thought it would stay there forever. Then, in 2017 the free host changed the robots.txt, closed all services, and my treasured memory was forever gone from the internet. ;(
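
To illustrate the mechanism, here is a short Python sketch of how a crawler evaluates a robots.txt before and after a host changes it. It only shows the rule check itself; applying the new rule retroactively to old snapshots is the archive's own policy as described above and is not modelled here. The `ia_archiver` user-agent token and the example URLs are assumptions for the example.

```python
from urllib.robotparser import RobotFileParser

def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if this robots.txt forbids the given crawler from fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)

# What the free host might have served originally: everything allowed.
before = "User-agent: *\nDisallow:\n"
# What it might serve after the change: the archive crawler is shut out entirely.
after = "User-agent: ia_archiver\nDisallow: /\n"

page = "https://freehost.example/~me/index.html"
print(is_blocked(before, "ia_archiver", page))
print(is_blocked(after, "ia_archiver", page))
```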

unknown|9 days ago

[deleted]

unknown|8 days ago

[deleted]

anovikov|8 days ago

It doesn't work properly anyway anymore...

rdiddly|9 days ago

So toward the end of last year, the FBI was after archive.today, presumably either for keeping track of things the current administration doesn't want tracked, or maybe for the paywall thing (on behalf of rich donors/IP owners). https://gizmodo.com/the-fbi-is-trying-to-unmask-the-registra...

That effort appears to have gone nowhere, so now suddenly archive.today commits reputational suicide? I don't suppose someone could look deeper into this please?

ndiddy|9 days ago

The archive.today operator claims on his blog that this was nothing major: https://lj.rossia.org/users/archive_today/

> Regarding the FBI’s request, my understanding is that they were seeking some form of offline action from us — anything from a witness statement (“Yes, this page was saved at such-and-such a time, and no one has accessed or modified it since”) to operational work involving a specific group of users. These users are not necessarily associates of Epstein; among our users who are particularly wary of the FBI, there are also less frequently mentioned groups, such as environmental activists or right-to-repair advocates.

> Since no one was physically present in the United States at that time, however, the matter did not progress further.

> You already know who turned this request into a full-blown panic about “the FBI accusing the archive and preparing to confiscate everything.”

Not sure who he's talking about there.

chrisjj|9 days ago

> an analysis of existing links has shown that most of its uses can be replaced.

Oh? Do tell!

nobody9999|9 days ago

>> an analysis of existing links has shown that most of its uses can be replaced.

>Oh? Do tell!

They do. In the very next paragraph in fact:

   The guidance says editors can remove Archive.today links when the original 
   source is still online and has identical content; replace the archive link so 
   it points to a different archive site, like the Internet Archive, 
   Ghostarchive, or Megalodon; or “change the original source to something that 
   doesn’t need an archive (e.g., a source that was printed on paper)

that_lurker|9 days ago

I would be surprised if archive.today had something that was not in the Wayback Machine

eviks|8 days ago

> the community should figure out how to efficiently remove links to archive.today

You're part of the community! Prove him right!

RupertSalt|9 days ago

"Non-paywalled" ad-free link to archive: https://en.wikipedia.org/wiki/Wikipedia:Requests_for_comment...

dakolli|8 days ago

The FBI called out archive.today a couple of months ago; there's clearly a campaign against them by the USA (4th Reich), which stands principally against any information repository they don't control or have influence over (it's Russian owned). This is simply donors of the Trump regime who own media companies requesting this because it's the primary way around paywalls for most people who know about it.

unknown|9 days ago

[deleted]

realaaa|7 days ago

wow! but this felt like end of the story - here is LLM summary of timeline - sharing as is

---------

Here’s the chronology that the HN thread id=47092006 is about, based on the linked Ars Technica article and related sources.

---

## 1. What “started the argument”?

The core dispute starts from a 2023 blog post by engineer Jani Patokallio on his site Gyrovague, investigating who is behind archive.today. That post, plus later FBI interest, led to:

1. A *GDPR/takedown campaign* against the blog post.
2. An *apparent DDoS* launched from archive.today’s CAPTCHA page against his blog.
3. *Threats* from the archive.today operator (“Nora”) to associate Patokallio’s name with AI porn and other harassment.
4. *Discovery that archive.today had altered archived pages* to insert Patokallio’s name.
5. A *Wikipedia RfC* and decision to deprecate and blacklist archive.today links.

The Hacker News thread you referenced is about the final step: Wikipedia’s decision to remove ~695,000 archive.today links.

---

## 2. Timeline of the situation

```mermaid
timeline
    title archive.today – Wikipedia controversy chronology

    2012-2015 : Site founded as archive.is; later branded archive.today
    2023-08-05 : Patokallio publishes investigation into archive.today’s ownership
    2025-10-30 : FBI subpoena to archive.today’s registrar (Tucows)
    2025-11-05 : Heise reports FBI subpoena, links to Patokallio’s 2023 post
    2026-01-08 : GDPR complaint from “Nora” to Automattic re Patokallio’s post
    2026-01-10 : archive.today webmaster emails Patokallio asking for temporary takedown
    2026-01-11 : DDoS from archive.today CAPTCHA page against Gyrovague begins
    2026-01-14 : First public HN report about weird/DDoS behavior from archive.today
    2026-01-21 : gyrovague.com added to DNS blocklists used by ad blockers
    2026-01-25 : Email exchange escalates; “Nora” threatens AI porn, “gay dating app”, “Nazi grandfather”
    2026-02-01 : Patokallio publishes detailed timeline and DDoS disclosure
    2026-02-07 : Wikipedia RfC opens on archive.today links
    2026-02-10 : Ars Technica reports on DDoS and Wikipedia considering blacklist
    2026-02-19 : DDoS code still present in archive.today CAPTCHA page (per Wikipedia guidance)
    2026-02-20 : RfC closed; consensus to deprecate/blacklist archive.today
    2026-02-20–21 : Major outlets report Wikipedia’s blacklist; guidance page created

```

So, in the terms of your question:

- *What started the argument* was Patokallio’s 2023 investigation into archive.today’s ownership, which later coverage of the FBI subpoena amplified.
- The *direct trigger for Wikipedia’s action* was the combination of:
  - The *DDoS* launched from archive.today against his blog.
  - The *threats* (AI porn, harassment) against him.
  - Evidence that the *archive’s content had been tampered with*, violating Wikipedia’s trust in it as a citation source.

ValveFan6969|8 days ago

[deleted]

ValveFan6969|9 days ago

[deleted]

Keekgette|8 days ago

[deleted]

attila-lendvai|9 days ago

[deleted]

Permit|9 days ago

> i don't know anything specific about the site or any conflicts involved, yet this smells like a negative PR campaign to me...

What possible value could a comment from someone who has no knowledge of the site or conflict add to this discussion?

ChrisArchitect|9 days ago

[deleted]

input_sh|9 days ago

I know I'm arguing with a bot that nobody monitors, but it's already in the fucking post.

casey2|9 days ago

Anecdotally I generally see archive.is/archive.today links floating around "stochastic terrorist" sites and other hate cults.

oytis|8 days ago

I see them everywhere where paywalled content is referenced

snigsnog|8 days ago

Shows that it's a great archival service if the most censored people are able to use it without their archives being censored.

TZubiri|9 days ago

They seem totally unrelated to the Internet Archive. They probably only ever got on Wikipedia by leeching off the IA brand and confusing enough people to use them

Onavo|9 days ago

The Wayback Machine won't bypass paywalls or pirate content, not to mention they are under US jurisdiction. You can't have your cake and eat it.

tl2do|9 days ago

Why not show both? Wikipedia could display archive links alongside original sources, clearly labeled so readers know which is which. This preserves access when originals disappear while keeping the primary source as the main reference.

bawolff|9 days ago

The objection is to this specific archive service, not archiving in general.

AgentME|9 days ago

Wikipedia shouldn't allow links to sites which intentionally falsify archived pages and use their visitors to perform DDOS attacks.

ranger207|9 days ago

They generally do. Random example, citation 349 on the page of George Washington: ""A Brief History of GW"[link]. GW Libraries. Archived[link] from the original on September 14, 2019. Retrieved August 19, 2019."

shevy-java|9 days ago

Does anyone have a short summary of why Archive.today acted via DDoS, and against whom? Isn't that something done by malicious actors? Or did others misuse Archive.today?

zeroonetwothree|9 days ago

If you read the linked article it is discussed

alsetmusic|9 days ago

I will no longer donate to Wikipedia as long as this is policy.

jraph|9 days ago

Why? The decision seems reasonable at first sight.

Larrikin|9 days ago

About how much had you previously donated over the years?

unknown|9 days ago

[deleted]

selridge|9 days ago

[deleted]

kmeisthax|9 days ago

[deleted]

paganel|9 days ago

At this point Archive.today provides a better service (all things considered) compared to Wikipedia, at least when it comes to current affairs.

368 comments