A website that deletes itself once indexed by Google

[+] tonyarkles|11 years ago|reply

I had a client once who had something similar, although unintentionally. She approached me because her website "kept getting hacked" and she didn't trust the original developers to solve the security problems... And rightly so!

There were two factors that, together, made this happen: first, the admin login form was implemented in JS, and if you went to log in with it with JS disabled, it wouldn't verify your credentials. And it submitted via a GET request. Second, once you were in the admin interface, you could delete content from the site by clicking on an X in the CMS. Which, as was the pattern, presented you with a JS alert() prompt before deleting the content... via a GET request.

Looking at the server logs around the time it got "hacked", you could see GoogleBot happily following all the delete links in the admin interface.

[+] ars|11 years ago|reply

> I had a client once who had something similar, although unintentionally.

I did that too. I was aware of the problem, but at the time (1996) I did not know how to fix it.

So I just documented it and warned that they should keep the site away from altavista.

This was back before cookies had wide support, so login state was in the URL. If you allowed a search spider to know that URL it would have deleted the entire site by spidering it.

I did eventually fix it by switching to forms, and strengthening the URL token to expire if unused for a while. And then eventually switching to cookies (at one point it supported both url tokens and cookies).

I have not thought about those days in such a long time.

[+] ikeboy|11 years ago|reply

Are you this guy http://thedailywtf.com/articles/The_Spider_of_Doom? Or http://craigandera.blogspot.com/2004/04/beware-googlebot_12....? (They seem like the same story but have different names.)

[+] kragen|11 years ago|reply

I accidentally deleted about half of the database at a startup where I’d recently started working by approximately the same method. I was running a copy of the web interface on my laptop, connecting over the internet to our MySQL server, and also running ht://dig’s spider on localhost from cron. It started spidering the delete links. Fortunately, I’d also started running daily MySQL backups from cron (there were no backups before I started working there), so we only lost a few hours of everyone’s work. As you can imagine, though, they weren’t super happy with me that day.

[+] toxicFork|11 years ago|reply

Someone should make a website for indexing bots to play with!

[+] tlrobinson|11 years ago|reply

I'm surprised there are so many people on Hacker News asking "why?".

Hackers don't need a reason, other than it being clever, novel, fun, etc. But if you want a reason there are plenty:

* art: there are numerous interpretations of this

* fun: this is sort of the digital equivalent of a "useless box" http://www.thinkgeek.com/product/ef0b/

* science: experiment to see how widespread a URL can be shared without Google becoming aware of it

* security: embed unique tokens in your content to detect if it has leaked to the public

[+] barbs|11 years ago|reply

I agree that there are lots of reasons that someone would make a site like this, but I think people are curious as to the maker's specific reason. From the github:

Why would you do such a thing? My full explanation was in the content of the site. (edit: ...which is now gone)

I'm curious as to what the website said originally.

[+] soheil|11 years ago|reply

I think more and more the word hacker has lost its original meaning at least in this community. If I were reading a similar story on a tor hidden service, let's say, I would not be asking why, but here I do.

[+] RexRollman|11 years ago|reply

My first thought was filesharing.

[+] raimondious|11 years ago|reply

My first question was "_why? Is that you?"

[+] dsjoerg|11 years ago|reply

It's a digital embodiment of coolness; once the masses can find out about it, it isn't cool anymore and the coolness is gone. Literally.

[+] LukeB_UK|11 years ago|reply

I think Hipsterism is what you're actually referring to.

[+] unknown|11 years ago|reply

[deleted]

[+] rjempson|11 years ago|reply

Much like tattoos.

[+] frik|11 years ago|reply

An alternative would be to check for the browser user agent and delete the website right at that point and return a 404 page to the Google crawler bot. Then Google won't have a static copy of the website.

[+] desdiv|11 years ago|reply

Your approach is "a website that irrevocably deletes itself once indexed by Google".

What OP has done is "a website that irrevocably deletes itself once Google decided to publicly reveal the fact that it indexed said website".

OP's approach has no way of knowing when the site was indexed. It's conceivable that Google indexed it on the very first day and decided not to share it publicly until 21 days later.

[+] LukeB_UK|11 years ago|reply

The problem with that is that you could spoof the user agent.

[+] ForHackernews|11 years ago|reply

They do tell google not to save a static copy:

> the NOARCHIVE meta tag is specified which prevents the Googles from caching their own copy of the content.

[+] whoopdedo|11 years ago|reply

What about the opposite? A website that created when it is indexed? Start with nothing and content is added each time the site is visited by Googlebot, or shared on Facebook, tweeted, posted on Reddit, etc. The website exists only so that it can be shared, and the act of sharing it defines what the website is.

[+] TeMPOraL|11 years ago|reply

This is an uber cool idea. Especially if, when this website is shared by someone, it would attempt to scan the sharer's public feed, last submissions, last comments, last tweets, etc. (depending on where it got shared), and generate additional content based on what it found.

Sounds like an awesome weekend project.

[+] yk|11 years ago|reply

Cool, but why? ( And shoulden't we invent digital Baroque art before inventing digital postmodernism?)

[+] cheatsheet|11 years ago|reply

Both exist.

Postmodernism is a lot more relevant to the digital age than anything, imo. It emphasizes pointing out ways of thinking and doing, which I think is especially relevant when we are actually automating most of our ways of thinking and doing.

I know it gets a bad rap because of the ridiculous examples, but the real point of it engages the viewer into a serious kind of contemplation concerning the massive infrastructure that exists and how that shapes our culture, thoughts, understanding, action..

We have the expectation that the generations to come will accept this infrastructure and what it says about how the human mind functions. But much of it is founded on belief systems of how thought and action operate in the real world. Most of these systems are baseless, the idea of a base obfuscated only by the sheer complexity involved in understanding each layer.

[+] egypturnash|11 years ago|reply

AAA games: where someone is paid to do nothing but design the details on imaginary Dwarven armor for an entire year.

If that ain't baroque I don't know what is.

[+] psykovsky|11 years ago|reply

Because.

[+] ssalazar|11 years ago|reply

Just check out any of the MIDI music forums for some sweet digital baroque art.

[+] byte1918|11 years ago|reply

Thank you.

http://i.imgur.com/cjDeLEb.png

EDIT: What's with the downvote hate? Somebody actually posted a valid key...

[+] PhasmaFelis|11 years ago|reply

As far as I can tell, you just posted part of a random screengrab from your web browser for no obvious reason. Striking's response suggests that this is actually a reference to a site which, per the OP, is gone forever, along with any chance of getting your joke. So...I'm not really sure what you were expecting.

[+] striking|11 years ago|reply

People likely didn't understand that someone posted a key for a game on that website and thought that you just posted an unrelated images.

[+] hackhat|11 years ago|reply

>Why would you do such a thing? My full explanation was in the content of the site. (edit: ...which is now gone)

So anyone really understood why he did this?

[+] TeMPOraL|11 years ago|reply

My guess - because he could, and likely had some good laugh when discussing it with friends.

[+] ikeboy|11 years ago|reply

Anyone know the origin or have an archive?

[+] TimWolla|11 years ago|reply

The origin is this: http://eep40h.herokuapp.com/

[+] imjustsaying|11 years ago|reply

I see what you did there, I think.

[+] WA|11 years ago|reply

Not sure if I see this as "art" or something. I mean, irrevocably deletes itself could be attached to a thousand arbitrary things.

- deleted after 100 visitors

- deleted if visited with IE 6.0 for the first time

- deleted if referrer is Facebook

- ...

[+] comboy|11 years ago|reply

Also, irrecovability seems a bit questionable (google cache, archive.org etc.)

[+] cubano|11 years ago|reply

Snapchat for websites...hmmmm perhaps.

[+] thewizardofmys|11 years ago|reply

I see some potential use of this, for example as soon as Google crawlers reach the site I know that it is accessible from outside and I destroy the site.

[+] placeybordeaux|11 years ago|reply

That seems to be the exact use case. Did you want to elaborate on why you find that useful?

[+] aqme28|11 years ago|reply

What is the purpose of a website that is inaccessible "from outside"?

[+] arash_milani|11 years ago|reply

"Death is reason for the beauty of butterfly"

[+] ars|11 years ago|reply

Who said that? I could not agree less. Butterflies are beautiful for their color, not their death.

[+] neilellis|11 years ago|reply

I have to say I'm not usually a fan of conceptual art, but kudos - the concept is great. Keep experimenting!

[+] scottcanoni|11 years ago|reply

I would be interested in similar experiments but with a couple of minor variations to see the effects of each:

1. Sending the NOINDEX meta tag

2. Combining meta tags

3. Monitoring for a referrer URL that matches a Google search page to catch the 1st non-sneaky user coming from the index.

4. Monitoring other search engines and their behaviors.

[+] angelortega|11 years ago|reply

grep Googlebot /var/www/log/* && rm -rf /var/www/site

[+] shubhamjain|11 years ago|reply

How about detecting GoogleBot traffic and deleting when it has crawled your website?

121 comments