top | item 7240165

My website is being stolen in real time and I don't know what to do

90 points | joeyjones | 12 years ago

I launched the site http://altexplorer.net at the start of January as a block explorer and information hub for alternative cryptocurrencies. This morning I found a site, http://4co.in, which is ripping off my site in real time: every time a page is loaded on 4co.in, it uses PHP to load the corresponding page from http://altexplorer.net, removes the analytics and ad tags, replaces the site name, and rewrites the link URLs.

I've put a lot of effort into building this site and keeping it running, and now someone in India is stealing it in real time. Every page load on 4coin causes an identical page load in the nginx logs of http://altexplorer.net. Besides blocking the source IP address, what can I do to stop this?

Screen shots: Alt Explorer home page: https://d1eem2029tdth0.cloudfront.net/img/altexplorer-home.png

4coin home page: https://d1eem2029tdth0.cloudfront.net/img/4coin-home.png

Alt Explorer profitability page: https://d1eem2029tdth0.cloudfront.net/img/altexplorer-prof.png

4coin profitability page: https://d1eem2029tdth0.cloudfront.net/img/4coin-prof.png

97 comments

codegeek | 12 years ago
Lots of good suggestions already. I am not sure if you are interested in contacting the perpetrator directly and asking them to stop, but I did a little research for you.

Looking up the WHOIS info, the registrant's email is [email protected]

When I searched Google for this email, I came across another spammy site called baklinks.blogspot.com, which asks you to swap backlinks. At the bottom of the blog post, I found the name of the person: "Naveen K R".

I then searched Google for "Naveen K R + bgrf" and found a site he (probably) runs, www.zokali.com.

After more search combinations, I finally found his LinkedIn profile and his full name, "Naveen K Ramanand":

https://www.linkedin.com/in/krnaveen.

Maybe you can contact this guy directly. It seems like he is the one doing this, or at least knows who is.

chaz | 12 years ago
People usually reuse the same username, so I tried the LinkedIn username on Twitter: https://twitter.com/krnaveen. There's this tweet from November:

  I started to earn money on 4co.in short links. It’s easy -
  make the short links and earn the biggest money. http://4co.in
IgorPartola | 12 years ago
If you end up trying to block his IP, don't just DROP or REJECT his packets. TARPIT [1] them! This way not only would you be denying him access, but you would also be draining his resources.

Another thing to try is to see just how much data his server will take. See if you can send him a GB-sized response.

[1] http://www.netfilter.org/projects/patch-o-matic/pom-external...

danoprey | 12 years ago
Please try contacting them directly and simply asking to stop before doing this!
msantos | 12 years ago
The JavaScript solution has already been suggested, but take a step back and think about it: the same way the leech worked out your links, domain name, logo, and everything else that brands your website, he can easily figure out the simple JS code suggested here:

<img src="x" onerror="if(document.location.hostname==='4co.in')document.location='//xxxxxx.xxxx';">

So I say, go a step further:

- Do not send his users to a black hole. Instead, show a banner warning them about the leech, then redirect them to your website after a few seconds.

- The JS code for the above should go in the same JS file that provides the core functionality of your website. Once that's done, run your JS through http://closure-compiler.appspot.com/home or, better still, install the YUI Compressor CLI (http://yui.github.io/yuicompressor/) on your machine. The resulting code will be minified and obfuscated, so defeating it could take the leech hours, if not days, depending on his experience.

- Encode/obfuscate the warning string (from the first point) to make it harder to find within the JS code.

- And finally, do a daily spot check on his website, following @jarrett's comment below.
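A minimal sketch of the "encode the warning string" point, assuming the JS is generated server-side in Python (the thread doesn't say what stack the site runs). It turns a message into a `String.fromCharCode(...)` expression so the plain text never appears in the JS source. This only defeats a simple text search; it is not strong obfuscation, and `to_js_char_codes` is a hypothetical helper name.

```python
def to_js_char_codes(message: str) -> str:
    """Return a JavaScript expression that rebuilds `message` at runtime.

    JavaScript strings are UTF-16, so the message is split into UTF-16 code
    units; String.fromCharCode() reassembles them (including surrogate pairs).
    """
    data = message.encode("utf-16-le")
    units = [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]
    return "String.fromCharCode(" + ",".join(str(u) for u in units) + ")"


if __name__ == "__main__":
    # Hypothetical warning text; the real wording is up to you.
    warning = "This page was copied from altexplorer.net"
    print(to_js_char_codes(warning))
```

The output can be dropped into the banner code (for example as the argument to whatever sets the banner's text) before the file goes through the minifier.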

pilom | 12 years ago
You found out the right first step yourself: Block the source IP address. Sure it will turn into a game of whack-a-mole with them changing their IP but eventually, their customers will get fed up with their downtime.

Second idea: use JavaScript to redirect all of your pages to your own subdomain. Again, it's just a step in an arms race, but taking this to court would be too hard and expensive. You can win an arms race if you try.

jarrett | 12 years ago
If you have a hard time determining their IP, here's a trick that might work. Visit their site with a unique but innocuous-looking path or query that would never be accessed by a normal user. For example:

http://4co.in/?q=1

If the query string is being passed through, which I suspect it is, you can use the query string to easily locate the corresponding entry in your own logs. Or, if the query string isn't being passed through, you can use a path instead:

http://4co.in/q

You probably already thought of this technique. I decided to post it anyway in case you hadn't, or in case anyone else is facing a similar challenge.
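The trick above, sketched in Python: generate a random marker so it can't collide with real traffic, visit the copycat's site with it, then scan your access log for the line carrying that marker. It assumes your nginx log lines start with the client IP followed by the usual combined-format fields (as in the log excerpt OP posts further down the thread); `make_probe_url` and `ips_requesting` are hypothetical helper names.

```python
import re
import secrets

# Client IP, then ident, user, [time], then the quoted request line.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<target>\S+)[^"]*"'
)


def make_probe_url(base: str = "http://4co.in/") -> tuple[str, str]:
    """Build a URL with an unguessable query value; return (url, marker)."""
    marker = secrets.token_hex(8)
    return f"{base}?q={marker}", marker


def ips_requesting(log_lines, marker: str) -> set[str]:
    """Return the client IPs whose request target contains `marker`."""
    hits = set()
    for line in log_lines:
        m = LOG_LINE.match(line)
        if m and marker in m.group("target"):
            hits.add(m.group("ip"))
    return hits


if __name__ == "__main__":
    url, marker = make_probe_url()
    print("Visit this in a browser:", url)
    # Afterwards, e.g.: with open("/var/log/nginx/access.log") as f:
    #     print(ips_requesting(f, marker))
```

A random 16-hex-digit value avoids the false matches a short value like `q=1` could produce (it is also a substring of `q=12`).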

LanceH | 12 years ago
The more subtle response is to feed them bad data until they can't trust you.
joeyjones | 12 years ago
For now, I have added a news post with a link to the proper site, and I am debating between blocking the IP, delivering a static page with a link to the proper URL, or adding a JavaScript redirect to the proper site.
lmg643 | 12 years ago
Interesting that they didn't change the donation addresses. So if someone uses their site, likes it, and sends them some BTC, it will go to you?
al2o3cr | 12 years ago
Detect their IP and 301 their requests to goatse. Or something worse, if you're bent like that. :)
Navarr | 12 years ago
Why do that to people who probably don't know 4coin is being a thief?
jaredsohn | 12 years ago
Goatse as it was is no more. They had planned to offer vanity email addresses, but I am not sure if that took off. It looks like they're doing something with dogecoin now.

But an image search should help you find the image.

danneu | 12 years ago
Don't punish users. The goal here shouldn't be to silently redirect or deceive them with fake data or throw up goatse.

Instead, make it annoyingly clear to anyone that visits 4co.in that the content is stolen. 4co.in users aren't visiting 4co.in to spite you. They just don't know and will gladly use your website instead.

The game of whack-a-mole is strongly in your favor because you're on the right side of a trapdoor.

SEJeff | 12 years ago
Look for the PHP user agent and/or the source IP. Why not use mod_rewrite or something similar and redirect him to some bizarre internet meme site? I would suggest tub girl or goatse. It will get the point across loud and clear. Or just serve him a different copy of your site that makes it loud and clear that what he is doing is not OK. Either way, you can use mod_rewrite to cause this guy agony and prevent him from perpetrating this.
kevinchen | 12 years ago
I noticed that OP put a link to the legitimate site. How about serving a version of the site that redirects to the corresponding page on your own?
icedchai | 12 years ago
Recommendation: respond with fake data, based on source IP. The problem will take care of itself.
MarkPNeyer | 12 years ago
This is probably better than banning their source IP, as it will take longer to detect and will piss off their customers.

Also, report them to AdSense and anyone else serving their ads.

macNchz | 12 years ago
Gigabytes of fake data.

Let them eat /dev/urandom to their heart's content.

joeyjones | 12 years ago
I am going to look into this later today through nginx. I am planning on having every request from their scraping IP return a static page linking to the proper site.
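OP's plan, expressed as a small Python WSGI middleware rather than nginx config, just to show the logic. It assumes the app sees the scraper's address in `REMOTE_ADDR` (behind a reverse proxy you would read a trusted forwarded-address header instead); `BlocklistNotice` and the notice HTML are hypothetical, and the IP is the one from OP's log excerpt.

```python
NOTICE = (
    b"<html><body><p>The content you were viewing is copied from "
    b'<a href="http://altexplorer.net/">altexplorer.net</a>.</p></body></html>'
)


class BlocklistNotice:
    """WSGI middleware: blocklisted client IPs get a static notice page
    linking to the real site; every other request goes to the wrapped app."""

    def __init__(self, app, blocked_ips):
        self.app = app
        self.blocked_ips = set(blocked_ips)

    def __call__(self, environ, start_response):
        if environ.get("REMOTE_ADDR") in self.blocked_ips:
            start_response("200 OK", [
                ("Content-Type", "text/html; charset=utf-8"),
                ("Content-Length", str(len(NOTICE))),
            ])
            return [NOTICE]
        return self.app(environ, start_response)
```

Usage would be `app = BlocklistNotice(real_app, {"162.222.227.123"})`, where `real_app` stands in for the site's own WSGI application. Returning 200 with a link (rather than 403) means the copycat's proxy will most likely relay the notice straight to his users.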
Faint | 12 years ago
Could we make him pay a few bucks?

Specifically, can we make his traffic multiply? I wonder exactly what he is doing with request headers... maybe this could work:

1) Set up a page /fluffy with highly compressible contents, say 50MB of $£€$£€$£€$£€$£€..., and always force gzip encoding.
2) Set up a few bots (Amazon?) to download that page from his site, but do not accept any compression.

Start the attack at a time when the guy is probably sleeping. It might go on for a few hours before he notices, costing him a couple of hundred bucks in bandwidth.

Or maybe just waste some CPU in the same vein: the guy has to decompress the gzip to do his string replacement before forwarding, and re-zip it afterwards, so you can make sure that the content REALLY balloons...
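A quick check of the arithmetic behind this idea, assuming his proxy asks you for gzip and sends uncompressed output to clients that don't accept compression (nothing in the thread confirms this). It compresses the repeated `$£€` pattern from the comment and compares sizes. Deflate tops out at roughly a 1000:1 ratio, so a ~50 MB page should shrink to tens of kilobytes on your side.

```python
import gzip

# The "$£€" pattern from the comment: 1 + 2 + 3 = 6 bytes in UTF-8.
pattern = "$£€".encode("utf-8")
target_size = 50 * 1024 * 1024  # ~50 MB, as suggested above

raw = pattern * (target_size // len(pattern))
compressed = gzip.compress(raw)

print(f"uncompressed: {len(raw):,} bytes")
print(f"gzipped:      {len(compressed):,} bytes")
print(f"ratio:        {len(raw) / len(compressed):,.0f}:1")
```

If his proxy really behaves this way, each bot request that refuses compression makes his server send the full uncompressed body, while only the gzipped body crossed your link.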

beauzero | 12 years ago
Instead of blocking the source IP, detect it and send "unwanted information".
BlakePetersen | 12 years ago
I agree, this may be a more effective approach than trying to block the IP and the whole whack-a-mole issue.

Essentially, they trust the data you're providing and are trying to make a buck off that info. But if they lose that trust because they don't know whether the data is legit or not, you win!

I would also try to mask the fact that the data is not accurate. If they immediately see everything zeroed out, it would be a huge red flag that you're on to them. If you provide ALMOST correct data, it would be harder for them to determine what's going on, and their users will notice the disparity and (hopefully) get burned and never come back.

Essentially, the trick is to destroy the site's credibility so there's no financial benefit to continue to steal from you.

Good Luck!
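A minimal sketch of the "ALMOST correct data" idea, assuming it is applied only to responses for IPs already identified as the copycat. It rescales every decimal number in the HTML by a small random factor and keeps the same number of decimal places, so nothing looks obviously zeroed out. `nudge_numbers` and the ±5% default are hypothetical choices. A real page would need more targeted matching, because this pattern would also touch things like dotted IP addresses.

```python
import random
import re

DECIMAL = re.compile(r"\d+\.\d+")


def nudge_numbers(html: str, rng: random.Random, max_rel_change: float = 0.05) -> str:
    """Scale each decimal number in `html` by a random factor in
    [1 - max_rel_change, 1 + max_rel_change], keeping its decimal places."""

    def replace(match):
        text = match.group(0)
        decimals = len(text.split(".", 1)[1])
        factor = 1 + rng.uniform(-max_rel_change, max_rel_change)
        return f"{float(text) * factor:.{decimals}f}"

    return DECIMAL.sub(replace, html)


if __name__ == "__main__":
    rng = random.Random()  # unseeded, so values differ between requests
    print(nudge_numbers("<td>Difficulty</td><td>1843.27</td>", rng))
```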

tedivm | 12 years ago
You can use javascript frame busting techniques to redirect back to the main page. You can also use mod_rewrite or some proxy setups to make it so a completely different set of pages shows up for people coming from that site. This is better than just blocking it because it's a bit more subtle and lets you tell that site's users what's happening.
segmondy | 12 years ago
If you have time, go to war.

Have a page that writes the IP/hostname of the requester into a hidden section. Using that, you can identify his IP/hostname, so if he changes it, you can always detect it.

Now that you can detect him, when he crawls your site, feed him garbage info for every single page, then constantly check his page for the hidden IP/hash in case he changes his IP/host. Hide that in minified JS. You can also feed his page bogus links that violate Google's SEO guidelines so he gets blacklisted.
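One way to build the hidden tracer, sketched in Python under a few assumptions: the marker is an HTML comment (segmondy suggests hiding it in minified JS; a comment is just simpler to show), and it is a keyed hash of the requesting IP rather than the raw IP, so the copycat can't simply recognize his own address in it. `SECRET` and the helper names are hypothetical.

```python
import hashlib
import hmac
import re

SECRET = b"replace-with-a-long-random-secret"  # hypothetical; keep it server-side


def tracer_for(ip: str) -> str:
    """Short keyed hash of a client IP; only someone with SECRET can recompute it."""
    return hmac.new(SECRET, ip.encode("ascii"), hashlib.sha256).hexdigest()[:16]


def embed_tracer(html: str, client_ip: str) -> str:
    """Insert the requester's tracer as an HTML comment just before </body>."""
    return html.replace("</body>", f"<!-- t:{tracer_for(client_ip)} --></body>", 1)


def identify_ip(copied_html: str, candidate_ips):
    """Given HTML fetched from the copycat's site and candidate IPs from your
    access log, return the IP whose tracer appears in the page, if any."""
    m = re.search(r"<!-- t:([0-9a-f]{16}) -->", copied_html)
    if not m:
        return None
    for ip in candidate_ips:
        if hmac.compare_digest(m.group(1), tracer_for(ip)):
            return ip
    return None
```

Fetching his page periodically and passing it to `identify_ip` along with the IPs seen in your recent logs tells you which address his proxy currently uses.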

joeyjones | 12 years ago
The thing is that he isn't scraping the site ahead of time, he scrapes for content in real time. When a request is made to 4co.in he requests the corresponding page from altexplorer.net, does string replacement on the site name and url, and then outputs it to his users.
vdance | 12 years ago
First post here on HN... but I would try a shame tactic (per codegeek's helpful name research). In a nice bright box just above your normal content, send the following text back to his IP address ...

"Hello, my name is <insert his name here once you are certain> and I've stolen the content that you are viewing right now -- someone's hard work. I stole it in a very intentional and fairly disrespectful way. Sometimes we get life lessons and this may well be one of mine. Instead of using my skills to do good with the precious time that I have in this beautiful world, I've chosen to write a fairly nefarious script to copy every single page of someone else's website and suck it back into my website, so that I can profit from someone else's work. The message you are reading right now may go away for a day or two, if I change my IP address. But rest assured, it will be back once my IP address is rediscovered. This event will also follow me forever on search engines when people search my name -- future employers, friends, family. I have been doing this for <x> days and have been asked to stop. I haven't yet, but time will tell.... (<insert-pretty-date-here>)

In the meantime, if you would like to visit the real website go <here>..."

matt_heimer | 12 years ago
The JavaScript frame-busting methods are not the right approach, because you have no control over what his users see. There is no reason he can't filter out any JavaScript or other HTML. In fact, he might not even display your live HTML: he might have copied it to make his page templates and be scraping just the data from your site; you just don't know. If he isn't doing this now, he will if he gets into an arms race with you.

You need to return bad data to his site based on IP address and possibly user-agent. Don't make the data bad to mess with the users; just make it unusable, for example by making all numbers zero. Then set up a scheduled task that scrapes his website (using his domain name). If you start getting HTTP requests in your logs that correspond to the scheduled job you created, add the new requesting IP to the blacklist that gets bogus data, then make a second request to his website to validate the IP you blacklisted. You could set up your scraping tool to use random Tor exit nodes and cycle the user-agent info.

He could do the same (random IPs) but might not... Really, you need some type of accountability, which you can never have on a public website, but requiring registration/authentication would help somewhat if it becomes that important to you.
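The scheduled job described above, as a Python sketch with the network and log access passed in as functions, since the thread doesn't say how the site is deployed. `fetch_copycat` (GET a path on the copycat's domain and return the HTML), `read_log_lines`, and `looks_poisoned` are hypothetical hooks you would supply; the log parsing assumes the client IP is the first field, as in OP's nginx log excerpt.

```python
import secrets
from typing import Callable, Iterable


def refresh_blacklist(
    fetch_copycat: Callable[[str], str],
    read_log_lines: Callable[[], Iterable[str]],
    blacklist: set[str],
    looks_poisoned: Callable[[str], bool],
) -> tuple[set[str], bool]:
    """One run of the scheduled check.

    1. Request the copycat's site with a fresh random marker in the query string.
    2. Any IP that then fetches that marker from *your* server is his proxy:
       add it to the blacklist that receives unusable data.
    3. Request his site again and report whether he now gets the bad data.
    """
    marker = secrets.token_hex(8)
    fetch_copycat(f"/?q={marker}")
    new_ips = {
        line.split(" ", 1)[0]  # client IP is the first field
        for line in read_log_lines()
        if f"q={marker}" in line
    }
    blacklist |= new_ips  # mutates the caller's set in place
    confirmed = looks_poisoned(fetch_copycat(f"/?q={secrets.token_hex(8)}"))
    return new_ips, confirmed
```

Running it from cron (optionally through rotating exit nodes, as suggested) keeps the blacklist current if his proxy moves to a new address.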

joeyjones | 12 years ago
Sample log excerpt:

  162.222.227.123 - - [14/Feb/2014:18:18:48 +0000] "GET / HTTP/1.1" 200 23271 "-" "-" "162.222.227.123"
  162.222.227.123 - - [14/Feb/2014:18:37:51 +0000] "GET /chain/42 HTTP/1.1" 200 76170 "-" "-" "162.222.227.123"
  162.222.227.123 - - [14/Feb/2014:17:40:58 +0000] "GET /block/0e67dcf5f6797840a98061af7581138f2347feb168d78f7138d4268c6f854748 HTTP/1.1" 200 15719 "-" "-" "162.222.227.123"
  162.222.227.123 - - [14/Feb/2014:18:38:21 +0000] "GET /tx/6c636ebff9674f4168b80b415f8a9097509802992b0422a4fa98c543da9c068e HTTP/1.1" 200 15898 "-" "-" "162.222.227.123"
  162.222.227.123 - - [14/Feb/2014:17:41:05 +0000] "GET /address/GRjc357hnC7THEUPVJmpMmCjSAGn54CJnx HTTP/1.1" 200 14034 "-" "-" "162.222.227.123"
  162.222.227.123 - - [14/Feb/2014:18:13:21 +0000] "GET /news HTTP/1.1" 200 16675 "-" "-" "162.222.227.123"
  162.222.227.123 - - [14/Feb/2014:18:19:12 +0000] "GET /profitability HTTP/1.1" 200 188354 "-" "-" "162.222.227.123"
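Every proxied request in the excerpt arrives with an empty referrer and user-agent (`"-"`). A minimal Python sketch for spotting that pattern across the whole log, assuming lines follow the layout above (client IP, ident, user, time, request, status, bytes, referrer, user-agent, then an extra quoted field whose meaning depends on the site's nginx `log_format`). `ips_without_user_agent` is a hypothetical helper.

```python
import re
from collections import Counter

LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def ips_without_user_agent(lines) -> Counter:
    """Count requests per client IP that carry no User-Agent header."""
    counts = Counter()
    for line in lines:
        m = LINE.match(line)
        if m and m.group("agent") in ("", "-"):
            counts[m.group("ip")] += 1
    return counts
```

Running it over a day's log and looking at `most_common()` should put the proxy's address near the top. The signal is fragile, though: a missing User-Agent is trivial for him to fix once he notices.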

whitehat2k9 | 12 years ago
Since the faker's requests don't have a User-Agent, you could block all requests lacking a valid User-Agent HTTP header.
lotsofmangos | 12 years ago
Use ImageMagick to watermark all image requests on the fly, so you can keep changing the position of a URL watermark on all images.

Edit: actually, don't do this, as it is trivially easy to get around by making 2 or 3 requests and keeping anything that hasn't changed.

Or if you do do this, add a low level noise filter on top so that the attacker can't just directly equate pixel values.