
Looking Up Symptoms Online? These Companies Are Tracking You

185 points | sinak | 11 years ago | motherboard.vice.com

102 comments

[+] aw3c2|11 years ago|reply
Just a reminder that https://www.torproject.org offers a free and open-source unzip-and-run Firefox to use an anonymizing network run mostly by volunteers.

Using Tor to anonymously and privately educate yourself about embarrassing or potentially ostracized problems with yourself is a great use of it. Just remember that you should not ever enter any identifying information while using it.

Tor is more than fast enough for everyday browsing; heck, I use it to watch YouTube without major problems. I also use it to read the news, find recipes or lyrics (or similarly shady web circles), etc.

If the other side does not need to know who you are and does not have to synchronize that information into a vast tracking/advertising network, why should you willingly submit it?

[+] dzhiurgis|11 years ago|reply
Isn't it ironic that to check your symptoms you have to use the same technology that you buy your cocaine with?
[+] andy_ppp|11 years ago|reply
If I download Tor I'll end up on one of the governments' lists. Is there a way to download it anonymously :-)
[+] cabirum|11 years ago|reply
Why won't any browser's Incognito/InPrivate/Porn mode be enough? You need to prevent associating search queries with your logged-in social accounts. Tor is kinda overkill for that.
[+] dredmorbius|11 years ago|reply
My biggest single problem using Tor is that far too many sites either block exit nodes outright or subject them to (frankly, understandable) increased levels of scrutiny.

I use and run a Tor proxy (_not_ an exit node, mind), but notably Craigslist tends to block pretty much _all_ Tor traffic, and sites employing Cloudflare's DDoS protection present a JavaScript-only CAPTCHA. Given that one of my primary Tor browsers is a console-only browser without JS support, that does little for me.

Google despite other problems (below) actually Does The Right Thing and presents an image which I can fetch and verify, though many of the graphics are exceptionally difficult to interpret.

I've documented my own other hassles accessing Google via Tor (G+, Gmail, etc.) in "How to kill your Google account: Access it via Tor":

https://www.reddit.com/r/dredmorbius/comments/2w618r/how_to_...

https://news.ycombinator.com/item?id=9060922

(The problems were compounded by Google's account recovery and verification procedures, though I ultimately did recover control thanks in no small part to intercession by Google's Yonatan Zunger, for which I remain grateful).

Other options include the /etc/hosts file mentioned above (I've extended my own set with 62,000+ entries from a set of blockfiles used by the uMatrix Chrome extension). There's also Privoxy (though supporting _both_ Tor and non-Tor variants might be useful), and various browser extensions including Ghostery, Privacy Badger, AdBlock+, uMatrix, ScriptSafe/NoScript, etc.
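For anyone unfamiliar with the /etc/hosts approach, here is a minimal sketch of how blocklist entries could be turned into hosts lines. The function name `to_hosts_lines`, the `0.0.0.0` sink address, and the sample domains are assumptions for illustration, not taken from any particular blockfile format.

```python
def to_hosts_lines(domains, sink="0.0.0.0"):
    """Turn an iterable of domain names into /etc/hosts blocking lines."""
    seen = set()
    lines = []
    for raw in domains:
        # Strip trailing comments and whitespace; blocklists often carry both.
        domain = raw.split("#", 1)[0].strip().lower()
        if not domain or domain in seen:
            continue
        seen.add(domain)
        # Pointing the name at an unroutable address makes lookups dead-end locally.
        lines.append(f"{sink} {domain}")
    return lines


# Hypothetical blocklist entries, for illustration only.
blocklist = ["Tracker.example", "ads.example  # banner ads", "", "tracker.example"]
print("\n".join(to_hosts_lines(blocklist)))
```

The output would be appended to /etc/hosts by hand or by a script run with suitable privileges. Note that hosts entries match exact names only, so each subdomain needs its own line.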

It's getting more than slightly tedious and is eroding trust in the Web generally.

The other area of significant interest is seeing work toward reputation systems which are compatible with Tor use. There are two I'm aware of, FAUST and "Fair Anonymity", though I've seen little discussion or adoption of these anywhere.

FAUST: https://gnunet.org/node/1704

"Fair Anonymity for the Tor Network" http://arxiv.org/pdf/1412.4707v1.pdf

Briefly discussed here:

https://www.reddit.com/r/dredmorbius/comments/30gszt/the_bac...

[+] kephra|11 years ago|reply
NoScript will prevent most of that evil tracking by default. E.g. cdc.gov displays fine without any JS, and Google Analytics and AddThis are on my untrusted list anyway.

It's still possible to browse without JS most of the time. Some pages are crippled by design, so disabling CSS might show the content. Others provide an escaped_fragment variant. But a stupid JS antipattern is sometimes used to display normal content with JS. One big problem is domains like ajax.google. These are often used to enhance websites, but Google uses them to track users.

When talking about evil Google, one needs to add YT. A friend of mine once claimed: You watch a stripper, if you visit YouPorn. But you strip your privacy, if you visit YouTube.

[+] moogly|11 years ago|reply
The CDN that serves popular JavaScript libraries, ajax.googleapis.com, is not tracked. It's a cookie-less domain totally separate from google.com.
[+] dredmorbius|11 years ago|reply
Google's Blogger site is a tremendously flagrant example of this.
[+] notahacker|11 years ago|reply
The original source paper is at http://arxiv.org/pdf/1404.1951.pdf

Much as this sort of thing makes me glad I don't need to purchase private health insurance, the article would be a lot more helpful if it distinguished more clearly between what is and isn't legal use of the data as well as between the Experians and Google Analytics of this world.

That said, the original source paper, if anything, probably plays down the potential concerns. It contends, for example, that a URI like http://www.ncbi.nlm.nih.gov/pubmed/21722252 contains no symptom-specific information, when any sufficiently motivated actor can write a scraper that links anonymous-looking URIs on healthcare domains to the conditions and symptoms referenced in the page content.

[+] PhantomGremlin|11 years ago|reply
> this sort of thing makes me glad I don't need to purchase private health insurance

Are you in the USA? Thanks to Obamacare your medical history doesn't matter anymore. I purchase my own insurance and only three things matter:

   your age

   your gender

   the type of coverage (bronze, silver, gold, etc)
It doesn't even matter whether you're single or married or have kids. My family policy cost is exactly the sum of:

   my policy cost based on age and gender

   my wife's policy cost based on age and gender

   each of my children's policy costs (I don't
   remember if age or gender matter, I don't think
   they do)
I generally don't like the idea of Obamacare, but in this case it did a lot of good. Before Obamacare, insurance companies went out of their way to simply not offer private coverage at all to people with any medical issues, even minor ones. They can't do that anymore.
[+] netcan|11 years ago|reply
I think at this stage we need to consider this a part of how the internet works.

I'm far from an expert, but I do think that the majority of legislative efforts, as well as many initiatives from browser makers, are approaching this wrong. Privacy, as much transparency as possible, and optional settings for anything that comes with a trade-off need to be built into the browser, and not sent as a request to websites.

Transacting, being logged in, and certainly browsing are not inherently hindered by privacy. It's up to users (or their browser really) to demand it, in the economic sense of demand.

For now, there is no cost to this kind of tracking, so it happens almost by default. Moral or even legislative pressure will not have the same effect as economic pressure. The decision to protect users' privacy or not needs to come with costs.

[+] mike-cardwell|11 years ago|reply
When the major browser developers directly profit from being able to track users across the web, they're not going to make modifications to the way browsers work to prevent tracking. Not in any meaningful way.

It's a shame so many people use Chrome. They're effectively giving an Ad company which specialises in tracking people, power to control how the web develops.

[+] kefka|11 years ago|reply
And, I've found no solution regarding polluting your history with obfuscated searches.

If I, Mr. Spy Provider, start seeing a single user who has every possible documentable illness, that user's search has been polluted and is worthless.

So, how does one do this? Someone needs to write a search algo that pulls 100 crap medical searches for every good one. All you need to do is query the 1px image on the page. I'm guessing that could be done with 10KB/illness search for privacy pollution.

Should we have to? No. But this is the reality we live in. We can use the tools to keep us from being "found", but we still are querying the server the content is on. Nothing we can do about them selling that log. But we can pollute that log.
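A minimal sketch of the "100 decoys per real search" idea, assuming the decoys are drawn from a pre-built pool of plausible queries. The function name `pad_with_decoys`, its parameters, and the sample terms are hypothetical. It only builds the query batch and sends nothing over the network.

```python
import random


def pad_with_decoys(real_query, decoy_pool, decoys_per_real=100, rng=None):
    """Return the real query shuffled in among randomly chosen decoys."""
    rng = rng or random.Random()
    # Sample with replacement so a small pool still yields enough decoys.
    batch = [rng.choice(decoy_pool) for _ in range(decoys_per_real)]
    batch.append(real_query)
    # Shuffle so the real query's position in the batch carries no information.
    rng.shuffle(batch)
    return batch


# Hypothetical decoy pool; a usable one would need far more plausible terms.
decoy_pool = ["tennis elbow", "hay fever", "shin splints", "heartburn"]
batch = pad_with_decoys("persistent cough", decoy_pool, decoys_per_real=100)
print(len(batch))  # 101 queries, only one of which is genuine
```

Whether this actually defeats profiling is an open question: a tiny pool like the one above repeats the same few decoys, so the single distinct query may still stand out.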

[+] dredmorbius|11 years ago|reply
Sadly I'm increasingly coming to the conclusion that fuzzing all search traffic in this manner is becoming a necessity. My concern is that it's still not sufficient. As Bruce Schneier notes, computers are exceptionally good at finding needles in haystacks, and even highly fuzzed data contains signal.

That said, there are browser extensions which run random/arbitrary background Web queries.

[+] DavideNL|11 years ago|reply
So I open the page and Disconnect shows 36 tracking items blocked, and uBlock shows 18 more items blocked.

Awesome :)

[+] belorn|11 years ago|reply
I use Tor Browser quite often, and this is a primary reason why. I have several times thought "hmm, I should not be typing this into the search box", especially when at work or on a public network.
[+] Maarten88|11 years ago|reply
> But the chief problem is simply that just about all of the above, under current laws, is legal

In the US maybe, but I would guess the business practices of most data brokers are already completely illegal in the EU. We have many laws and requirements for keeping and selling data on EU citizens. I would welcome stronger actions against these companies in the EU.

But somehow I fear that enforcing EU laws on US companies is not part of the TTIP trade agreement under negotiation between US and EU.

[+] Symbiote|11 years ago|reply
The EU does have the requirement for websites to say they're using third-party cookies (e.g. from Google Analytics). The weakness is the poor wording chosen. "By browsing this website, you agree that a list of pages you visit will be sent to Google and ComScore" would have had a much stronger message. Perhaps follow on with "The visit to this page has already been tracked. To remove this information from Google/ComScore's servers, click here".
[+] dzhiurgis|11 years ago|reply
Yeah I don't think these laws really work.

IIRC just a few years ago, when the EU started investigating Google and asked where its user data comes in from, Google wasn't able to answer. I don't think they are able to track it anymore.

[+] BillFranklin|11 years ago|reply
I can't understand why any company that cares about privacy would use Google Analytics over Piwik.
[+] Silhouette|11 years ago|reply
I suspect that is an easy one. Your company might care about privacy, but not be in a technical industry or expert on these kinds of issues.

Google Analytics can be set up in a few minutes by anyone who could set up their own web site in the first place.

Setting up Piwik means understanding this: http://piwik.org/docs/installation-maintenance/

If you run web sites for a living, the latter is no big deal. If your company is a florist and you just learned a bit of basic HTML to write your blog about flower arranging, what's a MySQL?

[+] heliodor|11 years ago|reply
12 years of software engineering experience here.

I tried setting up Piwik for my website. Instead of showing N number of visitors, it consistently shows one or two. Tried googling, read the docs, nothing! Gave up. I have no idea why it fails so spectacularly.

Google Analytics will do just fine for now until better understanding my site's visitors becomes a more valuable proposition. Right now it's not worth it.

[+] seanp2k2|11 years ago|reply
EFF PrivacyBadger + uBlock with lots of lists enabled blocks most of the tracking garbage. Sad state of affairs that basically every website is doing some kind of for-profit selling of their users.
[+] perdunov|11 years ago|reply
Ghostery found one tracker (PiwikAnalytics) that PrivacyBadger didn't on the PrivacyBadger page itself.
[+] lifeisstillgood|11 years ago|reply
But, and I may be misunderstanding something, the page that I visit has the responsibility of serving these trackers? They call out to an adbroker, or analytics service, and they are responsible for the content surely? I mean if a newspaper prints a race hate ad for a neoNazi or FOX News runs porn adverts, they are the responsible party.

So it seems we could do with strong adblocking, but (given that spam email still exists) actual enforced laws would be more useful.

(I may be getting a bit old...)

[+] GhotiFish|11 years ago|reply
My government doesn't even begin to know how to deal with internet laws, the only thing I've seen come down the pipe are laws designed to protect square peg business practices in round hole environments.

The only thing that can be done is to make privacy and ad blocking tools universally deployed, and let the fallout happen.

[+] dm2|11 years ago|reply
Devil's Advocate: This data is important to public health. Search history for drugs is one of the best ways for companies, the public, and researchers to find out symptoms and the occurrence rate of symptoms. If that data is combined with location data then it gives them more pieces of the puzzle.
[+] brightsize|11 years ago|reply
My default search engine is ixquick.com. The service has a nifty proxy with a convenient proxy link next to each search result. The proxy breaks a lot of sites (JS blocking) but usually lets me see enough to determine if it's worth revisiting via Tor.
[+] hownottowrite|11 years ago|reply
Ghostery reports the following tracking beacons in the article itself.

===

Alexa Metrics

ChartBeat

Disqus

DoubleClick

eXelate

Facebook Connect

Google Adsense

Google AJAX Search API

Google Analytics

Google+ Platform

Krux Digital

Moat

NetRatings SiteCensus

Neustar AdAdvisor

PubMatic

Quantcast

Sailthru Horizon

ScoreCard Research Beacon

Twitter Button
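For a rough idea of how a list like this can be approximated without an extension, here is a sketch that collects the third-party hosts referenced in a page's HTML. It is not how Ghostery itself works, and the class name `ThirdPartyHosts` and the sample markup are hypothetical. It only sees static markup, so anything injected later by scripts is missed.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ThirdPartyHosts(HTMLParser):
    """Collect hosts in src/href attributes that differ from the page's own host."""

    def __init__(self, page_host):
        super().__init__()
        self.page_host = page_host
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        # Only look at tags that cause the browser to fetch a resource.
        if tag not in ("script", "img", "iframe", "link"):
            return
        for name, value in attrs:
            if name in ("src", "href") and value:
                host = urlparse(value).hostname
                # Relative URLs have no hostname and are first-party.
                if host and host != self.page_host:
                    self.hosts.add(host)


# Hypothetical page markup, for illustration only.
html = """
<script src="https://analytics.example/ga.js"></script>
<img src="/logo.png">
<img src="https://beacon.example/pixel.gif">
<script src="https://news.example/app.js"></script>
"""
parser = ThirdPartyHosts("news.example")
parser.feed(html)
print(sorted(parser.hosts))
```

This treats any differing hostname as third-party, including the site's own subdomains, so it over-counts on sites that serve assets from something like a static subdomain.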

[+] swombat|11 years ago|reply
Something about pots and kettles and the colour (or lack thereof) black...
[+] yuhong|11 years ago|reply
That is why I am thinking that private health insurance that covers doctor visits is probably flawed.
[+] logn|11 years ago|reply
... says the news site sending my data to The Nielsen Company.
[+] GoodIntentions|11 years ago|reply
I see Vice asking to load shit from 12 other domains. Which one is Nielsen?