You’re not anonymous. I know your name, email, and company.

pygy_ | 13 years ago

Several people mention Ghostery[0] as a defense against trackers. It offers only partial protection: a browser can be fingerprinted without any custom tracking data.

https://panopticlick.eff.org/ <-- check how unique your browser is.
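Roughly how such a fingerprint works: a site reads attributes the browser exposes anyway and hashes them into an identifier, with nothing stored on your machine. This is a minimal sketch; the attribute names and values are hypothetical, and Panopticlick's actual attribute list and method are not reproduced here.

```python
import hashlib
import json


def fingerprint(attrs: dict) -> str:
    """Derive an identifier purely from reported browser attributes."""
    # Sort keys so the same attributes always serialize identically.
    canonical = json.dumps(attrs, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical values of the kind a script can read without any cookie.
visitor = {
    "user_agent": "Mozilla/5.0 (Windows NT 6.1; rv:17.0) Gecko/20100101 Firefox/17.0",
    "screen": "1920x1080x24",
    "timezone_offset_min": -480,
    "plugins": ["QuickTime", "Shockwave Flash"],
    "fonts": ["Arial", "Courier New", "Helvetica"],
}

print(fingerprint(visitor))
```

The same browser yields the same string on every visit; changing any single attribute (one extra font, say) yields a different one.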

Instead of a script to embed, these firms could provide an API to identify users from the server side. The scripts that capture the profile would be served by the sites themselves rather than from third-party services.

Toast.

A possible solution would be to anonymize the browser fingerprint, at least in private mode, i.e. lie about the details of the system.

Google, Mozilla, Opera, can you hear me?

--

[0] http://www.ghostery.com/

dsr_ | 13 years ago

Not only does panopticlick say my browser is unique, but it said that last time I visited it, several months ago.

Maybe it's forgotten. Maybe it lies. Maybe every time I rev Firefox Nightly I change identity.

What is true is that every time I leave an email address, it's tagged with the name of the site where I left it.

001sky | 13 years ago

A possible solution would be to anonymize the browser fingerprint, at least in private mode, i.e. lie about the details of the system.

Google, Mozilla, Opera, can you hear me?

This. The system needs to be fixed: exchange only need-to-know information, not nice-to-know information.

scrrr | 13 years ago

I just played around with a Chrome extension to mess with Panopticlick. In case anyone wants to continue: http://lukaszielinski.de/blog/posts/2012/12/12/panopticlick/

alexkus | 13 years ago

> lie about the details of the system

Rather than trying to hide everything, another tactic is to provide random misinformation (different user-agent strings, presenting only a subset of fonts and plugins, etc.), enough to defeat the fuzzy matching that goes on.

Sure, you've got to be careful not to break sites that rely on this information remaining stable during a session, but that has become far less common with the frequent browser upgrades that happen nowadays.
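One piece of the random-misinformation idea, reporting only a random subset of fonts and plugins, can be sketched as below. The profile fields are hypothetical, and the sketch assumes sites tolerate missing entries; it does not touch the user-agent part of the suggestion.

```python
import random


def randomized_profile(real: dict, rng: random.Random) -> dict:
    """Return a copy of the profile that reports only a random subset of
    fonts and plugins, so the fingerprint varies between sessions."""
    fake = dict(real)
    for key in ("fonts", "plugins"):
        items = real[key]
        # Keep at least one entry so the list still looks plausible.
        k = rng.randint(1, len(items))
        fake[key] = sorted(rng.sample(items, k))
    return fake


real = {
    "fonts": ["Arial", "Courier New", "Georgia", "Helvetica", "Verdana"],
    "plugins": ["QuickTime", "Shockwave Flash", "Chrome PDF Viewer"],
}
rng = random.Random()
print(randomized_profile(real, rng))
```

Each call draws a new subset, so two sessions usually present different fingerprints while every reported item is still genuinely installed.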

andrewcooke | 13 years ago

if the data transfer is done from the client (and i bet it is, as it's much harder to persuade people to run code on their servers) then ghostery and the like still work, because they block the transfer (since the code to do the transfer must be loaded from the weasel site - same origin policy).
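This argument rests on blockers refusing to load or contact the third-party tracker at all. A minimal sketch of that kind of domain check follows; the domains are hypothetical, and this is not Ghostery's actual matching logic.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist; real blockers ship their own, much larger lists.
BLOCKLIST = {"tracker.example", "analytics.example"}


def is_blocked(request_url: str, blocklist=BLOCKLIST) -> bool:
    """True if the request goes to a listed domain or one of its subdomains."""
    host = (urlsplit(request_url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocklist)


for url in ("https://cdn.tracker.example/collect.js",
            "https://www.site.example/contact"):
    print(url, "blocked" if is_blocked(url) else "allowed")
```

If a site served the capture script from its own domain and forwarded the data server-side, as pygy_ suggests, a domain check like this would see nothing to block.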

troels | 13 years ago

Instead of a script to embed, these firms could provide an API to identify users from the server side. The scripts that capture the profile would be served by the sites themselves rather than from third-party services. ... Toast.

Not really. If they share data on the server side, they wouldn't be able to share a cookie; they would have to rely on other means to identify you, such as IP address. That's not entirely impossible, but not as precise either, and it's spoofable through proxies.
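Why cookieless server-side matching is less precise can be shown with a toy matcher that buckets requests by (IP address, User-Agent) only. The data is hypothetical, and the `person` field is ground truth for the demo, not something a tracker would see.

```python
from collections import defaultdict


def group_by_ip_and_ua(requests):
    """Bucket requests the way a cookieless server-side matcher might:
    by (IP address, User-Agent) alone."""
    groups = defaultdict(set)
    for r in requests:
        groups[(r["ip"], r["user_agent"])].add(r["person"])
    return groups


UA = "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 Chrome/23.0"
requests = [
    # Two coworkers behind the same office NAT, same corporate browser image.
    {"ip": "203.0.113.7", "user_agent": UA, "person": "alice"},
    {"ip": "203.0.113.7", "user_agent": UA, "person": "bob"},
    # Alice again from home: a different IP, so a different bucket.
    {"ip": "198.51.100.23", "user_agent": UA, "person": "alice"},
]

for key, people in group_by_ip_and_ua(requests).items():
    print(key[0], sorted(people))
```

The matcher merges two people into one "visitor" and splits one person into two, which is the imprecision described above.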

thinkling | 13 years ago

I'm quite surprised that panopticlick says I'm uniquely identifiable with Chrome based solely on the browser plugins reported, even though the ones I have are quite pedestrian: just Chrome PDF viewer, QuickTime, PepperFlash, and Flash.

In Firefox, the plugins get them to 1 in 860,000, which leaves only 3 possibilities in their database of 2.5 million, even though Firefox loads only QuickTime and Flash.

It must be the combination of codecs I have installed. How do I go about cleaning that up?
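The figures above can be checked with a little arithmetic: "one in N" corresponds to log2(N) bits of identifying information, and dividing the database size by N gives the expected number of browsers sharing that value. This uses only the numbers quoted in the comment.

```python
import math


def surprisal_bits(one_in: float) -> float:
    """Bits of identifying information in a 'one in N' attribute."""
    return math.log2(one_in)


def expected_matches(db_size: int, one_in: float) -> float:
    """How many browsers in the database share that value, on average."""
    return db_size / one_in


print(round(surprisal_bits(860_000), 1))               # 19.7
print(round(expected_matches(2_500_000, 860_000), 1))  # 2.9
```

About 2.9 rounds to the 3 possibilities quoted. For comparison, log2(2.5 million) is about 21.3 bits, roughly what it takes to single out one browser in that database.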

wladimir | 13 years ago

I'm not sure about Ghostery, but the Tor Browser Bundle (based on Firefox, see https://blog.torproject.org/blog/effs-panopticlick-and-torbu...) does apply a few tricks to normalize the browser fingerprint.
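Normalization works the opposite way from randomizing: every user reports the same values, so the fingerprint hides you in a crowd. A minimal sketch, where the shared values are hypothetical and not the Tor Browser's actual settings:

```python
# Hypothetical shared values; not the Tor Browser's actual settings.
NORMALIZED = {
    "user_agent": "Mozilla/5.0 (Windows NT 6.1; rv:10.0) Gecko/20100101 Firefox/10.0",
    "screen": "1000x800x24",
    "timezone_offset_min": 0,
    "plugins": [],
}


def normalize(real: dict) -> dict:
    """Overwrite identifying attributes with values every user reports."""
    out = dict(real)
    out.update(NORMALIZED)
    return out


alice = {"user_agent": "UA-1", "screen": "1920x1080x24",
         "timezone_offset_min": -480, "plugins": ["Shockwave Flash"]}
bob = {"user_agent": "UA-2", "screen": "1280x800x24",
       "timezone_offset_min": 60, "plugins": []}
print(normalize(alice) == normalize(bob))  # True
```

Any attribute left out of the shared set (fonts, for example) still differs between users, so the approach only helps as far as the normalized set reaches.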

ck2 | 13 years ago

Also, when you update Ghostery's bug lists, it whitelists some sites, so you have to deliberately select all again each time.

jdangu | 13 years ago

A good workaround for Panopticlick would be to append a random string to the user agent, for example, effectively making your fingerprint unique on every visit.
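A sketch of the random-suffix idea; the `r/<hex>` token format is an arbitrary choice for illustration, not something any browser or extension is known to use.

```python
import secrets


def randomized_user_agent(base_ua: str) -> str:
    """Append a fresh random token so the UA string differs per call."""
    # The "r/<hex>" token format is an arbitrary choice for illustration.
    return f"{base_ua} r/{secrets.token_hex(4)}"


base = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:17.0) Gecko/20100101 Firefox/17.0"
print(randomized_user_agent(base))
print(randomized_user_agent(base))
```

Only the user-agent component changes here; fonts, plugins, screen size and the other attributes a fingerprinter reads stay as they were.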

Alaskan005 | 13 years ago

A possible solution would be to anonymize the browser fingerprint, at least in private mode, i.e. lie about the details of the system.

Plus one for this. I wonder if a plugin alone could change enough info to fool the trackers?

eli | 13 years ago

I'm skeptical of this unnamed company's actual abilities. In the initial email, how are they able to identify anything about your visitors before you've installed the tracking code? Since they apparently can see the search terms used to reach your site, the only thing I can think of is that their code is running on some site that links to you (perhaps an off-brand search engine) and they're tracking outbound clicks. Or it's fake.

It's pretty easy to guess company name from IP address, especially if you don't care about accuracy. You can kinda sorta do this in Google Analytics under Audience > Technology > Network. That seems to be roughly what they're doing in the screenshots posted. IMHO, this is not the most serious privacy issue on the web.

I would be very curious to hear exactly what percentage of visitors it is able to supply Name and Email for (and how many of those fields look bogus). This sort of individual-level tracking across sites is obviously possible, but I don't think it's common. Google/DoubleClick do not, as far as I know, do any sort of tracking at the level of an individual's name or email address (And why would they? It's asking for regulatory problems and it doesn't really help them much -- they target ads to groups of similar people based on demographics, not to particular named individuals.)
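Guessing a company from an IP address amounts to a range lookup against ownership data. A minimal sketch, with a hypothetical table built on RFC 5737 documentation ranges; real services assemble such tables from sources like WHOIS records.

```python
import ipaddress

# Hypothetical ownership data using RFC 5737 documentation ranges.
ORG_RANGES = [
    (ipaddress.ip_network("203.0.113.0/24"), "Example Corp"),
    (ipaddress.ip_network("198.51.100.0/25"), "Acme Widgets"),
]


def guess_company(ip: str):
    """Return the organization owning the address, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    for network, org in ORG_RANGES:
        if addr in network:
            return org
    # Home ISPs, mobile carriers and VPNs fall through: no useful guess.
    return None


print(guess_company("203.0.113.42"))    # Example Corp
print(guess_company("198.51.100.200"))  # None (outside the /25)
```

The lookup only names whoever owns the address block, which is why it works for office connections and says nothing useful about someone on a home ISP.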

paulgb | 13 years ago

For users without showdead, the user darrennix (who appears to be the same Darren Nix who wrote the article) posted this comment. Why the mods or system would kill it I have no idea.

> It's a fair question and one that I asked myself. If the entire service is a fake, then it is an extremely elaborate one, because the names and emails of the individuals it did identify (which I noted was a small percentage) were real.

untog | 13 years ago

I imagine (though have no actual clue) that it's more of an e-mail sharing network between sites. You sign up for site A, the API tracks that and allows site B to see the signup details you entered.

On one level, I can see why sites do it. On another, one inch higher level, I can see how any site implementing it is so shortsighted that I'm amazed they didn't immediately go bankrupt as soon as they started.

paulgb | 13 years ago

They can identify you by name/email if you've entered it on a site in their "network". Their network may not be huge, but a (presumably) similar service had a big enough network to capture Sumit Suman's email earlier this week (https://plus.google.com/u/1/106142598193409336347/posts/2jLJ...)

chewxy | 13 years ago

Toolbars. People install toolbars all the time.

iy56 | 13 years ago

The initial email comes with what it calls a mock example. I think that's what you're referring to.

rsobers | 13 years ago

HubSpot (and pretty much any other marketing automation tool) has this feature, too. They look up company name and location by IP address and build an anonymous "prospect" record representing each visitor, so that salespeople and marketers can detect whether prospects from a given company are hitting the site for information.

The second a prospect submits a web form, all that previous web activity is tied to their email address (and any other info you collected via the form). You now have a real lead.

I don't see any privacy issues with this.

What I would see an issue with is if the tracking company were sending the IP address and cookie back to a central database to query "Does anyone _else_ know who this visitor is?" and then providing PII to any company that uses the tracking service.

The moment you start giving my PII to a company that I didn't voluntarily give it to is when I feel a line has been crossed.
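The first-party flow described above (anonymous prospect first, tied to an email once a form is submitted) can be sketched as a small data model. The class and method names are hypothetical; this is not HubSpot's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Prospect:
    cookie_id: str
    company: Optional[str] = None   # guessed from IP address
    page_views: List[str] = field(default_factory=list)
    email: Optional[str] = None     # unknown until a form is submitted


class LeadStore:
    """One company's own visitor records, keyed by its first-party cookie."""

    def __init__(self) -> None:
        self.prospects: Dict[str, Prospect] = {}

    def _get(self, cookie_id: str) -> Prospect:
        return self.prospects.setdefault(cookie_id, Prospect(cookie_id))

    def record_visit(self, cookie_id: str, url: str,
                     company: Optional[str] = None) -> Prospect:
        p = self._get(cookie_id)
        if company and p.company is None:
            p.company = company
        p.page_views.append(url)
        return p

    def submit_form(self, cookie_id: str, email: str) -> Prospect:
        p = self._get(cookie_id)
        # Every earlier anonymous page view is now attached to this email.
        p.email = email
        return p


store = LeadStore()
store.record_visit("c1", "/pricing", company="Example Corp")
store.record_visit("c1", "/features")
lead = store.submit_form("c1", "jane@example.com")
print(lead.email, lead.company, lead.page_views)
```

Everything stays inside one company's store, which is the distinction rsobers and inthewoods draw between this and a shared identity network.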

paulgb | 13 years ago

> What I would see an issue with is if the tracking company were sending the IP address and cookie back to a central database to query "Does anyone _else_ know who this visitor is?" and then providing PII to any company that uses the tracking service.

That appears to be exactly what's happening. The email mentions "access to our entire network of identified data ([...] we can identify any visitor [...] if that person has filled out a web form from any other website we are tracking)".

darrennix | 13 years ago

According to the sales rep, their tracking capability goes far beyond IP lookup. It explicitly involves saving form data from site A and sharing that personal information with site B.
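Taking the sales rep's description at face value, the core of such a network can be sketched as a lookup keyed by a shared third-party cookie. All names are hypothetical; the description does not say how the matching is actually done.

```python
from typing import Dict, Optional


class IdentityNetwork:
    """A hypothetical tracker embedded on many sites. Its script sets one
    third-party cookie per browser, so the same ID appears on every
    participating site."""

    def __init__(self) -> None:
        self._by_cookie: Dict[str, dict] = {}

    def capture_form(self, cookie_id: str, site: str, name: str, email: str) -> None:
        # Site A: the embedded script forwards what the visitor typed.
        self._by_cookie[cookie_id] = {"name": name, "email": email, "source": site}

    def identify(self, cookie_id: str) -> Optional[dict]:
        # Site B: look up a visitor who never filled out a form there.
        record = self._by_cookie.get(cookie_id)
        if record is None:
            return None
        return {"name": record["name"], "email": record["email"]}


network = IdentityNetwork()
network.capture_form("tp-123", "xyz.example", "Jane Doe", "jane@example.com")
print(network.identify("tp-123"))  # {'name': 'Jane Doe', 'email': 'jane@example.com'}
print(network.identify("tp-999"))  # None
```

Site B never saw a form submission; it learns the name and email only because both sites embed the same tracker.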

inthewoods | 13 years ago

This isn't what Hubspot does at all. While you are correct in that they, along with Marketo, Eloqua, Pardot, etc all look up company/location via IP, none of these companies are getting information from another website to identify prospects.

In the case of marketing automation, all the data lives within the system and is used by the company that collected it, rather than being given out: a very different proposition.

larskinn | 13 years ago

Well, isn't the scenario you would take issue with exactly what is happening here? From the article: "For example, if [a visitor] went to XYZ.com and filled out a web form and then [the visitor] later visited 42floors.com, [42Floors] would be able to identify [the visitor] by name/email as well as company details even though [the visitor] never filled out a web form on [42Floors.com]."

smalldaddy | 13 years ago

Marketo has this too ... they are all over the place! I manage multiple "online" identities so I can track who is me directly (and who is implied me).

nostromo | 13 years ago

Just looked through Zendesk's network calls -- looks like it's probably Demandbase. http://www.demandbase.com/landing-page/demandbase-real-time-...

Surprisingly, Adblock Plus doesn't seem to block it.

Edit: actually it's LeadLander.com as pointed out by NiekvdMaas here http://news.ycombinator.com/item?id=4891764

damian2000 | 13 years ago

I'm surprised by their extensive list of customers, and also the fact that customers seem to be happy to be identified as their customers ... http://www.demandbase.com/who-uses-demandbase/customer-list/

Hardware manufacturers and telecoms seem to feature heavily:

(a selection) ... Adobe, Dell, IBM, AMD, Box.net, Cisco, CSC, Comcast, Freescale, HP, Lenovo, Motorola, Novell, Qwest, Salesforce.com, Siemens, Symantec, Verisign, VMWare, Vodafone.

And there are several anti-virus/anti-malware companies listed there.

UPDATE: The LeadLander.com site also lists their customers - Microsoft, Motorola, Red Hat and Cisco, among others.

politician | 13 years ago

Just checked, looks like Ghostery blocks Demandbase.

r4vik | 13 years ago

AdBlock is for blocking (annoying) ads, not trackers.

d503 | 13 years ago

Ghostery does, though.

jfriedly | 13 years ago

This sounds eerily familiar. Around a decade ago, a data analytics company called Pharmatrak was actually found guilty of breaking federal wiretapping statutes for doing something very similar. [1] In their case, they had built a network that tracked HTTP GET requests to pharmaceutical companies' websites with a web bug [2] and an attached cookie. But because some of the pharmaceutical companies were using GET as the method on their HTML forms (remember, this was ten years ago), users ended up making GET requests with personally identifying information in the URL-encoded parameters. Since these GET requests were logged by Pharmatrak, and neither party (the users nor the pharmaceutical companies) had consented to giving personal information to them, Pharmatrak was found guilty of wiretapping.

Pharmatrak eventually won on appeal though, arguing that they had no intention of collecting personal information, which exonerated them because only intentional eavesdropping is a crime.

The company in the OP's article could make no such arguments though. I suspect that their main difference is that they make no assurances of confidentiality to the websites using their software the way Pharmatrak did. Which 1) is just really creepy, and 2) sets them up for trouble with users in California, because California's wiretapping statutes say that it's a crime unless both parties agree to it. [3]

[1] http://cyberlaw.stanford.edu/packets001737.shtml

[2] http://en.wikipedia.org/wiki/Web_bug

[3] I'm not sure if this applies to police, but it definitely does to private parties: http://www.citmedialaw.org/legal-guide/california-recording-...

Edit: Added third reference.
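The GET-form leak described above can be sketched with the standard library: a `method="get"` form puts its fields into the page URL, and any third-party web bug that receives that URL can parse them back out. The site and field names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def get_form_url(action: str, fields: dict) -> str:
    """What the browser requests when a <form method="get"> is submitted."""
    return action + "?" + urlencode(fields)


def tracker_log_entry(page_url: str) -> dict:
    """What a third-party web bug on the resulting page can record if it
    receives the full page URL (for example via the Referer header)."""
    query = urlsplit(page_url).query
    return {key: values[0] for key, values in parse_qs(query).items()}


url = get_form_url("https://pharma.example/register",
                   {"name": "Jane Doe", "email": "jane@example.com"})
print(url)
print(tracker_log_entry(url))
```

With `method="post"` the fields travel in the request body instead, so they would not appear in any URL the web bug sees.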

andrewljohnson | 13 years ago

If I found out a site I used employed this tool, I'd both trash them publicly and never use their service again.

antoncohen | 13 years ago

sencha.com, activestate.com, sandisk.com, clustrix.com, and about 2000 others use LeadLander. I checked the privacy policies of those four sites and none of them say they are giving away your personal information. On the contrary, they all explicitly say they aren't.

"We do not share any information about you or your company to unaffiliated third parties, except as necessary to administer the communications we offer and as permitted by law. We may use a third party service provider to for communications; that company is prohibited from using our users’ personally identifiable information for any other purpose. If you follow us on Twitter, Facebook or on other social media services, we may use information provided by these services to customize our communications to you. We will not share the personally identifiable information you provide with other third parties unless we give you prior notice and choice." - http://www.sencha.com/legal/privacy/

Nearly every company using LeadLander is breaking the law because their posted privacy policies do not state that they are giving a third party your personal information, and that third party is giving it to others.

Edit: It looks like http://formalyzer.com/formalyze_call.js is the specific js file that uploads personal information. Of the sites I listed only clustrix.com is loading that (on the contact form). The other sites seem to be using LeadLander without the form tracking.

biot | 13 years ago

Marketo has done company-level tracking for years[0], and if you click through from an email or fill out a form they can keep tracking you as well as back-fill any previously anonymous visits you made (depending on your browser cookie settings, of course). Once it's in the system, they partner with a number of companies, some of whom can help populate contact data[1], eg: "over 1.5 billion opt-in email addresses" -- how plausible is that? They have as customers a few companies[2] you're likely familiar with (eg: VMware).

[0] http://www.marketo.com/small-medium-business/inbound-marketi...

[1] http://launchpoint.marketo.com/strikeiron-inc/747-strikeiron...

[2] http://www.marketo.com/customers/

seiji | 13 years ago

Is the weasel company's javascript (and/or flash bug) logging all form input back to its own servers to capture name/email when you sign up somewhere else? Are they capturing credit card numbers too?

We can tell the world all day long this is Bad and Unsafe, but within six months it'll be more popular than ad retargeting and the meebo crapbar (because, hey, analytics!).

ChuckMcM | 13 years ago

Can someone provide a regex that would identify this tracker? I'd like to run it through our index and see if I can come up with a list of sites that employ it.
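A possible starting point, using the file antoncohen identified in this thread (http://formalyzer.com/formalyze_call.js) as the signature. This is a sketch under that assumption: it only catches pages that load the script with a literal `<script src>`, not ones that inject it dynamically, and per antoncohen many LeadLander sites don't use the form tracking at all, so it won't find those.

```python
import re
from typing import Dict, List

# Matches a <script> tag whose src points at formalyze_call.js on
# formalyzer.com (or a subdomain), with or without an explicit scheme.
FORMALYZER_RE = re.compile(
    r"""<script\b[^>]*\bsrc\s*=\s*["']?(?:https?:)?//(?:[\w-]+\.)*formalyzer\.com/formalyze_call\.js""",
    re.IGNORECASE,
)


def sites_using_form_tracker(pages: Dict[str, str]) -> List[str]:
    """Given {url: html}, return the URLs whose HTML loads the script."""
    return sorted(url for url, html in pages.items() if FORMALYZER_RE.search(html))


pages = {
    "https://a.example/contact":
        '<script type="text/javascript" src="http://formalyzer.com/formalyze_call.js"></script>',
    "https://b.example/": '<script src="/js/app.js"></script>',
}
print(sites_using_form_tracker(pages))  # ['https://a.example/contact']
```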

marc | 13 years ago

The initial data is fake.

Proof: http://o7.no/Z0huP7

I get emailed by them for every startup I'm involved with and that first email is mostly the same every time as you can see in that screenshot. (Compare it with the one posted in the article and you'll see).

They seem to be targeting startups and making it look like some big VC firms are visiting your site to get you interested. I'm not sure how they come up with the 'search terms', but I guess they could just look at your meta tags or make them up.

In their email they do say it's a "mock example", but still I find it very deceptive.

dskhatri | 13 years ago

Dataium does this too, as covered by the WSJ's recent article on the subject. [1]

The article goes into depth about how much personal information is sent along to advertisers, including a popular dating site's apparently anonymized information about drug use and sexual orientation.

I think we need a non-profit service that defines a set of privacy licenses (akin to CreativeCommons' licenses) which companies can opt to label their websites/apps with. There would be no policing/auditing [2], but companies found to violate the privacy licenses would be obliged to donate a sum to an organization like the EFF.

Because a company's privacy policy would be summed up by one simple privacy license badge, users could quickly and easily see how it treats their data. I believe users would gravitate toward services that display such a license.

Edit: it appears such a service is in the works - http://privacycommons.org

[1] http://online.wsj.com/article/SB1000142412788732478440457814...

[2] The auditing process would likely become complex, costly and corruptible

z0mbak | 13 years ago

quote: At 42Floors, we’ve made the decision not to use any visitor identification tools...

facts (detected by Ghostery at 42floors.com): ClickTale, Facebook Connect, Google +1, Google Analytics, MixPanel, Optimizely, Twitter Button

chewxy | 13 years ago

None of them can be used to single out and identify individual users (except GA... which I believe can be done if you are clever with it)

eranation | 13 years ago

Going to site A, not providing any info, then going to sites B, C and D and seeing ads for site A haunting you is one thing; capturing your name and email is a new level. If you don't use a tracking blocker, clearing cookies is not always going to work. These persistent trackers are quite sophisticated: they use local storage if possible, IP address, header information and whatever else is available to identify someone, and there is a huge industry behind it. But this one is taking it a little too far. Scary.

On the other hand, most startups, including YC ones, use some sort of tracking for analytics to improve usability and internal flow, so advocating against all trackers and for every user installing a blocker is a double-edged sword.
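Why clearing cookies alone may not help can be illustrated with a "respawning" identifier: the tracker checks every store it can reach, falls back to a fingerprint of IP and headers, then writes the ID back everywhere. The key name and fallback are hypothetical; real trackers' techniques vary.

```python
import hashlib
from typing import Dict

# Hypothetical storage key used by the tracker in every mechanism.
KEY = "vid"


def resolve_visitor_id(cookies: Dict[str, str], local_storage: Dict[str, str],
                       ip: str, headers: Dict[str, str]) -> str:
    """Find the visitor's ID in any surviving store, falling back to a
    fingerprint of IP and headers, then copy it back into every store."""
    for store in (cookies, local_storage):
        if KEY in store:
            vid = store[KEY]
            break
    else:
        raw = "|".join([ip, headers.get("User-Agent", ""),
                        headers.get("Accept-Language", "")])
        vid = hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]
    # "Respawn": clearing only one store does not get rid of the ID.
    cookies[KEY] = vid
    local_storage[KEY] = vid
    return vid


headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20100101 Firefox/17.0",
           "Accept-Language": "en-US"}
cookies, storage = {}, {}
first = resolve_visitor_id(cookies, storage, "203.0.113.7", headers)
cookies.clear()  # the user clears cookies only
again = resolve_visitor_id(cookies, storage, "203.0.113.7", headers)
print(first == again, cookies)  # same ID, and the cookie has been recreated
```

Even with both stores cleared, the same IP and headers reproduce the same fallback ID.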

losvedir | 13 years ago

This is why I've deleted my facebook account and browse with Noscript disabling javascript (except for whitelist), RequestPolicy blocking cross-site requests (except for whitelist), and CookieMonster blocking cookies (except for whitelist).

It wouldn't completely work here (e.g. EFF's panopticlick could still fairly uniquely identify me, or IP address would give away info if I'm not going through my VPN), but it improves things.

It feels kind of extreme, but it's worth it to me. My experience is not broken that much, and I feel like various sites are aggregating less about me. These tracking technologies are not such an issue now, but I foresee at least the possibility of abuse in the future, so I figure I'll do what I can now if it's not too much hassle.

Lastly, at its heart most of this is about advertising, something I know I'm very susceptible to (try as I might to convince myself I'm not). So the better I am at blocking out these things, I think the less money I'll spend in the long run on frivolous nice-to-haves.

Bockit | 13 years ago

This kind of thing is what I've always seen as the potential end result of things like Google Analytics and Facebook Connect: both are products with JavaScript running on a vast number of websites, with the potential to link to personally identifiable information in a similar manner to that discussed in the article.

I can't imagine that I'm alone in this train of thought.

pippy | 13 years ago

I had to give Dick Smith (an NZ retailer) my phone number before I could buy an external drive the other day.

"Do I _have_ to give you my number before I buy this?"

"yes, but it's for return purposes only"

Of course I received 'promotional' txts the next week. I had been hesitant to give it to them for exactly this reason, but once I'd acknowledged that I had a phone number, I felt obligated to give it to him. Dick Smith is part of a larger chain; it's no stretch of the imagination for them to hook up CCTV cameras to an OpenCV instance and send txts to customers when they walk in.

No matter the law, the morals people hold, or what customers want, large companies are always motivated by profit margins. The Consumer Guarantees Act, the Privacy Act and the Bill of Rights Act all become murky when you're dealing with new technology, and the law will find it hard to keep up.

isalmon | 13 years ago

I recognize these screenshots - it's definitely Leadlander. I'm not sure if they do what he claims they do, but they can identify by your IP which company you belong to (assuming you're connecting from the office). There are a lot of companies doing that right now actually.

px1999 | 13 years ago

Read the article and thought it was interesting, but probably not that applicable to me, because I clear cookies on (frequent) browser close, don't enter my details into many sketchy sites, and use multiple separate (isolated) browser instances for different purposes.

Today, I get an email from a site that I visited yesterday and haven't heard from in 6+ months. It's too much of a coincidence for me to assume it's random so I dig into their website a little and they're using one of these services.

TL;DR: even though I'm relatively paranoid with giving out details online, one of these networks seems to have successfully identified me and provided my email to a website that I visited, who then reached out and tried to sell me shit.

231 comments