Several people mention Ghostery[0] against trackers. It offers only partial protection. It is possible to fingerprint a browser without any custom tracking data.
Instead of a script to embed, these firms could provide an API to identify users from the server side. The scripts that capture the profile would be served by the sites themselves rather than from third-party services.
Toast.
A possible solution would be to anonymize the browser fingerprint, at least in private mode, i.e. lie about the details of the system.
Rather than trying to hide everything, another tactic is to provide random misinformation (different user-agent strings, only presenting a subset of fonts and plugins, etc). Enough to defeat the fuzzy matching that does go on.
Sure, you've got to be careful that you don't do things that may break sites that rely on this information remaining stable during a session, but that has become far less common with the frequent browser upgrades that go on nowadays.
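A minimal Python sketch of the "random misinformation" idea above. It assumes a tracker that simply hashes a few reported attributes into one identifier. The `fingerprint` and `randomized_view` helpers, the attribute lists and the `ExampleBrowser/1.0` string are all hypothetical, not any real tracker's or browser's API.

```python
import hashlib
import random

def fingerprint(user_agent, fonts, plugins):
    """Hash the reported attributes into one identifier (hypothetical tracker logic)."""
    material = "|".join([user_agent, ",".join(sorted(fonts)), ",".join(sorted(plugins))])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()

def randomized_view(items, rng, keep=0.5):
    """Report only a random, non-empty subset of the real list."""
    count = max(1, int(len(items) * keep))
    return rng.sample(sorted(items), count)

fonts = ["Arial", "Courier New", "Georgia", "Helvetica", "Times New Roman", "Verdana"]
plugins = ["Chrome PDF Viewer", "QuickTime", "Flash"]
ua = "ExampleBrowser/1.0"

# What the browser would report without any spoofing.
stable = fingerprint(ua, fonts, plugins)

# What it reports when each session exposes a different subset.
rng = random.Random()
spoofed = fingerprint(ua, randomized_view(fonts, rng), randomized_view(plugins, rng))
```

An exact hash is the easiest case to defeat. A fuzzy matcher that compares attribute overlap would presumably need larger or more varied perturbations than this.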
If the data transfer is done from the client (and I bet it is, as it's much harder to persuade people to run code on their servers) then Ghostery and the like still work, because they block the transfer (since the code to do the transfer must be loaded from the weasel site - same origin policy).
> Instead of a script to embed, these firms could provide an API to identify users from the server side. The scripts that capture the profile would be served by the sites themselves rather than from third-party services. ... Toast.
Not really. If they share data on the server side, they wouldn't be able to share a cookie - they would have to rely on other means to identify you, such as IP address etc. Not entirely impossible, but not as precise either. And that is spoofable through proxies etc.
I'm quite surprised that panopticlick says I'm uniquely identifiable with Chrome based solely on the browser plugins reported, even though the ones I have are quite pedestrian: just Chrome PDF viewer, QuickTime, PepperFlash, and Flash.
In Firefox, the plugins get them to 1 in 860,000 which leaves only 3 possibilities in their DB of 2.5 mln, even though Firefox loads only QuickTime and Flash.
It must be the combination of codecs I have installed. How do I go about cleaning that up?
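For what it's worth, the arithmetic behind those Firefox numbers can be checked in a few lines of Python. Both figures are taken from the comment above. Expressing the share as bits (log2 of the "one in N" figure) is the usual way to state identifying information, and the variable names are just for illustration.

```python
import math

database_size = 2_500_000   # "2.5 mln" browsers, as quoted above
one_in = 860_000            # "1 in 860,000" report the same plugin list

# How many browsers in the database are expected to share this plugin list.
expected_matches = database_size / one_in

# The same share expressed as bits of identifying information.
bits = math.log2(one_in)

print(f"expected browsers sharing this plugin list: {expected_matches:.1f}")
print(f"identifying information from plugins alone: {bits:.1f} bits")
```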
A good workaround for panopticlick would be to append a random string to the user agent, for example, effectively making your fingerprint unique every time.
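A rough sketch of that workaround in Python, assuming something (an extension or a proxy) is able to rewrite the header before it is sent. The `randomized_user_agent` helper and the base string are hypothetical.

```python
import secrets

def randomized_user_agent(base_user_agent):
    """Append a fresh random token so the string differs on every call."""
    return f"{base_user_agent} {secrets.token_hex(4)}"

base = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"
first = randomized_user_agent(base)
second = randomized_user_agent(base)
```

The fingerprint is then always unique but never the same twice, so visits cannot be linked through it. A tracker that strips unknown trailing tokens before hashing would not be fooled, so this is at best a partial measure.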
I'm skeptical of this unnamed company's actual abilities. In the initial email how are they able to identify anything about your visitors before you've installed the tracking code? Since they apparently can see search terms used to reach your site the only thing I can think of is their code is running on some site that links to you (perhaps an off-brand search engine?) and they're tracking outbound clicks. Or it's fake.
It's pretty easy to guess company name from IP address, especially if you don't care about accuracy. You can kinda sorta do this in Google Analytics under Audience > Technology > Network. That seems to be roughly what they're doing in the screenshots posted. IMHO, this is not the most serious privacy issue on the web.
I would be very curious to hear exactly what percentage of visitors it is able to supply Name and Email for (and how many of those fields look bogus). This sort of individual-level tracking across sites is obviously possible, but I don't think it's common. Google/DoubleClick do not, as far as I know, do any sort of tracking at the level of an individual's name or email address (And why would they? It's asking for regulatory problems and it doesn't really help them much -- they target ads to groups of similar people based on demographics, not to particular named individuals.)
For users without showdead, the user darrennix (who appears to be the same Darren Nix who wrote the article) posted this comment. Why the mods or system would kill it I have no idea.
> It's a fair question and one that I asked myself. If the entire service is a fake, then it is an extremely elaborate one because the name and emails of the individuals it did identify (which I noted was a small percentage) were real.
I imagine (though have no actual clue) that it's more of an e-mail sharing network between sites. You sign up for site A, the API tracks that and allows site B to see the signup details you entered.
On one level, I can see why sites do it. On another, one inch higher level, I can see how any site implementing it is so shortsighted that I'm amazed they didn't immediately go bankrupt as soon as they started.
They can identify you by name/email if you've entered it on a site in their "network". Their network may not be huge, but a (presumably) similar service had a big enough network to capture Sumit Suman's email earlier this week (https://plus.google.com/u/1/106142598193409336347/posts/2jLJ...)
HubSpot (and pretty much any other marketing automation tool) has this feature, too. They look up company name and location by IP address and build an anonymous "prospect" record representing each visitor so that salespeople and marketers can detect whether prospects from a given company are hitting the site for information.
The second a prospect submits a web form, all that previous web activity is tied to their email address (and any other info you collected via the form). You now have a real lead.
I don't see any privacy issues with this.
What I would see an issue with is if the tracking company were sending the IP address and cookie back to a central database to query "Does anyone _else_ know who this visitor is?" and then providing PII to any company that uses the tracking service.
The moment you start giving my PII to a company that I didn't voluntarily give it to is when I feel a line has been crossed.
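The single-site back-fill described in this comment can be sketched as below. This is a hypothetical stand-in written for illustration, not HubSpot's or anyone else's actual data model. The `ProspectStore` class, the cookie IDs and the email address are all made up.

```python
from collections import defaultdict

class ProspectStore:
    """Per-site visit log keyed by cookie ID (hypothetical marketing-tool model)."""

    def __init__(self):
        self.visits = defaultdict(list)   # cookie_id -> pages viewed
        self.emails = {}                  # cookie_id -> email, set on form submit

    def record_visit(self, cookie_id, page):
        self.visits[cookie_id].append(page)

    def submit_form(self, cookie_id, email):
        # From here on, every earlier visit under this cookie is attributable.
        self.emails[cookie_id] = email

    def history_for(self, email):
        return [page
                for cookie_id, pages in self.visits.items()
                if self.emails.get(cookie_id) == email
                for page in pages]

store = ProspectStore()
store.record_visit("c-123", "/pricing")          # anonymous at this point
store.record_visit("c-123", "/docs")
store.submit_form("c-123", "jane@example.com")   # back-fills both visits
```

Everything here lives in one site's store. The line being drawn in the comment is crossed only when a lookup like `history_for` is answered from data another site collected.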
> What I would see an issue with is if the tracking company were sending the IP address and cookie back to a central database to query "Does anyone _else_ know who this visitor is?" and then providing PII to any company that uses the tracking service.
That appears to be exactly what's happening. The email mentions "access to our entire network of identified data ([...] we can identify any visitor [...] if that person has filled out a web form from any other website we are tracking)".
According to the sales rep, their tracking capability goes far beyond ip lookup. It explicitly involves saving form data from site A and sharing that personal information with site B.
This isn't what Hubspot does at all. While you are correct in that they, along with Marketo, Eloqua, Pardot, etc all look up company/location via IP, none of these companies are getting information from another website to identify prospects.
In the case of marketing automation, all the data lives within the system and is used by the company - rather than giving that information out - a very different proposition.
Well, isn't the scenario you would take issue with exactly what is happening here? From the article: "For example, if [a visitor] went to XYZ.com and filled out a web form and then [the visitor] later visited 42floors.com, [42Floors] would be able to identify [the visitor] by name/email as well as company details even though [the visitor] never filled out a web form on [42Floors.com]."
This sounds eerily familiar. Around a decade ago, a data analytics company called Pharmatrak was actually found guilty of breaking federal wiretapping statutes for doing something very similar. [1] In their case, they had built a network tracking HTTP GET requests to pharmaceutical companies' websites with a web bug [2] and attached cookie. But because some of the pharmaceutical companies were using GET as the method on HTML forms (remember, this was ten years ago), the users actually ended up making GET requests with personally identifying information in the URL-encoded parameters. Since these GET requests were logged by Pharmatrak, and neither party (the users nor the pharmaceutical companies) had consented to giving away personal information to them, Pharmatrak was found guilty of wiretapping.
Pharmatrak eventually won on appeal though, arguing that they had no intention of collecting personal information, which exonerated them because only intentional eavesdropping is a crime.
The company in the OP's article could make no such arguments though. I suspect that their main difference is that they make no assurances of confidentiality to the websites using their software the way Pharmatrak did. Which 1) is just really creepy, and 2) sets them up for trouble with users in California, because California's wiretapping statutes say that it's a crime unless both parties agree to it. [3]
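To make the GET-form leak above concrete, here is a small Python sketch. Only the mechanism (form fields ending up in the URL) comes from the comment. The host `pharma.example`, the field names, and the assumption that the tracker sees the page URL through the Referer header of the web bug request are illustrative.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical form fields submitted with method="get".
form_fields = {"name": "Jane Doe", "email": "jane@example.com"}

# With GET, the browser appends the fields to the URL of the resulting page.
page_url = "http://pharma.example/signup?" + urlencode(form_fields)

# A web bug on that page is requested with the page URL as its Referer
# (an assumption here), so the third party's log contains the submitted values.
logged_referer = page_url
leaked = parse_qs(urlsplit(logged_referer).query)
```

With `method="post"` the same fields travel in the request body instead, so they would not show up in a logged URL.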
sencha.com, activestate.com, sandisk.com, clustrix.com, and about 2000 others use LeadLander. I checked the privacy policies of those four sites and none of them say they are giving away your personal information. On the contrary, they all explicitly say they aren't.
"We do not share any information about you or your company to unaffiliated third parties, except as necessary to administer the communications we offer and as permitted by law. We may use a third party service provider to for communications; that company is prohibited from using our users’ personally identifiable information for any other purpose. If you follow us on Twitter, Facebook or on other social media services, we may use information provided by these services to customize our communications to you. We will not share the personally identifiable information you provide with other third parties unless we give you prior notice and choice." - http://www.sencha.com/legal/privacy/
Nearly every company using LeadLander is breaking the law because their posted privacy policies do not state that they are giving a third party your personal information, and that third party is giving it to others.
Edit: It looks like http://formalyzer.com/formalyze_call.js is the specific js file that uploads personal information. Of the sites I listed only clustrix.com is loading that (on the contact form). The other sites seem to be using LeadLander without the form tracking.
Marketo has done company-level tracking for years[0], and if you click through from an email or fill out a form they can keep tracking you as well as back-fill any previously anonymous visits you made (depending on your browser cookie settings, of course). Once it's in the system, they partner with a number of companies, some of whom can help populate contact data[1], eg: "over 1.5 billion opt-in email addresses" -- how plausible is that? They have as customers a few companies[2] you're likely familiar with (eg: VMware).
Is the weasel company's javascript (and/or flash bug) logging all form input back to its own servers to capture name/email when you sign up somewhere else? Are they capturing credit card numbers too?
We can tell the world all day long this is Bad and Unsafe, but within six months it'll be more popular than ad retargeting and the meebo crapbar (because, hey, analytics!).
Can someone provide a regex that would identify this tracker? I'd like to run it through our index and see if I can come up with a list of sites that employ it.
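One possible starting point, sketched in Python. The only detail taken from the thread is the script URL on formalyzer.com mentioned in another comment. The tolerance for scheme, quoting and a `www.` prefix is a guess, and the tracker may well be loaded in other ways this pattern misses.

```python
import re

# Matches a <script> tag whose src points at formalyzer.com.
# Scheme, quote style and the optional "www." are assumptions.
TRACKER_SCRIPT = re.compile(
    r"""<script\b[^>]*\bsrc\s*=\s*["']?(?:https?:)?//(?:www\.)?formalyzer\.com/[^"'\s>]*""",
    re.IGNORECASE,
)

def uses_tracker(html):
    """Return True if the page source appears to load a script from formalyzer.com."""
    return TRACKER_SCRIPT.search(html) is not None

sample = '<script type="text/javascript" src="http://formalyzer.com/formalyze_call.js"></script>'
found = uses_tracker(sample)
```

Scripts injected at runtime by other JavaScript will not appear in static HTML, so a crawl of raw page source would undercount.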
I get emailed by them for every startup I'm involved with and that first email is mostly the same every time as you can see in that screenshot. (Compare it with the one posted in the article and you'll see).
They seem to be targeting startups and make it look like some big VC firms are visiting your site to get you interested. I'm not sure how they come up with the 'search terms', but I guess they could just look at your META-tags or make them up.
In their email they do say it's a "mock example", but still I find it very deceptive.
Dataium does this too, as covered by WSJ's recent article on the subject [1]
The article goes into depth about how much personal information is sent along to advertisers including a popular dating site's apparently anonymized information about drug use, and sexual orientation.
I think we need a non-profit service that defines a set of privacy licenses (akin to CreativeCommons' licenses) which companies can opt to label their websites/apps with. There would be no policing/auditing [2], but companies found to violate the privacy licenses would be obliged to donate a sum to an organization like the EFF.
That the privacy policies would be encompassed by one simple privacy licence badge would allow users to quickly and easily identify a company's privacy policies. I believe users would gravitate toward using services that display this license.
Going to site A, not providing any info, then going to sites B, C and D and seeing ads for site A haunting you is one thing; capturing your name and email is a new level. If you don't use a tracking blocker, clearing cookies is not always going to work. These persistent trackers are quite sophisticated: they use local storage if possible, IP address, header information and whatever else is available to identify someone, and there is a huge industry behind it. But this one is taking it a little bit too far. Scary.
On the other hand, most startups, including YC ones, use some sort of tracking for analytics to improve usability and internal flow, so advocating against all trackers and for all users installing a blocker is a double-edged sword.
This is why I've deleted my facebook account and browse with Noscript disabling javascript (except for whitelist), RequestPolicy blocking cross-site requests (except for whitelist), and CookieMonster blocking cookies (except for whitelist).
It wouldn't completely work here (e.g. EFF's panopticlick could still fairly uniquely identify me, or IP address would give away info if I'm not going through my VPN), but it improves things.
It feels kind of extreme, but it's worth it to me. My experience is not broken that much, and I feel like various sites are aggregating less about me. These tracking technologies are not such an issue now, but I foresee at least the possibility of abuse in the future, so I figure I'll do what I can now if it's not too much hassle.
Lastly, at its heart most of this is about advertising, something I know I'm very susceptible to (try as I might to convince myself I'm not). So the better I am at blocking out these things, I think the less money I'll spend in the long run on frivolous nice-to-haves.
This kind of thing is what I've always seen as the potential end result of things like Google Analytics and also Facebook Connect. Both are products that have JavaScript running on a vast number of websites, with the potential to link to personally identifiable information, in a similar manner to that discussed in the article.
I can't imagine that I'm alone in this train of thought.
I had to give Dick Smith (A NZ retailer) my phone number before I bought an external the other day.
"Do I _have_ to give you my number before I buy this?"
"yes, but it's for return purposes only"
Of course I received 'promotional' txts the next week. I was hesitant to give it to them for just this reason, and because I acknowledged I had a phone number I felt obligated to give it to him. Dick Smith is a member of a larger chain; it's no stretch of the imagination to hook up CCTV cameras to an OpenCV instance and send txts to customers when they walk in.
No matter the law, the morals people hold, or what customers want, large companies are always motivated by profit margins. The Consumer Guarantees Act, the Privacy Act, the Bill of Rights Act all become murky when you're dealing with new technology, and law will find it hard to keep up.
I recognize these screenshots - it's definitely Leadlander.
I'm not sure if they do what he claims they do, but they can identify by your IP which company you belong to (assuming you're connecting from the office). There are a lot of companies doing that right now actually.
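A rough Python sketch of the IP-to-company lookup mentioned above. The table is entirely made up and uses reserved documentation address ranges. A real service would presumably draw on WHOIS records or a commercial IP database rather than a hard-coded dictionary, and `company_for` is a hypothetical helper.

```python
import ipaddress

# Hypothetical allocation table built from reserved documentation ranges.
NETWORK_OWNERS = {
    ipaddress.ip_network("192.0.2.0/24"): "Example Corp",
    ipaddress.ip_network("198.51.100.0/24"): "Example Ventures",
}

def company_for(ip):
    """Return the organization whose network contains ip, or None if unknown."""
    address = ipaddress.ip_address(ip)
    for network, owner in NETWORK_OWNERS.items():
        if address in network:
            return owner
    return None

visitor_company = company_for("192.0.2.17")
```

This only works when the visitor connects from an address block registered to the company. Traffic from a home ISP or a VPN resolves to the provider, not the employer.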
Read the article, thought that it was something interesting but probably not that applicable to me because I clear cookies on (frequent) browser close, don't enter my details into many sketchy sites, use multiple different (isolated) instances of my browser for different purposes.
Today, I get an email from a site that I visited yesterday and haven't heard from in 6+ months. It's too much of a coincidence for me to assume it's random so I dig into their website a little and they're using one of these services.
TL;DR: even though I'm relatively paranoid with giving out details online, one of these networks seems to have successfully identified me and provided my email to a website that I visited, who then reached out and tried to sell me shit.
[+] [-] pygy_|13 years ago|reply
https://panopticlick.eff.org/ <-- check how unique your browser is.
Google, Mozilla, Opera, can you hear me?
--
[0] http://www.ghostery.com/
[+] [-] dsr_|13 years ago|reply
Maybe it's forgotten. Maybe it lies. Maybe every time I rev Firefox Nightly I change identity.
What is true is that every time I leave an email address, it's tagged with the name of the site where I left it.
[+] [-] 001sky|13 years ago|reply
Google, Mozilla, Opera, can you hear me?
== This. The system needs to be fixed. Need to know (only) vs nice to know info exch, etc.
[+] [-] scrrr|13 years ago|reply
[+] [-] alexkus|13 years ago|reply
[+] [-] andrewcooke|13 years ago|reply
[+] [-] troels|13 years ago|reply
[+] [-] thinkling|13 years ago|reply
[+] [-] wladimir|13 years ago|reply
[+] [-] ck2|13 years ago|reply
[+] [-] jdangu|13 years ago|reply
[+] [-] Alaskan005|13 years ago|reply
Plus one for this. I wonder if a plugin alone could change enough info to fool the trackers?
[+] [-] eli|13 years ago|reply
[+] [-] paulgb|13 years ago|reply
[+] [-] untog|13 years ago|reply
[+] [-] paulgb|13 years ago|reply
[+] [-] chewxy|13 years ago|reply
[+] [-] unknown|13 years ago|reply
[deleted]
[+] [-] iy56|13 years ago|reply
[+] [-] darrennix|13 years ago|reply
[deleted]
[+] [-] rsobers|13 years ago|reply
[+] [-] paulgb|13 years ago|reply
[+] [-] darrennix|13 years ago|reply
[+] [-] inthewoods|13 years ago|reply
[+] [-] larskinn|13 years ago|reply
[+] [-] smalldaddy|13 years ago|reply
[+] [-] nostromo|13 years ago|reply
Surprisingly, AdBlockPlus doesn't seem to block it.
Edit: actually it's LeadLander.com as pointed out by NiekvdMaas here http://news.ycombinator.com/item?id=4891764
[+] [-] damian2000|13 years ago|reply
Hardware manufacturers & telecoms seem to feature heavily:
(a selection) ... Adobe, Dell, IBM, AMD, Box.net, Cisco, CSC, Comcast, Freescale, HP, Lenovo, Motorola, Novell, Qwest, Salesforce.com, Siemens, Symantec, Verisign, VMWare, Vodafone.
And there's several anti-virus/anti-malware companies listed there.
UPDATE: The LeadLander.com site also lists their customers - Microsoft, Motorola, Red Hat and Cisco, among others.
[+] [-] politician|13 years ago|reply
[+] [-] r4vik|13 years ago|reply
[+] [-] d503|13 years ago|reply
[+] [-] jfriedly|13 years ago|reply
[1] http://cyberlaw.stanford.edu/packets001737.shtml
[2] http://en.wikipedia.org/wiki/Web_bug
[3] I'm not sure if this applies to police, but it definitely does to private parties: http://www.citmedialaw.org/legal-guide/california-recording-...
Edit: Added third reference.
[+] [-] andrewljohnson|13 years ago|reply
[+] [-] antoncohen|13 years ago|reply
"We do not share any information about you or your company to unaffiliated third parties, except as necessary to administer the communications we offer and as permitted by law. We may use a third party service provider to for communications; that company is prohibited from using our users’ personally identifiable information for any other purpose. If you follow us on Twitter, Facebook or on other social media services, we may use information provided by these services to customize our communications to you. We will not share the personally identifiable information you provide with other third parties unless we give you prior notice and choice." - http://www.sencha.com/legal/privacy/
[+] [-] biot|13 years ago|reply
[0] http://www.marketo.com/small-medium-business/inbound-marketi...
[1] http://launchpoint.marketo.com/strikeiron-inc/747-strikeiron...
[2] http://www.marketo.com/customers/
[+] [-] seiji|13 years ago|reply
[+] [-] ChuckMcM|13 years ago|reply
[+] [-] marc|13 years ago|reply
Proof: http://o7.no/Z0huP7
[+] [-] dskhatri|13 years ago|reply
Edit: it appears such a service is in the works - http://privacycommons.org
[1] http://online.wsj.com/article/SB1000142412788732478440457814...
[2] The auditing process would likely become complex, costly and corruptible
[+] [-] z0mbak|13 years ago|reply
Facts (detected by Ghostery at 42floors.com): ClickTale, Facebook Connect, Google +1, Google Analytics, MixPanel, Optimizely, Twitter Button
[+] [-] chewxy|13 years ago|reply
[+] [-] eranation|13 years ago|reply
[+] [-] losvedir|13 years ago|reply
[+] [-] Bockit|13 years ago|reply
[+] [-] pippy|13 years ago|reply
"Do I _have_ to give you my number before I buy this?"
"yes, but it's for return purposes only"
[+] [-] isalmon|13 years ago|reply
[+] [-] px1999|13 years ago|reply
[+] [-] jpxxx|13 years ago|reply
[+] [-] datamaze|13 years ago|reply
[+] [-] angryasian|13 years ago|reply