top | item 44561944

leblancfg | 7 months ago

The amount and extent of data that is available out there by brokers for purchase by literally any company is *mind-boggling*. However bad you think it is, multiply that by 10.

jeffbee|7 months ago

I would say that in general the HN crowd doesn't understand the industry at all, and they need to change the direction of their understanding, rather than the magnitude. Your basic hackernews believes that e.g. Google is out there selling all your personal information. But compared to these other industries the tech industry is almost airtight. It has long been possible for someone to pick up the phone and order, in any format they want, transaction data as narrowly targeted as they wish. Credit card line items for 35-year-old dentists living on the 400 block of Elm street in local town? By end of day.

supriyo-biswas|7 months ago

This is correct; what people fundamentally misunderstand is that data brokers directly sell personal information about people, but Google and Facebook only allow for targeted advertising while keeping personal information within the confines of their company.

taeric|7 months ago

It has been truly frustrating when people will blame the "tech industry" for what is essentially reckless behavior from other industries. For a while, it was often the finance sector that did most of the crazy stuff. With crypto being an obnoxious overlap of the two.

everdrive|7 months ago

I'm also surprised that this is so hidden from everyone. Where are the engineers leaking secrets? Much of the online discourse is pure speculation based on what can be observed from the very end of the chain (i.e., what your computer is giving up). The speculation is not necessarily _incorrect_ but is too vague to be useful to anyone. Where does my data _actually_ go? Does anyone know? Can anyone describe the life of my data as it goes through the whole ecosystem? Does anyone know which mitigations are, and are not, effective?

sofixa|7 months ago

> Your basic hackernews believes that e.g. Google is out there selling all your personal information

To add to this, any mention of "telemetry" is taken to mean your PII being taken by bad actors to abuse, instead of what it is in 99% of cases, which is usage statistics. (X% of our users use feature A, it merits investment). It can be both, but there's usually no place for differentiation, just pitchforks.

ck_one|7 months ago

Is that actually possible? Can we do a live test here?

Let's say we want this dataset: Credit card line items for 35-year-old dentists living on the 400 block of Elm street in local town

How much do I have to pay you to get it?

worik|7 months ago

> Credit card line items for 35-year-old dentists living on the 400 block of Elm street

I do not believe that. I would like evidence before I am convinced.

If my bank is releasing that data I am horrified. I live in New Zealand and our privacy laws are clear: it would be illegal.

flossposse|7 months ago

I think the HN crowd is especially vocal about the tech industry in particular because that's the industry a lot of us have first-hand knowledge of - we know from personal observation that it is anything but airtight

criddell|7 months ago

> Your basic hackernews believes that e.g. Google is out there selling all your personal information.

I think most people here understand that Google sells ads against that data, but they aren't selling the data.

southernplaces7|7 months ago

Okay, and who are these people you contact for this data, and how do they themselves obtain it so precisely? You say the big tech industry is pretty air-tight about sharing data, so how does mysterious X company have on hand the credit ratings of all those youngish dentists on Elm street, among other kinds of information? How do these dynamics work, since you seem to know it internally?

Melatonic|7 months ago

Any way to opt out of this type of data collection per company? I know for some things you can contact each individual broker and opt out (via some identifier like your email address) of your data being at least publicly available.

trollied|7 months ago

A colleague created a banner ad that was an image that had the text “told you I could do this mate!” and targeted an individual to prove a point.

The general public have no idea how much ad providers and data brokers know about them.

rvnx|7 months ago

Seems just like retargeting in that case. Ask the “victim” to visit page A. On that page A, place a retargeting pixel; from then on you can display a message to that user everywhere on the Internet, as long as you are willing to pay a high price for that impression (high price is way way way less than 0.1 USD).

lyton|7 months ago

Reminds me of the time when Signal (the private messaging app) once tried to get ad data from Facebook and show it to users with a high degree of specificity, e.g. “You got this ad because you’re a middle aged woman who enjoys kpop and loves reading about Christopher Nolan”

Relevant article: http://archive.today/fzUL4

rustcleaner|7 months ago

I need you to tell me how I do this right now. This will put so much cred into my spiels with people in meatspace. So many bricks will be shat!

JohnMakin|7 months ago

I work in this space - I'd say 1000x.

southernplaces7|7 months ago

I asked this same thing in another comment here, but since you mention working in this space, I ask you directly. Where do the brokers obtain their data from? If it's easy for them to obtain, would those who buy it from brokers not be able to simply get it from its respective sources? I'm genuinely curious about how this dynamic works.

OsrsNeedsf2P|7 months ago

Could you elaborate with specifics? If it's this bad, why haven't we heard anything from a whistleblower or seen a good demo?

Melatonic|7 months ago

Any way to combat it or stop your info from being overly harvested?

blindriver|7 months ago

Around 2014 I worked with recruiters and they had a tool that aggregated data on everyone through LinkedIn, Yelp, Twitter, GitHub, Eventbrite, etc. It was breathtaking the amount of information you could get on anyone, over 10+ years ago.

I’m guessing that with the help of Palantir, the government has even more data and can probably link Reddit posts etc. based on stylometry, and can even perform psychological analysis on your personality and tendencies.

worik|7 months ago

> it was breathtaking the amount of information you could get on anyone, over 10+ years ago.

After being burnt by things taken from my social media out of context, used to publicly shame me, I locked down my social media

Am I "sweetly naive" to think that had an effect? I do think it did

Before I stopped using Facebook I noticed, over the last decade, that almost every account I encountered was locked down similarly

My point is I suspect it is getting harder, not easier, for data thieves. The golden age of data theft has passed. Maybe.

rustcleaner|7 months ago

>stylometry

I really need to start using PocketPal (local LLM on Android) to restate my messages.

---

Oh, the places I'd like to send my texts so fine, With PocketPal, a tool that's truly divine, Local LLM on Android, a wondrous device to see, To help me restate my messages with glee! Wheee!

kevin_thibedeau|7 months ago

The government has been buying and funding R&D with data brokers since before Google existed.

southernplaces7|7 months ago

My question here is also how the brokers obtain the data themselves? Wouldn't it be simple for those who buy it from the brokers at a markup to just get it from its original sources themselves? Also, if the data is in any case available, the real at-fault culprits aren't so much the brokers as those who store and so easily sell it in the first instance.

roadside_picnic|7 months ago

> Wouldn't it be simple for those who buy it from the brokers at a markup to just get it from its original sources themselves?

In many cases joining datasets is both labor intensive and creates a surprising amount of new information, and there is also plenty of "free" data that is incredibly tedious to work with.

I used to work with real estate data for the government, and if you search for any common thing you might want to know, you often land on a data broker's page even though property assessor data is freely available in most counties. The problem is that each county has its own system of storing data and its own process for searching it. It's a lot of work to learn how just this one dataset works; combining this for all counties in the US is a massive project.

Whenever I buy a new home I always look up all my neighbors, figure out when they bought the house, how much they paid etc. Some people get freaked out by this, but this information is public in most counties.

By joining this data with another public data set, you can actually figure out which lender your neighbors used, as well as their reported income at the time of sale, their age and their ethnic background.

Of course there are plenty of other ways data brokers come across data, but even cleaning up and joining public data can require a fair bit of time and expertise.
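
To make the "each county has its own format" point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the column names, the two county layouts, the sample rows and the second "loans" dataset are all made up, and real joins are rarely as clean as an exact address match. It only shows the general shape of the work: map each source into one common schema, normalize the join key, then join.

```python
def normalize_address(addr):
    """Lowercase, drop punctuation and collapse whitespace so keys from different sources can match."""
    cleaned = "".join(ch for ch in addr.lower() if ch.isalnum() or ch.isspace())
    return " ".join(cleaned.split())

# Two hypothetical county exports that store the same facts under different column names.
county_a = [{"SITUS_ADDR": "402 Elm St.", "SALE_PRICE": "350000", "SALE_DT": "2019-06-01"}]
county_b = [{"address": "17  Oak Ave", "price_usd": 410000, "sold": "2021-03-15"}]

def from_county_a(row):
    """Map a (made-up) county A row into the common schema."""
    return {
        "address": normalize_address(row["SITUS_ADDR"]),
        "price": int(row["SALE_PRICE"]),
        "sold": row["SALE_DT"],
    }

def from_county_b(row):
    """Map a (made-up) county B row into the common schema."""
    return {
        "address": normalize_address(row["address"]),
        "price": int(row["price_usd"]),
        "sold": row["sold"],
    }

# One adapter per county is the tedious part; after that the rows are uniform.
sales = [from_county_a(r) for r in county_a] + [from_county_b(r) for r in county_b]

# A hypothetical second dataset, here keyed by address for simplicity.
loans = [{"address": "402 ELM ST", "lender": "Example Bank"}]

def join_by_address(sales_rows, loan_rows):
    """Left join: keep every sale and attach the lender when the normalized address matches."""
    lenders = {normalize_address(l["address"]): l["lender"] for l in loan_rows}
    return [{**s, "lender": lenders.get(s["address"])} for s in sales_rows]

joined = join_by_address(sales, loans)
for row in joined:
    print(row)
```

With only two counties this is trivial. The cost comes from writing and maintaining an adapter like `from_county_a` for every county, and from join keys that do not line up as neatly as they do in this toy data.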

victorbjorklund|7 months ago

Sellers of the data wanna deal with one or a few buyers that buy in bulk. They don't wanna deal with thousands of customers.

greenie_beans|7 months ago

what are some good cheap sources to get this? i have an art project idea that i've wanted to make that would require invasive data profiles, but it's a very big project and i have no idea where to start

onlyrealcuzzo|7 months ago

Further, they are literally in the business of selling your data for a profit.

It should not be surprising that they are selling your data for a profit...