The amount and extent of data that brokers make available for purchase by literally any company is *mind-boggling*. However bad you think it is, multiply that by 10.
I would say that in general the HN crowd doesn't understand the industry at all, and they need to change the direction of their understanding, rather than the magnitude. Your basic hackernews believes that e.g. Google is out there selling all your personal information. But compared to these other industries the tech industry is almost airtight. It has long been possible for someone to pick up the phone and order, in any format they want, transaction data as narrowly targeted as they wish. Credit card line items for 35-year-old dentists living on the 400 block of Elm street in local town? By end of day.
This is correct; what people fundamentally misunderstand is that data brokers directly sell personal information about people, but Google and Facebook only allow for targeted advertising while keeping personal information within the confines of their company.
It has been truly frustrating when people will blame the "tech industry" for what is essentially reckless behavior from other industries. For a while, it was often the finance sector that did most of the crazy stuff. With crypto being an obnoxious overlap of the two.
I'm also surprised that this is so hidden from everyone. Where are the engineers leaking secrets? Much of the online discourse is pure speculation based on what can be observed from the very end of the chain (i.e., what your computer is giving up). The speculation is not necessarily _incorrect_ but is too vague to be useful to anyone. Where does my data _actually_ go? Does anyone know? Can anyone describe the life of my data as it goes through the whole ecosystem? Does anyone know which mitigations are, and are not, effective?
> Your basic hackernews believes that e.g. Google is out there selling all your personal information
To add to this, any mention of "telemetry" is taken to mean your PII being taken by bad actors to abuse, instead of what it is in 99% of cases, which is usage statistics. (X% of our users use feature A, it merits investment). It can be both, but there's usually no place for differentiation, just pitchforks.
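For what it's worth, the "usage statistics" kind of telemetry can be sketched roughly like this. This is only an illustration under my own assumptions: the `feature_adoption` function, the event shape and the sample data are all made up, and no particular product is claimed to work this way. The point is that the output is a per-feature percentage, and the user identifier is only used to de-duplicate before being dropped.

```python
from collections import defaultdict

def feature_adoption(events):
    """Return {feature: percent of distinct users who used it}.

    `events` is an iterable of (user_id, feature) pairs. The user_id is
    only used to count distinct users and never appears in the result.
    """
    users_by_feature = defaultdict(set)
    all_users = set()
    for user_id, feature in events:
        users_by_feature[feature].add(user_id)
        all_users.add(user_id)
    total = len(all_users)
    if total == 0:
        return {}
    return {f: 100.0 * len(u) / total for f, u in users_by_feature.items()}

# Hypothetical sample events, invented for illustration.
events = [("u1", "A"), ("u2", "A"), ("u3", "B"), ("u1", "B"), ("u4", "A")]
adoption = feature_adoption(events)
```

Whether a given product stops at the aggregate or also keeps the raw events is exactly the differentiation that tends to get lost.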
I think the HN crowd is especially vocal about the tech industry in particular because that's the industry a lot of us have first-hand knowledge of - we know from personal observation that it is anything but airtight
Okay, and who are these people you contact for this data, and how do they themselves obtain it so precisely? You say the big tech industry is pretty air-tight about sharing data, so how does mysterious X company have on hand the credit ratings of all those youngish dentists on Elm street, among other kinds of information? How do these dynamics work, since you seem to know it internally?
Any way to opt out of this type of data collection per company? I know for some things you can contact each individual broker and opt out (via some identifier like your email address) of your data being at least publicly available.
Seems just like retargeting in that case. Ask the “victim” to visit page A. On page A, place a retargeting pixel; from then on you can display a message to that user everywhere on the Internet, as long as you are willing to pay a high price for that impression (a high price being way, way, way less than 0.1 USD).
Reminds me of the time when Signal (the private messaging app) tried to get ad data from Facebook and show it to users with a high degree of specificity, e.g. “You got this ad because you’re a middle aged woman who enjoys kpop and loves reading about Christopher Nolan”
I asked this same thing in another comment here, but since you mention working in this space, I ask you directly. Where do the brokers obtain their data from? If it's easy for them to obtain, would those who buy it from brokers not be able to simply get it from its respective sources? I'm genuinely curious about how this dynamic works.
Around 2014 I worked with recruiters and they had a tool that aggregated data on everyone through LinkedIn, Yelp, Twitter, GitHub, Eventbrite, etc. It was breathtaking the amount of information you could get on anyone, over 10 years ago.
I’m guessing that, with the help of Palantir, the government has even more data and can probably link Reddit posts etc. based on stylometry, and can even perform psychological analysis of your personality and tendencies.
I really need to start using PocketPal (local LLM on Android) to restate my messages.
---
Oh, the places I'd like to send my texts so fine,
With PocketPal, a tool that's truly divine,
Local LLM on Android, a wondrous device to see,
To help me restate my messages with glee! Wheee!
My question here is also how the brokers obtain the data themselves? Wouldn't it be simple for those who buy it from the brokers at a markup to just get it from its original sources themselves? Also, if the data is in any case available, the real at-fault culprits aren't so much the brokers as those who store and so easily sell it in the first instance.
> Wouldn't it be simple for those who buy it from the brokers at a markup to just get it from its original sources themselves?
In many cases joining datasets is both labor intensive and creates a surprising amount of new information, and there is also plenty of "free" data that is incredibly tedious to work with.
I used to work with real estate data for the government and if you search for any common things you might want to know you often land on a data broker's page even though property assessor data is freely available in most counties. The problem is that each county has its own system for storing data and its own process for searching it. It's a lot of work to learn how just this one dataset works; combining this for all counties in the US is a massive project.
Whenever I buy a new home I always look up all my neighbors, figure out when they bought the house, how much they paid etc. Some people get freaked out by this, but this information is public in most counties.
By joining this data with another public data set, you can actually figure out which lender your neighbors used, what their reported income was at time of sale, and their age and ethnic background.
Of course there are plenty of other ways data brokers come across data, but even cleaning up and joining public data can require a fair bit of time and expertise.
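To make the "each county has its own system" point concrete, here is a rough sketch of the cleanup-then-join work described above. Everything in it is assumed for illustration: the column names, the date and price formats, the two made-up counties and the `loans` table are hypothetical, not taken from any real assessor export.

```python
from datetime import datetime

# Hypothetical county exports: same facts, different column names and formats.
county_a = [{"PARCEL": "001-23", "SALE_DT": "03/15/2019", "SALE_AMT": "$350,000"}]
county_b = [{"apn": "00456", "sold_on": "2020-07-01", "price": "410000"}]

def normalize_a(row):
    # County A: dashed parcel IDs, US-style dates, formatted currency strings.
    return {
        "county": "A",
        "parcel_id": row["PARCEL"].replace("-", ""),
        "sale_date": datetime.strptime(row["SALE_DT"], "%m/%d/%Y").date().isoformat(),
        "sale_price": int(row["SALE_AMT"].replace("$", "").replace(",", "")),
    }

def normalize_b(row):
    # County B: already close to the common shape, only renames and a cast.
    return {
        "county": "B",
        "parcel_id": row["apn"],
        "sale_date": row["sold_on"],
        "sale_price": int(row["price"]),
    }

def join_on_parcel(sales, other, fields):
    # Left join on (county, parcel_id); unmatched rows get None for each field.
    index = {(r["county"], r["parcel_id"]): r for r in other}
    joined = []
    for s in sales:
        match = index.get((s["county"], s["parcel_id"]), {})
        joined.append({**s, **{f: match.get(f) for f in fields}})
    return joined

sales = [normalize_a(r) for r in county_a] + [normalize_b(r) for r in county_b]
# Hypothetical second dataset keyed the same way.
loans = [{"county": "A", "parcel_id": "00123", "lender": "Example Bank"}]
combined = join_on_parcel(sales, loans, ["lender"])
```

Two counties need two hand-written normalizers here; scaling that to every county in the US is where the labor goes.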
what are some good cheap sources to get this? i have an art project idea that i've wanted to make that would require invasive data profiles, but it's a very big project and i have no idea where to start
jeffbee|7 months ago
supriyo-biswas|7 months ago
taeric|7 months ago
everdrive|7 months ago
sofixa|7 months ago
ck_one|7 months ago
Let's say we want this dataset: Credit card line items for 35-year-old dentists living on the 400 block of Elm street in local town
How much do I have to pay you to get it?
worik|7 months ago
I do not believe that. I would like evidence before I am convinced
If my bank is releasing that data I am horrified. I live in New Zealand and our privacy laws are clear: it would be illegal.
flossposse|7 months ago
criddell|7 months ago
I think most people here understand that Google sells ads against that data, but they aren't selling the data.
southernplaces7|7 months ago
Melatonic|7 months ago
trollied|7 months ago
The general public have no idea how much ad providers and data brokers know about them.
rvnx|7 months ago
lyton|7 months ago
Relevant article: http://archive.today/fzUL4
rustcleaner|7 months ago
JohnMakin|7 months ago
southernplaces7|7 months ago
OsrsNeedsf2P|7 months ago
Melatonic|7 months ago
blindriver|7 months ago
worik|7 months ago
After being burnt by things taken from my social media out of context, used to publicly shame me, I locked down my social media
Am I "sweetly naive" to think that had an effect? I do think it did
Before I stopped using Facebook I noticed, over the last decade, that almost every account I encountered was locked down similarly
My point is I suspect it is getting harder, not easier, for data thieves. The golden age of data theft has passed. Maybe.
rustcleaner|7 months ago
kevin_thibedeau|7 months ago
southernplaces7|7 months ago
roadside_picnic|7 months ago
victorbjorklund|7 months ago
greenie_beans|7 months ago
onlyrealcuzzo|7 months ago
It should not be surprising that they are selling your data for a profit...