I feel like we need to start differentiating between "public" personal information and more sensitive personal information (like social security numbers or other government ID numbers). The breach lists this info:
Name
Email
Location
Gender
Phone number
User ID
So basically, everything I used to be able to get in a phone book. Honestly, at this point all of that information should just be considered public, because it obviously is.
If anything I think people are grappling with the fact that the Internet just makes data scraping and processing possible on a scale previously unimaginable, and that's really what people have an issue with, but I don't think there's a great answer to that. I mean, it's one thing to say the front of my house is public info because anyone can come by and take a picture, but it sure feels different when a high resolution photo (or heck, video feed) can be posted online that is instantly available to billions of people.
> everything I used to be able to get in a phone book
Fair, although you could opt out of the phone book. (And I don't think they had location/address, though it's been so long now that I can't remember for sure.)
> I think people are grappling with the fact that the Internet just makes data scraping and processing possible on a scale previously unimaginable
This is it right here. The scale and ease of access are terrifying. It's true that in the olden days, someone could follow me around and write down everywhere I went, everyone I talked to, what stores I went to, my hobbies, and so on. But someone would actually have to do that, and they would have to single me out, and even then the information they collected would be in a notebook, not distributed to virtually every human on the planet.
Now we are all being followed, all the time, and all of that information is available to anyone with almost no cost or effort. This is a sea change, and personally I find it horrifying. There are very, very few people I would trust with that much information. I definitely don't trust the whole world with it.
In the past "public" did not mean a single, all-encompassing global village of information that anyone on earth with a computer could get access to. People then operated within local shells of information, extending outward from the neighborhood block to the city to the country and maybe finally to the world.
What time you walk your dog each day would be neighborhood level public info. Phone numbers would be city level. For greater reach than that you usually had to put the info out there yourself or be someone of media prominence.
Nowadays the time you walk your dog is out on the internet because it was leaked from some Amazon S3 bucket collecting pings from your dog's smart collar. And what's more, it's been joined with your name, phone number, and other personal info to create an automated profile of you by interested groups.
That's a whole different ball game, and not one that many people expect despite living their lives (in their minds) the same way as before.
This isn't true. The phonebook we had listed only name, phone number and a very broad location. Also, in those days a phone number was just that - a phone number. Nowadays it's a unique identifier for lots and lots of things including government stuff. Some government stuff even uses it for authentication/authorization.
Also, I guess user ID means it gives access to their FB profile page? From there one can scrape pics etc. (public ones).
> So basically, everything I used to be able to get in a phone book. Honestly, at this point all of that information should just be considered public, because it obviously is.
With a phonebook (at least back in the day), you didn't risk having your account exposed and sold for a few dollars. Nor did it risk someone getting access to thousands, if not more, bank accounts or whatever through automation. In addition, with a phonebook it is easy to opt out of making things public. Facebook gives you the illusion that you can opt out of this data being public.
"So basically everything I used to be able to get in a phone book."
Assuming you had phone books from every city/region in every country. That's a lot of phone books, so you must have had a large warehouse to store them all. Then there is the fact that phone books did not list a number for every individual. Multiple persons routinely shared the same number.
The comparison sounds apt in theory but in practice it isn't. Try looking these Facebook users up in the phone books of their respective locales, via the telcos' online phone books or directory assistance. Then, using what you find, tell me their email address, gender and Facebook user ID.
Good luck.
The problem with this argument, "all email addresses are public", which I see regularly on HN, is that information does not become "public" and lose its "private" designation if it is published without consent or lawful purpose. If someone steals secrets and publishes them, they are still secrets.
Whether this information from Facebook is truly "private" I cannot say but I do think it is possible to have email addresses that are not made public.
The recent NSO iMessage story was interesting because the exploit seemed to rely on NSO getting lists of mobile phone numbers for the targets. Not email addresses. Yet iMessage will work without a phone number, with no SIM inserted. Perhaps the targets chose to use phone numbers for iMessage, not email addresses.
Consider what happens if someone creates a Gmail address but never uses it to send mail, and never shares the address with anyone, except Facebook. If this person does not make their Facebook profile public, how is this address public information? Google does not publish a list of every Gmail address. According to the logic of the parent comment, they might just as well. Email addresses are "public", right? Because some HN commenters think they are.
What happened when someone scraped Apple's servers to obtain the email addresses of Apple iPad users? Did federal prosecutors think the information was "public" or "private"? The media called the incident "theft of e-mail addresses".^1
> If anything I think people are grappling with the fact that the Internet just makes data scraping and processing possible on a scale previously unimaginable, and that's really what people have an issue with, but I don't think there's a great answer to that. I mean, it's one thing to say the front of my house is public info because anyone can come by and take a picture, but it sure feels different when a high resolution photo (or heck, video feed) can be posted online that is instantly available to billions of people.
From your example, it's another thing to have a high resolution photo or video feed of everyone's houses and to, say, send them ads for painting services if the trim looks out of shape.
I think the important thing to get in the public consciousness is that scale alone is sufficient to make information processing fundamentally different than a human interacting with a single data point. Looking up one person in the phone book and calling them or sending them a letter is different than scanning the entire book, robocalling everyone in it, and sending junk mail to all of them. The fact that the former is accepted and that the latter is merely the former repeated a million times does not make the latter permissible. The former was accepted because the way the world worked meant that it was simply intractable - an economic nonstarter, a physical and logical impossibility, humanly infeasible - to abuse it into spamming a million people.
For another example, license plates are public, required to be visible on your vehicle on public roads. Prior to license plate scanning technology, a cop could have tailed a suspect and radioed their vehicle description and license plate to have other detectives and officers disperse to intersections and track a vehicle through a city, and depending on the nature of the problem, they could spend a few hundred dollars to dispatch a helicopter to chase it across the freeway. They could conceivably tail a non-suspect, but that wouldn't make any sense, they were constrained by limited resources to only use this ability for a select few vehicles. That was how the world worked. Later, automated license plate readers were developed. With cameras deployed across every intersection in a city, it would be feasible to track all motions of every vehicle at all times; it would likely be cheaper and easier to do so than one year's expenses of deploying personnel to do so manually.
That information should be considered public, because it obviously is, but what a person is allowed to do with public information should not be limited only by what they're able to do with it.
> more sensitive personal information (like social security numbers or other government ID numbers)
I tend to think that there should be a publicly accessible, unique, and more or less immutable ID number for every citizen or resident. This ID would have pointers to our name, birth date and a few other identifiers that shouldn't really be considered secret.
My concern is that the absence of such a unique ID leads to a mess of overlapping systems in which only large organizations with the resources to track everyone will be able to uniquely identify people. So we'll have a degree of anonymity from random other individuals, but not from banks, tech corporations or the government. Computing power is becoming too cheap and ubiquitous to effectively hide information that isn't explicitly confidential. That is, as a society we need to adjust to a paradigm in which it is more expensive to keep information confidential than to allow it to be public, especially when keeping it private from those with deep pockets.
> If anything I think people are grappling with the fact that the Internet just makes data scraping and processing possible on a scale previously unimaginable,
Agreed.
I remember way back when FB first launched the "feed". Folks on Slashdot (yes, that long ago) had great outcry about how much of a violation of privacy it was. I countered that all it was doing was collating all the posts that people were making on their "walls". Nothing new was necessarily being exposed.
People still didn't like it. Someone argued that the extra steps necessary to visit each "friends" wall was a valuable impediment. Obviously, that's a weak position to take, but it reinforces your point: data scraping is easier than anyone seems to be willing to acknowledge. Anything you write in any "semi-public" space should simply be considered entirely public.
My neighbour, who is 87 years old, has all the phone books from the late 50s to the mid 60s.
I checked my grandfather and it listed his address, phone and occupation.
It's never about one item of information released: it's about the aggregation and linking potential. Name/location/phone together form a pretty decent unique identifier. FB obviously gives you friends, interests, hangouts, and most importantly, photographs; none of which you had before.
After aggregating with other databases is when the harm comes.
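A minimal sketch of that linking step, assuming two hypothetical datasets (a leaked profile list and some other database) that happen to share name, location, and phone fields. Every record and field name below is invented for illustration and none of it comes from the actual breach; the only point is that a (name, location, phone) tuple behaves like a join key.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]

# Hypothetical records, invented for illustration only.
leaked_profiles: List[Record] = [
    {"name": "A. Example", "location": "Springfield", "phone": "555-0100", "user_id": "1001"},
    {"name": "B. Sample", "location": "Shelbyville", "phone": "555-0101", "user_id": "1002"},
]
other_database: List[Record] = [
    {"name": "A. Example", "location": "Springfield", "phone": "555-0100", "employer": "Acme"},
    {"name": "C. Nobody", "location": "Springfield", "phone": "555-0199", "employer": "Initech"},
]


def quasi_id(record: Record) -> Tuple[str, str, str]:
    """Build the (name, location, phone) key used to match records."""
    return (record["name"].lower(), record["location"].lower(), record["phone"])


def link(left: List[Record], right: List[Record]) -> List[Record]:
    """Merge records from both lists that share the same quasi-identifier."""
    index = {quasi_id(r): r for r in right}
    merged = []
    for rec in left:
        match = index.get(quasi_id(rec))
        if match is not None:
            # Fields from both sources end up in one combined profile.
            merged.append({**match, **rec})
    return merged


linked = link(leaked_profiles, other_database)
```

Here `linked` holds a single combined profile that carries both the `user_id` from one list and the `employer` from the other, which neither source exposed on its own.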
I think that information increased in sensitivity because technology that gives us instant access to it also allows it to be exploited in different new ways. Like there's no machine that can take a paper phonebook and call everyone in it with customized spam messages, but you can trivially do that with a CSV file and 20 lines of python.
These large scale breaches harm everyone's privacy (anonymity). Even people who are not included. Because with enough data you can deanon people, eg thru process of elimination.
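A rough sketch of the process-of-elimination idea, under invented assumptions: a tiny hypothetical population, an anonymous account known only by coarse attributes, and a breach that lets an observer rule out some candidates. The names, attributes, and the `candidates` helper are all made up for illustration.

```python
# Hypothetical population with coarse, easily observed attributes.
population = [
    {"name": "Alice", "city": "Springfield", "gender": "F"},
    {"name": "Beth", "city": "Springfield", "gender": "F"},
    {"name": "Carol", "city": "Springfield", "gender": "F"},
    {"name": "Dan", "city": "Springfield", "gender": "M"},
]


def candidates(people, **attrs):
    """Return the names of everyone matching all of the given attributes."""
    return {p["name"] for p in people if all(p[k] == v for k, v in attrs.items())}


# The anonymous account is only known to belong to a woman in Springfield.
pool = candidates(population, city="Springfield", gender="F")

# Assume a breach ties Alice and Beth to other accounts, ruling them out.
ruled_out = {"Alice", "Beth"}

# Whoever is left is exposed, even though she was never in the breach.
remaining = pool - ruled_out
```

The person left in `remaining` did not appear in the leaked data at all; the leak only shrank the set she could hide in.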
Absolutely. Most engineers who work with sensitive data already know that there are tiers of data sensitivity (Public, Personal, Private), and that info like SSN and CCN are more private than, say, gender or marital status.
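One way such tiers might look in code, as a sketch only. The three tier names come from the comment above; the field-to-tier assignments, the `FIELD_TIERS` table, and the `redact` helper are assumptions made up for this example, not any particular company's policy.

```python
from enum import IntEnum


class Tier(IntEnum):
    PUBLIC = 0
    PERSONAL = 1
    PRIVATE = 2


# Assumed classification, for illustration only.
FIELD_TIERS = {
    "user_id": Tier.PUBLIC,
    "gender": Tier.PERSONAL,
    "marital_status": Tier.PERSONAL,
    "ssn": Tier.PRIVATE,
    "ccn": Tier.PRIVATE,
}


def redact(record, max_tier):
    """Keep only fields classified at or below max_tier.

    Fields missing from FIELD_TIERS are treated as PRIVATE, so anything
    unclassified is dropped unless the caller is cleared for PRIVATE data.
    """
    return {
        key: value
        for key, value in record.items()
        if FIELD_TIERS.get(key, Tier.PRIVATE) <= max_tier
    }


record = {"user_id": "1001", "gender": "F", "ssn": "000-00-0000", "nickname": "x"}
personal_view = redact(record, Tier.PERSONAL)
```

Defaulting unknown fields to the strictest tier is a deliberate choice in this sketch: a newly added column stays hidden until someone classifies it.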
>I feel like we need to start differentiating between "public" personal information and more sensitive personal information (like social security numbers or other government ID numbers).
The flipside of this is that we need to make it such that simply knowing someone's Name, Address, DOB, and SSN is not adequate to fraudulently assume their financial identity and incur debts in their name.
> So basically, everything I used to be able to get in a phone book.
Even if people didn't have unlisted numbers, phone books would allow listing only last name and first initial of one person in the household, without any location data beyond the phone book service area (you could provide more if you wanted to be found), and didn’t include gender.
> and more sensitive personal information (like social security numbers)
Well I consider SSNs public knowledge at this stage. You can reliably dox anyone in the US now and find out their SSNs. Also: I used to have a sticker on my laptop that had my SSN on it, and brought it to conferences, as a PR stunt for my consultancy.
With a land line phone number from a phonebook, criminals can’t do much. With a smartphone number they can hack phones, potentially steal bank accounts, track their location and on and on.
Wait until HN readers find out what these faceless companies that appear around election time and mail me junk are able to glean from public voter registration data.
> So basically, everything I used to be able to get in a phone book. Honestly, at this point all of that information should just be considered public, because it obviously is.
I am honestly shocked at your proposal.
In a real paper book you had a choice not to get your number published.
Have you given any thought to people who may be running from an abusive spouse, or anyone else who has reason not to have their location data broadcast to the entire world?
I know everyone on HN loves to hate on Facebook, but the fact that HN's servers are getting crushed when FB is down perhaps shows a revealed preference.
Facebook is having a very bad day today. You have this hack announced today, all their sites are apparently down due to a BGP issue they are dealing with, and then the bombshell allegations of them intentionally creating toxicity on their platform to enrich themselves at the expense of society. Zuck's world is slowly collapsing in on him; expect heavy regulation and the beginning of the end of Facebook as we know it.
Some might laugh this off as 'oh it's just scraping'. But I remember reading some comments on HN saying that there are apps that can scan faces and pull personal info, including where people live, work etc. So each leak uncovers a person little by little.
This vindicates the stance taken by Signal to not even collect metadata.
Edit: I mean surreptitiously scan the face of a stranger you see in public and the app will tell you about them. Don't know names of the apps.
And some wonder why I won't let myself be forced into dual authentication that hands them my real phone number. Actually, I am very reluctant to log into Facebook at all: perhaps twice a year, to see old friends attempting to communicate with me, and then only from private browsing, perhaps over a VPN too. I do not trust them with any shred of additional info about me that they do not already have from earlier.
I miss a lot of links sent to me pointing to a Facebook post or something. No, actually I do not miss a thing; I do not care about cute animals or strange people or thoughts. It is worthless in 99.999% of cases, and for the rest I can take the loss.
I'm glad I never gave FB my real phone number or birthday. Nobody should. I predicted this would happen some day. It's always just a matter of probability and time.
What a broken product. There aren't 1.5B users with public profiles on FB, so whatever methods these guys used clearly went beyond regular data scraping.
SearchPeopleFree<dot>com pretty much has a ton on a good majority of people. Get their phone number and, if you want, learn a lot about them.
So it just compiles public information. No opinion whether it's a good or bad thing here from me just pointing it out.
Expected, in all honesty; data these days is a currency to be bartered. I despise Facebook and everything it stands for. I wish people would take themselves more seriously. This need to fill a void with nonsense is simply unbecoming. So, suffer the consequences.
[+] [-] mzs|4 years ago|reply
https://twitter.com/AricToler/status/1445100884740935686
[+] [-] hn_throwaway_99|4 years ago|reply
[+] [-] _moof|4 years ago|reply
[+] [-] ramblenode|4 years ago|reply
[+] [-] throwaway78981|4 years ago|reply
[+] [-] gizdan|4 years ago|reply
Edit: opt of -> opt out of
[+] [-] 1vuio0pswjnm7|4 years ago|reply
1. http://www.nbcnews.com/id/41196595
[+] [-] LeifCarrotson|4 years ago|reply
[+] [-] loudmax|4 years ago|reply
[+] [-] mankyd|4 years ago|reply
[+] [-] azta6521|4 years ago|reply
[+] [-] normaler|4 years ago|reply
[+] [-] imglorp|4 years ago|reply
> Name Email Location Gender Phone number User ID
[+] [-] moolcool|4 years ago|reply
[+] [-] specialist|4 years ago|reply
No, we don't. PII is PII.
[+] [-] robbyking|4 years ago|reply
[+] [-] sabellito|4 years ago|reply
[+] [-] DebtDeflation|4 years ago|reply
[+] [-] dragonwriter|4 years ago|reply
[+] [-] barbazoo|4 years ago|reply
You were able to opt out of phone books and they also didn't contain email and gender.
[+] [-] spansoa|4 years ago|reply
[+] [-] twobitshifter|4 years ago|reply
[+] [-] asdff|4 years ago|reply
[+] [-] intricatedetail|4 years ago|reply
[+] [-] themdonuts|4 years ago|reply
[+] [-] lmilcin|4 years ago|reply
[+] [-] dataflow|4 years ago|reply
Your phone books had your login usernames and emails?
[+] [-] paulpauper|4 years ago|reply
[+] [-] slyrus|4 years ago|reply
[+] [-] Alex3917|4 years ago|reply
[+] [-] throwaway3975|4 years ago|reply
[+] [-] subsubzero|4 years ago|reply
[+] [-] throwaway78981|4 years ago|reply
[+] [-] drclau|4 years ago|reply
Is the downtime (at the time of writing) their way of blocking a known ongoing attack that can't be stopped fast and safely enough by other means?
Something like: 1) take everything down, 2) fix the bug, 3) deploy everywhere, 4) start everything up.
And, to stop clients from connecting, take down the DNS too. DNS is also a great scapegoat.
[+] [-] mihaaly|4 years ago|reply
[+] [-] smsm42|4 years ago|reply
2. Somebody sees this information and records it.
3. They publish this information on another site.
4. "Hackers stole my private data!!!!"
Really?!
[+] [-] zohvek|4 years ago|reply
[+] [-] dheera|4 years ago|reply
[+] [-] DevKoala|4 years ago|reply
[+] [-] fairity|4 years ago|reply
Source: https://twitter.com/WAK4S/status/1444276266362982400/photo/1
[+] [-] afrcnc|4 years ago|reply
[+] [-] dirigent|4 years ago|reply
[+] [-] jrs235|4 years ago|reply
[+] [-] paul7986|4 years ago|reply
[+] [-] Program_Install|4 years ago|reply
[+] [-] blitzar|4 years ago|reply