Web Scrapers Claim to Sell Personal Data on Facebook Users on a Hacker Forum

527 points | comprev | 4 years ago | privacyaffairs.com

144 comments

[+] mzs|4 years ago|reply

doubts about veracity:

https://twitter.com/AricToler/status/1445100884740935686

[+] hn_throwaway_99|4 years ago|reply

I feel like we need to start differentiating between "public" personal information and more sensitive personal information (like social security numbers or other government ID numbers). The breach lists this info:

Name Email Location Gender Phone number User ID

So basically, everything I used to be able to get in a phone book. Honestly, at this point all of that information should just be considered public, because it obviously is.

If anything I think people are grappling with the fact that the Internet just makes data scraping and processing possible on a scale previously unimaginable, and that's really what people have an issue with, but I don't think there's a great answer to that. I mean, it's one thing to say the front of my house is public info because anyone can come by and take a picture, but it sure feels different when a high resolution photo (or heck, video feed) can be posted online that is instantly available to billions of people.

[+] _moof|4 years ago|reply

> everything I used to be able to get in a phone book

Fair, although you could opt out of the phone book. (And I don't think they had location/address, though it's been so long now that I can't remember for sure.)

> I think people are grappling with the fact that the Internet just makes data scraping and processing possible on a scale previously unimaginable

This is it right here. The scale and ease of access are terrifying. It's true that in the olden days, someone could follow me around and write down everywhere I went, everyone I talked to, what stores I went to, my hobbies, and so on. But someone would actually have to do that, and they would have to single me out, and even then the information they collected would be in a notebook, not distributed to virtually every human on the planet.

Now we are all being followed, all the time, and all of that information is available to anyone with almost no cost or effort. This is a sea change, and personally I find it horrifying. There are very, very few people I would trust with that much information. I definitely don't trust the whole world with it.

[+] ramblenode|4 years ago|reply

In the past "public" did not mean a single, all-encompassing global village of information that anyone on earth with a computer could get access to. People then operated within local shells of information, extending outward from the neighborhood block to the city to the country and maybe finally to the world.

What time you walk your dog each day would be neighborhood level public info. Phone numbers would be city level. For greater reach than that you usually had to put the info out there yourself or be someone of media prominence.

Nowadays the time you walk your dog is out on the internet because it was leaked from some Amazon S3 bucket collecting pings from your dog's smart collar. And what's more, it's been joined with your name, phone number, and other personal info by interested groups to create an automated profile of you.

That's a whole different ball game, and not one that many people expect despite living their lives (in their minds) the same way as before.

[+] throwaway78981|4 years ago|reply

This isn't true. The phonebook we had listed only name, phone number, and a very broad location. Also, in those days a phone number was just that: a phone number. Nowadays it's a unique identifier for lots and lots of things, including government stuff. Some government stuff even uses it for authentication/authorization.

Also, I guess the user ID gives access to their FB profile page? From there one can scrape pics etc. (public ones).

[+] gizdan|4 years ago|reply

> So basically, everything I used to be able to get in a phone book. Honestly, at this point all of that information should just be considered public, because it obviously is.

With a phonebook (at least back in the day), you didn't risk having your account exposed and sold for a few dollars. Nor did it risk someone getting access to thousands, if not more, bank accounts or whatever through automation. In addition, it is easy to opt out of being listed in a phonebook. Facebook only gives you the illusion that you can opt out of this data being public.

Edit: opt of -> opt out of

[+] 1vuio0pswjnm7|4 years ago|reply

"So basically everything I used to be able to get in a phone book."

Assuming you had phone books from every city/region in every country. That's a lot of phone books, so you must have had a large warehouse to store them all. Then there is the fact that phone books did not list a number for every individual. Multiple persons routinely shared the same number.

The comparison sounds apt in theory but in practice it isn't. Try looking these Facebook users up in the phone books of their respective locales, via the telcos' online phone books or directory assistance. Then, using what you find, tell me their email address, gender and Facebook user ID.

Good luck.

The problem with this argument, "all email addresses are public", which I see regularly on HN, is that information does not become "public" and lose its "private" designation if it is published without consent or lawful purpose. If someone steals secrets and publishes them, they are still secrets.

Whether this information from Facebook is truly "private" I cannot say but I do think it is possible to have email addresses that are not made public.

The recent NSO iMessage story was interesting because the exploit seemed to rely on NSO getting lists of mobile phone numbers for the targets. Not email addresses. Yet iMessage will work without a phone number, with no SIM inserted. Perhaps the targets chose to use phone numbers for iMessage, not email addresses.

Consider what happens if someone creates a Gmail address but never uses it to send mail, and never shares the address with anyone, except Facebook. If this person does not make their Facebook profile public, how is this address public information? Google does not publish a list of every Gmail address. According to the logic of the parent comment, they might just as well. Email addresses are "public", right? Because some HN commenters think they are.

What happened when someone scraped Apple's servers to obtain the email addresses of Apple iPad users? Did federal prosecutors think the information was "public" or "private"? The media called the incident "theft of e-mail addresses".^1

1. http://www.nbcnews.com/id/41196595

[+] LeifCarrotson|4 years ago|reply

> If anything I think people are grappling with the fact that the Internet just makes data scraping and processing possible on a scale previously unimaginable, and that's really what people have an issue with, but I don't think there's a great answer to that. I mean, it's one thing to say the front of my house is public info because anyone can come by and take a picture, but it sure feels different when a high resolution photo (or heck, video feed) can be posted online that is instantly available to billions of people.

From your example, it's another thing to have a high resolution photo or video feed of everyone's houses and to, say, send them ads for painting services if the trim looks out of shape.

I think the important thing to get into the public consciousness is that scale alone is sufficient to make information processing fundamentally different from a human interacting with a single data point. Looking up one person in the phone book and calling them or sending them a letter is different from scanning the entire book, robocalling everyone in it, and sending junk mail to all of them. The fact that the former is accepted and that the latter is merely the former repeated a million times does not make the latter permissible. The former was accepted because the way the world worked meant that it was simply intractable - an economic nonstarter, a physical and logical impossibility, humanly infeasible - to abuse it into spamming a million people.

For another example, license plates are public, required to be visible on your vehicle on public roads. Prior to license plate scanning technology, a cop could have tailed a suspect and radioed their vehicle description and license plate to have other detectives and officers disperse to intersections and track a vehicle through a city, and depending on the nature of the problem, they could spend a few hundred dollars to dispatch a helicopter to chase it across the freeway. They could conceivably tail a non-suspect, but that wouldn't make any sense, they were constrained by limited resources to only use this ability for a select few vehicles. That was how the world worked. Later, automated license plate readers were developed. With cameras deployed across every intersection in a city, it would be feasible to track all motions of every vehicle at all times; it would likely be cheaper and easier to do so than one year's expenses of deploying personnel to do so manually.

That information should be considered public, because it obviously is, but what a person is allowed to do with public information should not be limited only by what they're able to do with it.

[+] loudmax|4 years ago|reply

> more sensitive personal information (like social security numbers or other government ID numbers)

I tend to think that there should be a publicly accessible, unique, and more or less immutable ID number for every citizen or resident. This ID would have pointers to our name, birth date and a few other identifiers that shouldn't really be considered secret.

My concern is that the absence of such a unique ID leads to a mess of overlapping systems in which only large organizations with the resources to track everyone will be able to uniquely identify people. So we'll have a degree of anonymity from random other individuals, but not from banks, tech corporations or the government. Computing power is becoming too cheap and ubiquitous to effectively hide information that isn't explicitly confidential. That is, as a society we need to adjust to a paradigm in which it is more expensive to keep information confidential than to allow it to be public, especially when keeping information private from those with deep pockets.

[+] mankyd|4 years ago|reply

> If anything I think people are grappling with the fact that the Internet just makes data scraping and processing possible on a scale previously unimaginable,

Agreed.

I remember way back when FB first launched the "feed". Folks on Slashdot (yes, that long ago) had great outcry about how much of a violation of privacy it was. I countered that all it was doing was collating all the posts that people were making on their "walls". Nothing new was necessarily being exposed.

People still didn't like it. Someone argued that the extra steps necessary to visit each friend's "wall" were a valuable impediment. Obviously, that's a weak position to take, but it reinforces your point: data scraping is easier than anyone seems to be willing to acknowledge. Anything you write in any "semi-public" space should simply be considered entirely public.

[+] azta6521|4 years ago|reply

True, but my last paper phone book did not have 1,500,000,000 entries.

[+] normaler|4 years ago|reply

My neighbour, who is 87 years old, has all the phone books from the late 50s to the mid 60s. I looked up my grandfather and it listed his address, phone and occupation.

[+] imglorp|4 years ago|reply

Strong disagree.

> Name Email Location Gender Phone number User ID

It's never about one item of information released: it's about the aggregation and linking potential. Name/location/phone together form a pretty decent unique identifier. FB obviously gives you friends, interests, hangouts, and most importantly, photographs; none of which you had before.

The harm comes after aggregating with other databases.
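To make the linking point concrete, here is a minimal sketch of how two unrelated datasets can be joined on a shared field such as a phone number. Everything in it is an assumption for illustration: the records are synthetic, and the field names and the `link_on` helper are hypothetical rather than taken from the leak or from any real tool.

```python
# Entirely synthetic records; field names are illustrative assumptions.
scraped_profiles = [
    {"name": "A. Example", "phone": "555-0100", "location": "Springfield"},
    {"name": "B. Sample", "phone": "555-0101", "location": "Shelbyville"},
]
other_dataset = [
    {"phone": "555-0100", "email": "a@example.com", "employer": "Acme"},
    {"phone": "555-0199", "email": "z@example.com", "employer": "Initech"},
]


def link_on(key, left, right):
    """Merge rows from `left` with rows from `right` that share the same `key` value."""
    index = {row[key]: row for row in right}  # look up right-hand rows by key
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


# One shared field is enough to combine two partial views of the same person.
linked = link_on("phone", scraped_profiles, other_dataset)
for row in linked:
    print(row)
```

Neither dataset alone ties a name to an employer; the joined row does, which is the aggregation harm described above.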

[+] moolcool|4 years ago|reply

I think that information increased in sensitivity because the technology that gives us instant access to it also allows it to be exploited in new ways. There's no machine that can take a paper phonebook and call everyone in it with customized spam messages, but you can trivially do that with a CSV file and 20 lines of Python.

[+] specialist|4 years ago|reply

> we need to start differentiating between

No, we don't. PII is PII.

These large scale breaches harm everyone's privacy (anonymity). Even people who are not included. Because with enough data you can deanon people, eg thru process of elimination.
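A rough sketch of that process of elimination, assuming a tiny synthetic population and a hypothetical `candidates` helper: each additional known attribute shrinks the set of people a record could belong to.

```python
# Synthetic population; the ids and attributes are made up for illustration.
people = [
    {"id": 1, "location": "Springfield", "gender": "F", "birth_year": 1980},
    {"id": 2, "location": "Springfield", "gender": "F", "birth_year": 1991},
    {"id": 3, "location": "Springfield", "gender": "M", "birth_year": 1980},
    {"id": 4, "location": "Shelbyville", "gender": "F", "birth_year": 1980},
]


def candidates(rows, **known):
    """Return the rows consistent with every attribute we already know."""
    return [r for r in rows if all(r[k] == v for k, v in known.items())]


# Each extra attribute narrows the anonymity set.
step1 = candidates(people, location="Springfield")
step2 = candidates(people, location="Springfield", gender="F")
step3 = candidates(people, location="Springfield", gender="F", birth_year=1991)
print(len(step1), len(step2), len(step3))
```

Once the candidate set has a single member, the person is identified even though no single attribute was unique on its own.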

[+] robbyking|4 years ago|reply

Absolutely. Most engineers who work with sensitive data already know that there are tiers of data sensitivity (Public, Personal, Private), and that info like SSN and CCN are more private than, say, gender or marital status.
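As a sketch of what such tiers can look like in code, assuming the three tier names above and a hypothetical field-to-tier mapping (the assignments and the `redact` helper are illustrative, not any standard):

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Ordered tiers: a higher value means more sensitive."""
    PUBLIC = 1
    PERSONAL = 2
    PRIVATE = 3


# Hypothetical mapping; real systems define their own classification.
FIELD_TIERS = {
    "gender": Sensitivity.PERSONAL,
    "marital_status": Sensitivity.PERSONAL,
    "ssn": Sensitivity.PRIVATE,
    "credit_card_number": Sensitivity.PRIVATE,
}


def redact(record, max_tier):
    """Keep only fields at or below max_tier; unknown fields are treated as PRIVATE."""
    return {
        key: value
        for key, value in record.items()
        if FIELD_TIERS.get(key, Sensitivity.PRIVATE) <= max_tier
    }


visible = redact({"gender": "F", "ssn": "000-00-0000"}, Sensitivity.PERSONAL)
print(visible)
```

Treating unclassified fields as the most sensitive tier is a deliberate default here, so that a new field is not exposed before someone has classified it.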

[+] sabellito|4 years ago|reply

Perhaps this insanity you're describing is true for the US. It doesn't necessarily account for the remaining... 1.2B people who had their info leaked.

[+] DebtDeflation|4 years ago|reply

>I feel like we need to start differentiating between "public" personal information and more sensitive personal information (like social security numbers or other government ID numbers).

The flipside of this is that we need to make it such that simply knowing someone's Name, Address, DOB, and SSN is not adequate to fraudulently assume their financial identity and incur debts in their name.

[+] dragonwriter|4 years ago|reply

> So basically, everything I used to be able to get in a phone book.

Even if people didn't have unlisted numbers, phone books would allow listing only last name and first initial of one person in the household, without any location data beyond the phone book service area (you could provide more if you wanted to be found), and didn’t include gender.

[+] barbazoo|4 years ago|reply

Sounds a bit like moving the goal posts to me.

You were able to opt out of phone books and they also didn't contain email and gender.

[+] spansoa|4 years ago|reply

> and more sensitive personal information (like social security numbers)

Well I consider SSNs public knowledge at this stage. You can reliably dox anyone in the US now and find out their SSNs. Also: I used to have a sticker on my laptop that had my SSN on it, and brought it to conferences, as a PR stunt for my consultancy.

[+] twobitshifter|4 years ago|reply

With a land line phone number from a phonebook, criminals can’t do much. With a smartphone number they can hack phones, potentially steal bank accounts, track their location and on and on.

[+] asdff|4 years ago|reply

Wait until HN readers find out what these faceless companies that appear around election time and mail me junk are able to glean from public voter registration data.

[+] intricatedetail|4 years ago|reply

Problem is user id link. From there you can get much more info. Facebook should be legally forced to reindex users and void all current user ids.

[+] themdonuts|4 years ago|reply

The comment is good, but your username is excellent.

[+] lmilcin|4 years ago|reply

> So basically, everything I used to be able to get in a phone book. Honestly, at this point all of that information should just be considered public, because it obviously is.

I am honestly shocked at your proposal.

In a real paper book you had a choice not to get your number published.

Have you given any thought to people who may be running from an abusive spouse, or to anyone else who has reason not to have their location data broadcast to the entire world?

[+] dataflow|4 years ago|reply

> So basically, everything I used to be able to get in a phone book.

Your phone books had your login usernames and emails?

[+] paulpauper|4 years ago|reply

Couldn't a social security number be easily bruteforced anyway?
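For a sense of scale, a back-of-the-envelope count of the nine-digit space is below. It assumes the usual validity rules as I understand them (area number not 000, 666, or 900-999; group not 00; serial not 0000). Whether guessing is practical depends on what a number can be checked against and how fast, which this sketch does not model.

```python
# Nine-digit format: AAA-GG-SSSS (area, group, serial).
# The exclusions below are stated as an assumption about validity rules.
valid_areas = [a for a in range(1, 900) if a != 666]  # excludes 000, 666, 900-999
valid_groups = range(1, 100)                          # excludes 00
valid_serials = range(1, 10_000)                      # excludes 0000

keyspace = len(valid_areas) * len(valid_groups) * len(valid_serials)
upper_bound = 10 ** 9  # every nine-digit string, valid or not

print(f"{keyspace:,} candidate numbers out of {upper_bound:,} nine-digit strings")
```

Either way the space is under a billion values, which is small by computing standards; the real barrier is the rate at which guesses can be verified.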

[+] slyrus|4 years ago|reply

The new phone books are here!!

[+] Alex3917|4 years ago|reply

I know everyone on HN loves to hate on Facebook, but the fact that HN's servers are getting crushed when FB is down perhaps shows a revealed preference.

[+] throwaway3975|4 years ago|reply

“The traders claim to have obtained the data by scraping rather than hacking or compromising individual users’ accounts.”

[+] subsubzero|4 years ago|reply

Facebook is having a very bad day today. You have this hack announced today, all their sites are apparently down due to a BGP issue they are dealing with, and then there are the bombshell allegations of them intentionally creating toxicity on their platform to enrich themselves at the expense of society. Zuck's world is slowly collapsing in on him; expect heavy regulation and the beginning of the end of Facebook as we know it.

[+] throwaway78981|4 years ago|reply

Some might laugh this off as 'oh it's just scraping'. But I remember reading some comments on HN saying there are apps that can scan faces and pull up personal info, including where people live, work, etc. So each leak uncovers a person little by little.

This vindicates the stance taken by Signal to not even collect metadata.

Edit: I mean surreptitiously scan the face of a stranger you see in public and the app will tell you about them. Don't know names of the apps.

[+] drclau|4 years ago|reply

(just a wild theory)

Is the downtime (at the time of writing) their way of blocking a known ongoing attack that can't be stopped fast and safely enough by other means?

Something like: 1) take everything down, 2) fix the bug, 3) deploy everywhere, 4) start everything up.

And, to stop clients from connecting, take down the DNS too. DNS is also a great scapegoat.

[+] mihaaly|4 years ago|reply

And some wonder why I won't let myself be forced into two-factor authentication that requires giving them a real phone number. Actually, I am very reluctant to log into Facebook at all: perhaps twice a year, when I see old friends attempting to communicate with me, and then only from private browsing, perhaps over a VPN too. I do not trust them with any shred of information about me beyond what they already have. I miss a lot of links sent to me that point to a Facebook post or something; no, actually I do not miss a thing. I do not care about cute animals or strange people or thoughts. It is worthless in 99.999% of cases, and for the rest I can take the loss.

[+] smsm42|4 years ago|reply

1. You publish your information on a public site designed to disseminate information to as many people as possible.

2. Somebody sees this information and records it.

3. They publish this information on another site.

4. "Hackers stole my private data!!!!"

Really?!

[+] zohvek|4 years ago|reply

Facebook is having one hell of a last 48 hours, I tell you what.

[+] dheera|4 years ago|reply

I'm glad I never gave FB my real phone number or birthday. Nobody should. I predicted this would happen some day. It's always just a matter of probability and time.

[+] DevKoala|4 years ago|reply

What a broken product. There aren't 1.5B users with public profiles on FB, so whatever methods these guys used clearly went beyond regular data scraping.

[+] fairity|4 years ago|reply

Interestingly, the forum that hosted this sales thread, raidforums.com, has apparently been taken down by their registrar for the next 30 days.

Source: https://twitter.com/WAK4S/status/1444276266362982400/photo/1

[+] afrcnc|4 years ago|reply

Fake news. Public data scraped off a Facebook profile is not "personal information" if everyone can already see it.

[+] dirigent|4 years ago|reply

Are there any real reasons to use Facebook in 2021? Like, why bother? Is there any actual use case for having a Facebook account now?

[+] jrs235|4 years ago|reply

So this might explain why I've suddenly gotten a huge increase in SPAM text messages that know my name.

[+] paul7986|4 years ago|reply

SearchPeopleFree<dot>com has a ton of information on a good majority of people. Get someone's phone number and, if you want, you can learn a lot about them. It just compiles public information. No opinion from me on whether it's a good or bad thing, just pointing it out.

[+] Program_Install|4 years ago|reply

Expected, in all honesty; data these days is a currency to be bartered. I despise Facebook and everything it stands for, and I wish people would take themselves more seriously. This need to fill a void with nonsense is simply unbecoming. So, suffer the consequences.

[+] blitzar|4 years ago|reply

It's a web scrape ... Public Information of More Than 1.5B Facebook Users Sold on Hacker Forum