The "540 million records" wording seems misleading (probably intentionally by UpGuard and/or TechCrunch). The screenshot on https://www.upguard.com/breaches/facebook-user-data-leak leads me to think that this is 540m object records of various types (posts, comments, etc.), not records of 540m distinct users as some readers might assume.
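To make the distinction concrete, here is a toy sketch. The records and field names (`type`, `user_id`) are made up for illustration and are not taken from the actual dataset; it only shows how a raw record count can be much larger than the number of distinct users behind it.

```python
# Toy data: every value and field name here is hypothetical.
records = [
    {"type": "post", "user_id": "u1"},
    {"type": "comment", "user_id": "u1"},
    {"type": "like", "user_id": "u2"},
    {"type": "comment", "user_id": "u1"},
    {"type": "post", "user_id": "u2"},
]


def count_records(rows):
    """Count every object record, whatever its type."""
    return len(rows)


def count_distinct_users(rows):
    """Count the unique users that the records belong to."""
    return len({row["user_id"] for row in rows})


print("object records:", count_records(records))
print("distinct users:", count_distinct_users(records))
```

A headline built on the first number says little about the second.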
Amazing how a third party can harvest that amount of data and Facebook is freely handing it out... they really have no control over the data they process and handle. It's been shown again and again.
Third parties can always just resort to web-scraping if API support is dropped. If you consistently scrape public pages on Facebook, you can amass a trove of data within a year. By supporting an API, Facebook offers a controlled avenue for this to happen, which people pay for because it's easier than scraping. It will still happen if this doesn't exist, though.
These things are very hard to stop. The first law of the internet says that if you have a public website, it will be scraped and turned into structured data. Over the years, Facebook has been adding more options to make profiles private, etc., but there are still loopholes around these things with 3rd party "delegated" authentication.
https://www.facebook.com/data-abuse - as mentioned in the article, this scenario (non-FB companies mishandling FB user data) is exactly the reason Facebook's data abuse bounty program exists. Hopefully the finders of this submitted it to the program.
Alas, the 21st century provides the opportunity to address the growing scourge of using sounds or combinations of letters that communicate meaning without being divisible into smaller units capable of independent use.
testplzignore | 7 years ago
It sounds like a lot, but it's not. You could probably scrape that much data from public Facebook pages in a few days without even being logged in, especially a few years ago. Heck, you could say right now that Reddit has billions of user records exposed if you define them that way. The Hacker News first page itself links to thousands of user records :)
kerng | 7 years ago
It seems Facebook should be forced to disable any kind of data sharing with 3rd parties, since they obviously cannot make it work. They already have enough issues with the security of their internal data handling procedures that they have to fix before giving data to third parties.
SlowRobotAhead | 7 years ago
That is a massive part of their model, so that will never happen. The alternative of course is to stop giving them data.