9th Circuit holds that scraping a public website does not violate the CFAA [pdf]

[+] Animats|6 years ago|reply

This action does more than that. The court left the preliminary injunction against LinkedIn in place: "The district court granted hiQ’s motion. It ordered LinkedIn to withdraw its cease-and-desist letter, to remove any existing technical barriers to hiQ’s access to public profiles, and to refrain from putting in place any legal or technical measures with the effect of blocking hiQ’s access to public profiles."

So LinkedIn is prohibited from blocking hiQ's access by technical means. That's a strong holding. If this case is eventually decided in favor of hiQ, scrapers can no longer be blocked. Throttled a little, maybe, but no more than other users doing a comparable query rate.

[+] tgsovlerkhgsel|6 years ago|reply

Not allowing the CFAA to be (ab)used to attempt to make scraping illegal makes sense.

However, how is it reasonable to force a web site to serve its contents to a third-party company, without being allowed to make a decision whether to serve it or not? Serving the web site costs money, and the scraper surely isn't going to generate ad income...

[+] LeoPanthera|6 years ago|reply

Does this prevent Google from returning captchas if you use a robot to scrape the search result pages, as they currently do?

[+] nevi-me|6 years ago|reply

Can we also be allowed to view people's profiles without being forced to sign in to LinkedIn? They lost my trust many years ago with their shady practises and 'dark patterns', so I don't want to share any of my data with them.

I however sometimes want to look up people. Or would this be a case of wanting to have my cake and eat it?

[+] 3xblah|6 years ago|reply

"If this case is eventually decided in favor of hiQ, scrapers can no longer be blocked."

If the case is decided in favour of hiQ, then, absent an injunction, what would prevent a website from blocking a scraper? Maybe the website could still block unless and until the scraper gets her lawyers to file an injunction.

Another interpretation is that if hiQ wins, then in the 9th Circuit's jurisdiction websites serving public information they neither own nor exclusively license may no longer try to use the CFAA and/or copyright law to threaten scrapers.

[+] derefr|6 years ago|reply

Scrapers generally, sure. Not sure about scrapers on other social-media sites like Facebook, though.

The question being: if just having access to the network isn't enough to grant you access to the data of a specific profile, but instead you have to aggregate samples from a bunch of people in the network in order to see "through their eyes" to the data on the profiles of their friends and friends-of-friends, is that allowed?

Because, if even that was allowed, that'd surely open a different kind of floodgate.

[+] emilfihlman|6 years ago|reply

Leaving the injunction in place is insane and a huge oversight. It amounts to making web pages carriers that cannot select who they serve.

It should have said only that there is nothing judicially wrong with scraping but also not limited the rights of a service.

[+] cm2187|6 years ago|reply

And I presume it would also apply to cloudflare’s captchas.

[+] Causality1|6 years ago|reply

Would it hold up if profiles were only visible to logged-in users and part of the sign-up EULA was an agreement not to scrape profiles with automated tools?

[+] buro9|6 years ago|reply

.

[+] meowface|6 years ago|reply

Considering the kind of private scraping and selling tactics LinkedIn has been chronically guilty of (and not just the ordinary "growth hack" stuff: "LinkedIn violated data protection by using 18M email addresses of non-members to buy targeted ads on Facebook" [1]), it's satisfying to see LinkedIn lose this.

[1] https://techcrunch.com/2018/11/24/linkedin-ireland-data-prot...

[+] penagwin|6 years ago|reply

I feel like this is a really common theme I've seen several times. Something like "Music Lyric site X sues Google for embedding their lyrics in the results directly" which is funny because site X got the lyrics by scraping them from other sites.

Plus Google only exists from scraping content, but I believe their TOS includes "don't scrape our content".

I find it really funny that the scrapers are battling scrapers - like guys you only exist because you do THE EXACT SAME THING

[+] mullingitover|6 years ago|reply

> LinkedIn has taken steps to protect the data on its website from what it perceives as misuse or misappropriation. The instructions in LinkedIn’s “robots.txt” file—a text file used by website owners to communicate with search engine crawlers and other web robots—prohibit access to LinkedIn servers via automated bots, except that certain entities, like the Google search engine, have express permission from LinkedIn for bot access.

Not a big fan of weev, but this sure seems like he got screwed if he was just enumerating public web pages and went to jail for it.[1]

[1] https://en.wikipedia.org/wiki/Weev#AT&T_data_breach
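
The robots.txt arrangement described in the quoted passage (bots disallowed in general, with named crawlers such as Google's allowed) can be illustrated with Python's standard-library parser. This is only a sketch: the rules below are a hypothetical example written to match the court's description, not LinkedIn's actual file, and `may_fetch`, `ExampleScraper`, and the example.com URL are made-up names.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules in the spirit of the quoted description.
# This is NOT LinkedIn's real robots.txt.
EXAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""


def may_fetch(user_agent, url, robots_txt=EXAMPLE_ROBOTS_TXT):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The named crawler is allowed; any other bot falls under the "*" rule.
print(may_fetch("Googlebot", "https://example.com/in/some-profile"))       # True
print(may_fetch("ExampleScraper", "https://example.com/in/some-profile"))  # False
```

Note that robots.txt is advisory: it tells well-behaved crawlers what the site owner wants, but it does not by itself block a request.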

[+] lwf|6 years ago|reply

Even if the case was tried today, 9th Cir. isn't binding on other regions of the US, and there's a bit of a split, as detailed in the opinion[1]:

> In recognizing that the CFAA is best understood as an anti-intrusion statute and not as a “misappropriation statute,” we rejected the contract-based interpretation of the CFAA’s “without authorization” provision adopted by some of our sister circuits. Compare Facebook, Inc. v. Power Ventures, Inc., 844 F.3d 1058, 1067 (9th Cir. 2016), cert. denied, 138 S. Ct. 313 (2017) (“[A] violation of the terms of use of a website—without more— cannot establish liability under the CFAA.”); Nosal I, 676 F.3d at 862 (“We remain unpersuaded by the decisions of our sister circuits that interpret the CFAA broadly to cover violations of corporate computer use restrictions or violations of a duty of loyalty.”), with EF Cultural Travel BV v. Explorica, Inc., 274 F.3d 577, 583–84 (1st Cir. 2001) (holding that violations of a confidentiality agreement or other contractual restraints could give rise to a claim for unauthorized access under the CFAA); United States v. Rodriguez, 628 F.3d 1258, 1263 (11th Cir. 2010) (holding that a defendant “exceeds authorized access” when violating policies governing authorized use of databases).

weev was tried in an area under the 3rd Cir.'s jurisdiction. Somewhat interestingly, his conviction was thrown out in 2014 on venue grounds (i.e. being tried in NJ), without addressing the statutory question.[2]

[1]: pp. 27-28 [2]: https://en.wikipedia.org/wiki/Weev?oldid=912921723#cite_ref-...

[+] tick_tock_tick|6 years ago|reply

Not saying the court made the right call but for that case the big issue for the court was the pages were clearly not intended for the public and the defendant knew it.

[+] UncleMeat|6 years ago|reply

A friend of mine from grad school was very involved in legal issues related to cfaa stuff. According to him, weev really got screwed because he failed "the punk test", which discouraged lawyers from wanting to use him as a test case.

[+] devmunchies|6 years ago|reply

>except that certain entities, like the Google search engine, have express permission from LinkedIn for bot access

how does this work technically? i just tried crawling a friend's profile using curl and set my user agent to Google's bot and it still was blocked.
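
One possible answer, offered as a sketch and not as a description of what LinkedIn actually does (the thread does not say): a site can ignore the User-Agent header, which anyone can set with curl, and verify the crawler's IP address instead. Google documents a reverse-DNS lookup followed by a forward-DNS confirmation for this purpose. The function name, the injectable lookup parameters, and the sample address and hostname below are all hypothetical.

```python
import socket

# Hostname suffixes Google documents for its crawlers.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Check a client IP with reverse DNS, then confirm it with forward DNS.

    The lookup functions are injectable so the logic can be exercised
    without network access; by default the socket module is used.
    """
    if reverse_lookup is None:
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]
    try:
        host = reverse_lookup(ip)
        # Step 1: the reverse-DNS name must belong to a Google crawler domain.
        if not host.endswith(GOOGLE_SUFFIXES):
            return False
        # Step 2: that name must resolve back to the same IP address.
        return ip in forward_lookup(host)
    except OSError:
        # Lookup failures (socket.herror, socket.gaierror) count as unverified.
        return False


# Offline demonstration with fake resolvers and a documentation-range IP.
fake_reverse = lambda addr: "crawl-192-0-2-1.googlebot.com"
fake_forward = lambda host: ["192.0.2.1"]
print(is_verified_googlebot("192.0.2.1", fake_reverse, fake_forward))
```

Under a check like this, a spoofed User-Agent sent from an ordinary connection would fail, because the client's IP address would not reverse-resolve to a Google hostname.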

[+] henryfjordan|6 years ago|reply

hiQ asked the court for a preliminary injunction to stop Linkedin from denying them access, won it, and this is the result of Linkedin's appeal of that injunction. This is not the end of the case.

The title is wrong. The 9th Circuit just ruled that hiQ has a decent enough argument to move forward. The question of whether them scraping a public site can violate the CFAA is not settled.

> We therefore conclude that hiQ has raised a serious question as to whether the reference to access “without authorization” limits the scope of the statutory coverage to computer information for which authorization or access permission, such as password authentication, is generally required

> The data hiQ seeks to access is not owned by LinkedIn and has not been demarcated by LinkedIn as private using such an authorization system. HiQ has therefore raised serious questions about whether LinkedIn may invoke the CFAA to preempt hiQ’s possibly meritorious tortious interference claim.

Note the tone of the language used in the ruling. The judge makes it pretty clear that nothing is final here.

[+] staticautomatic|6 years ago|reply

I think you've mischaracterized the state of things. In the underlying case, LinkedIn asserted that HiQ violated the CFAA and HiQ said LinkedIn tortiously interfered with its business. The trial court said LinkedIn couldn't assert the CFAA. LinkedIn appealed, asking the appellate court to overturn the trial court and also to hold that the tortious interference claim is preempted by the CFAA. The appellate court said no, we agree with the trial court and there's no preemption, so now HiQ can go back to the trial court and proceed to trial with its tortious interference claim.

[+] foota|6 years ago|reply

I disagree about the tone, it seems to suggest to me that the judge believes there is a strong case here for hiQ.

[+] Miner49er|6 years ago|reply

AP seems to be saying differently. https://apnews.com/1e1cacd92df74f48846e8bce5237b97d

[+] docker_up|6 years ago|reply

You misunderstand basic law terminology.

A preliminary injunction is considered very strong. So it's not that "nothing is final here", it's actually almost pretty much final unless something comes out of left field.

[+] akersten|6 years ago|reply

A good decision was reached, but it's a little worrying that the emphasis in the ruling was mostly about a weighing of business interests rather than affirming a right to access public information. If HiQ's business model had not been jeopardized by LinkedIn's business desire to block them, I fear this court could have easily gone the other way. I'd really love to see a ruling that solidifies the right of someone to access publicly available data without fear of repercussions. If this case makes it to SCOTUS, I would hope the ruling is predicated on that rather than business harm.

Key paragraphs from the ruling:

> In short, even if some users retain some privacy interests in their information notwithstanding their decision to make their profiles public, we cannot, on the record before us, conclude that those interests—or more specifically, LinkedIn’s interest in preventing hiQ from scraping those profiles—are significant enough to outweigh hiQ’s interest in continuing its business, which depends on accessing, analyzing, and communicating information derived from public LinkedIn profiles.

> Nor do the other harms asserted by LinkedIn tip the balance of harms with regard to preliminary relief. LinkedIn invokes an interest in preventing “free riders” from using profiles posted on its platform. But LinkedIn has no protected property interest in the data contributed by its users, as the users retain ownership over their profiles. And as to the publicly available profiles, the users quite evidently intend them to be accessed by others, including for commercial purposes—for example, by employers seeking to hire individuals with certain credentials. Of course, LinkedIn could satisfy its “free rider” concern by eliminating the public access option, albeit at a cost to the preferences of many users and, possibly, to its own bottom line.

[+] victor9000|6 years ago|reply

> If HiQ's business model had not been jeopardized

I think this is more about validating hiQ's legal standing in the case.

[+] supernova87a|6 years ago|reply

This case is so ridiculous on multiple fronts that although this procedural ruling (injunction) seems technically correct (to allow the case to proceed to actual court), it could just as well have been thrown out with no difference in or ultimate harm to the parties.

First, LinkedIn makes the claim that its users have a right to privacy against scraping by such a 3rd party. That's laughable. As the court saw, their whole business model is made on people sharing their profiles broadly and mostly to the public.

Secondly, HiQ claims that LinkedIn's efforts to stop it from using the data are tortious interference. That's bold -- suppose someone is taking your assets (you believe illegally) and selling them to others -- can you imagine the gall that the person taking your assets can sue you for interfering with their subsequent sale of your assets?

Finally, that LinkedIn resorted to using the computer fraud and anti-terrorism statutes to make their argument is ridiculous.

So much craziness to go around. I would've just tossed the case, but I guess there is the whole bit about due process... Maybe HiQ will fail anyway at the next substantive trial, but what a waste of time.

[+] judge2020|6 years ago|reply

> suppose someone is taking your assets

Except that, in the digital sense, it's only copied. They now have it, but you didn't lose your assets or money besides the <$0.001 it costs to serve each web page.

> So much craziness to go around.

I agree - I haven't read through the entire thing, but it looks like, instead of saying "you can't scrape", they could implicitly give users a license for personal and business use while disallowing resale of the data (carefully worded, of course, to still allow recruiters and the like to do so). It's like trying to argue that the DMCA says you can't create a torrent file of some movie.

[+] dx87|6 years ago|reply

Would that ruling mean that sites could no longer refuse to show content based on how they're accessed? For example, sites that won't load if the browser is in headless mode, or sites that depend on javascript as a way of blocking wget/curl.

[+] hartator|6 years ago|reply

That’s awesome news. Thanks also to the EFF for all the work they are doing to ensure fair use is still a thing. We’ll (https://serpapi.com) be donating next year.

[+] codedokode|6 years ago|reply

This is actually bad. Wouldn't it be better if sites were allowed to block crawlers? I don't see what the legal basis is for forbidding to ban scrapers. Is there a law that a site must serve pages to anyone?

[+] 3xblah|6 years ago|reply

There is no legal basis for "forbidding to ban scrapers".

The question is whether there is any legal basis for banning scrapers, i.e., for blocking hiQ. In other words, if hiQ keeps scraping, are they violating anyone's rights and/or breaking the law by doing that?

As long as that remains a legitimate, open question, then hiQ can argue they should be allowed to keep scraping without incurring civil or criminal liability. That is the purpose of the injunction. There could be no legal basis for blocking hiQ. Until that question is resolved, hiQ can keep on scraping.

[+] CamperBob2|6 years ago|reply

Parent was downvoted but I think they have a point. This sounds overbroad, to an extent that I'd worry will get the whole ruling tossed out by SCOTUS.

On the other hand, if the ruling stands, it sounds like it will finally be possible to do useful things with Craigslist.

[+] cooljacob204|6 years ago|reply

Because they are exposing their website to the general public. If you don't want the public to have access and be able to scrape it, don't make it public facing.

[+] MiroF|6 years ago|reply

Yes - for instance if your e-commerce site banned people from visiting based on whether their zip code made it more likely they were of a certain racial group, you would be running afoul of the law.

[+] xxxpupugo|6 years ago|reply

You can redefine what is 'public', or require login to see that information anyway.

If anything, the ruling could just push websites to hide information even deeper.

[+] anortef|6 years ago|reply

Why? If your data is publicly accessible, what is the difference between using scripts and someone hiring a ton of people in a third world country to copy & paste your content?

[+] cookie_monsta|6 years ago|reply

So the champion of the public internet turns out to be a company that scrapes your social media, MLs it and sells the results to your HR dept?

I'm reminded of Dave Chapelle's Halle Berry routine...

[+] sebastianconcpt|6 years ago|reply

In short, even if some users retain some privacy interests in their information notwithstanding their decision to make their profiles public, we cannot, on the record before us, conclude that those interests—or more specifically, LinkedIn’s interest in preventing hiQ from scraping those profiles—are significant enough to outweigh hiQ’s interest in continuing its business, which depends on accessing, analyzing, and communicating information derived from public LinkedIn profiles.

Reasonable. If a platform helps you make an individual's information public, then why should it matter to the platform how the market uses that public information?

[+] victor9000|6 years ago|reply

On how this case relates to the CFAA:

We therefore conclude that hiQ has raised a serious question as to whether the reference to access “without authorization” limits the scope of the statutory coverage to computer information for which authorization or access permission, such as password authentication, is generally required. Put differently, the CFAA contemplates the existence of three kinds of computer information: (1) information for which access is open to the general public and permission is not required, (2) information for which authorization is required and has been given, and (3) information for which authorization is required but has not been given (or, in the case of the prohibition on exceeding authorized access, has not been given for the part of the system accessed).

Public LinkedIn profiles, available to anyone with an Internet connection, fall into the first category. With regard to such information, the “breaking and entering” analogue invoked so frequently during congressional consideration has no application, and the concept of “without authorization” is inapt.

[+] Shivetya|6 years ago|reply

The Volokh take is an interesting read [1]

I am curious how quickly most pages get put behind authorization. With the wording of this ruling you could pretty much go snap up any blog site (say, like Medium) and more. I wonder what kind of services would come out of that, having the data in a format that can be more easily parsed/analyzed?

So every ecommerce site is fair game? I assume most are already being scraped, but I cannot imagine having to be in an environment where many of your connections are not people.

[1] https://reason.com/2019/09/09/scraping-a-public-website-does...

[+] ankurkwv|6 years ago|reply

Hmm - I think a key in the ruling here was that LinkedIn maintains no copyright claim on these pages. Users on LinkedIn retain ownership of their profile data. Compare that to a blog and maybe copyright could come into play? Not a lawyer just thinking out loud...

[+] monksy|6 years ago|reply

RIP Aaron Swartz

[+] CosmicShadow|6 years ago|reply

Even if LinkedIn loses and scrapers can no longer be blocked, they still just switched to putting all profiles behind an authwall, or at least it's very hard to not get an authwall. So could HiQ even carry on if they won anyway?

[+] jolmg|6 years ago|reply

I'm not very familiar with either LinkedIn or HiQ, but what would be the problem with logging in before scraping?

[+] kevin_b_er|6 years ago|reply

Best hope that hiQ prevails. The slope slips very fast without LinkedIn's defeat. If LinkedIn prevails, the "EULA" has the force of criminal law and not just an agreement that lacks the meeting of the minds.

[+] sha666sum|6 years ago|reply

This is a weird case, as it turns the question of scraping on its head. Normally you'd think "am I allowed to scrape?", but instead the question becomes "am I allowed to prevent scraping?".

Anyways, I disagree with the court's judgment here. The users have consented that their data be used in accordance with LinkedIn's privacy policy. The fact that it is publicly posted does not mean that the user has relinquished control over their personal information for another company to do with as they wish.

[+] dzonga|6 years ago|reply

How about websites using nonsense CSS classes, usually autogenerated through frameworks, that make scraping difficult? I'm sure this ruling doesn't cover that case? Well, globally I wish authorities would rule that public data should be published in computer-accessible formats, e.g. PDFs for humans and XBRL / CSV for machines, as in financial reports. Lots of data is in PDFs that cost a ton to mine, and tools like AWS Textract are hardly up to the task.

[+] prirun|6 years ago|reply

eBay had better lawyers than LinkedIn:

https://casetext.com/case/ebay-v-bidders-edge

I'm glad this court ruled it wasn't a violation of CFAA. But using trespass to prevent it seems reasonable. A private business should be allowed to restrict certain kinds of use of its resources (servers, bandwidth, etc), especially if it is beyond typical use. But if the load is typical and doesn't actually harm LinkedIn, it seems less reasonable to restrict them. If LinkedIn doesn't want automated access to their data because it is too much of a load on their servers, then they should be required to ban ALL automated access, including Google's bots. Of course they want Google's bots because that sends them traffic.

Another reason I think it was stupid for LinkedIn to use CFAA is that it sets them up to be a protected computer system, with protected information. If that is the case, it seems they could be liable for disclosing the information to someone a user didn't want, like a stalker. It's rather dumb: LI is claiming they host protected information, but it is only protected against someone that might compete with them.

[+] silentguy|6 years ago|reply

> If LinkedIn doesn't want automated access to their data because it is too much of a load on their servers, then they should be required to ban ALL automated access, including Google's bots. Of course they want Google's bots because that sends them traffic.

By that logic, I should have access to LinkedIn premium features for free. Why should LinkedIn give more data to people who pay?

275 comments