LinkedIn loses appeal over access to user profiles

574 points| isalmon | 6 years ago |reuters.com

163 comments

pixelmonkey|6 years ago

The summary here is that LinkedIn tried to argue that it could prevent scraping of public LinkedIn profile data under their ToS, but the courts have ruled that if data is public and provided by users, it can be scraped/crawled, that is, it isn’t LinkedIn property. This is generally a positive outcome for people/companies turning web text and HTML into structured data, e.g. tools like Puppeteer and Scrapy can be used more freely on sites like LinkedIn, Twitter, and Reddit. Now, you might still get into trouble if you re-publish that data, but you can, at least, safely use the data "internally", and the act of scraping/crawling (politely) is not, per se, something unlawful.

eagsalazar2|6 years ago

Not sure "isn't LinkedIn property" is accurate here. They still retain ownership and control of redistribution just like any other IP. This is more of a philosophical question about whether "viewing" itself is a violation of their ownership rights and really about the definitions of "viewing" and "public" in the context of the internet.

Seems like they've simply determined that viewing any freely accessible URL is "public" and that "viewing" does include scraping. This seems like a very reasonable determination as it maps pretty neatly to how we think about viewing public content IRL where I am free to drive down the road (for profit or pleasure) and record publicly viewable signage and activities and use that data any way I see fit.

giancarlostoro|6 years ago

I think that's fine, but I also think the end-user should decide. With Google (edit: I meant Facebook) I'm able to determine whether or not I want to show up in search results. This shouldn't be an absolute is or isn't public situation.

lucb1e|6 years ago

This is about the copyright on the items that people post, i.e. creative works, right? But what if LinkedIn collects facts (where you work, your age, etc.), wouldn't that be covered by sui generis property right (better known as database copyright)?

Does this judgement say anything about that, i.e. whether it matters that users contributed the facts in their collection (so I'm not talking about posts, descriptions, etc.) rather than that they collected it themselves and therefore get a form of property right?

Edit: wait, database copyright is not a thing in the USA. Of course they wouldn't say anything about that.

tempestn|6 years ago

My understanding is that the contract (TOS) portion is not decided. This decision stated that LinkedIn does not have a protected property interest in the profiles, so it cannot claim copyright there. It's possible they could claim things like compilation copyright; that is as yet undecided. Also, the appeals court only dealt with the CFAA issue, I believe; there's still the contract (TOS) to consider, as well as a possible trespass claim.

Now, the CFAA was the only criminal statute involved, so I guess that supports what you said, that scraping is not unlawful. There still may be liability though, and using the data only internally would not necessarily protect from that. It remains to be seen.

perl4ever|6 years ago

"it can be scraped/crawled, that is, it isn’t LinkedIn property"

I thought it was pretty established that putting something on a website didn't eliminate your copyright. Has that changed now?

To me, it seems like common sense would be that if you make a public website, you are implicitly permitting some copies, but surely it's not all or nothing?

jammygit|6 years ago

My understanding is that Facebook uses similar clauses to disallow web scraping. Does that mean Facebook is fair game too?

andy_ppp|6 years ago

I'm pretty sure you would get a big GDPR fine if you start taking data people agreed to put on LinkedIn without their express permission.

echelon|6 years ago

This is fantastic. I would like to see wider legislation allowing scraping of IMDB, Genius, Reddit, Facebook, and Google made legal. These services receive free input from users. The data should remain free.

Edit (sort of off topic): There's still value in the building and providing services at scale, but this lowers the barrier to cross the moat for small players. The first step is data liberation. Then we can work to bring down the other cost barriers. It's a lot easier to build services that scale in 2019 than it was in 2005.

The semantic web was misguided in 200X, but we might want to take another swing at it in the future.

psv1|6 years ago

Another side of this is that the entity doing the scraping is more often than not another company. Which means that if your proposal is implemented, a user can voluntarily give their personal data to Google/Reddit/Facebook etc but that company then has to make the user's personal data available to another company.

tekknik|6 years ago

When you talk about data here these are people. This HiQ software is actually a bit scary. What if it gives a false signal which ends in an employee's termination? Data on people should not be freely attainable, the person should give explicit access. If I don’t want HiQ processing my information (I don’t) then they shouldn’t be able to. Especially now with some employers requiring a LinkedIn profile.

sireat|6 years ago

Reddit has a decent API

The golden rule is to use the API before you start raw scraping.

jakeogh|6 years ago

It's already legal. Adding law adds restrictions.

lonelappde|6 years ago

What gives you a rightful claim to information that I gave to someone else, if neither I nor they consent?

perspective1|6 years ago

I'm torn. On the one hand, scraping helps break down walled gardens. On the other, we're talking about personal details being used in novel ways that no LinkedIn user probably understands. I doubt any LinkedIn user writes their profile expecting HiQ to scrape it, assign a "flight risk" score and alert your bosses.

heavyset_go|6 years ago

User privacy shouldn't be dependent on draconian anti-scraping laws.

Besides, LinkedIn is already sharing every last bit of their users' information with the highest bidder.

landryraccoon|6 years ago

To me this is conceptually the same problem as DRM - with your position similar to those trying to build DRM systems.

One can’t both hand over data freely to a service (in this case LinkedIn) and also subsequently prevent all sharing of that data. Or to put it another way, you can’t both put your information on a public billboard hoping a recruiter sees it to offer you a job AND keep it private from strangers you hope won’t misuse it.

Nextgrid|6 years ago

The users agreed to publish their details publicly on LinkedIn. It’s normal that anyone can access those details and use them however they like.

post_below|6 years ago

I'd personally call HiQ's business model bottom feeding.

However restricting access to public information on the internet will benefit only the established titans. So this ruling is great news.

zlagen|6 years ago

Users know their information is public and they have the option to make it private on Linkedin. If Linkedin is worried about the privacy of their users they should let them know about the risks of having a public profile.

spaced-out|6 years ago

Same here. On one hand, this lessens the monopoly power of large tech companies, on the other hand, it gives users less control over their data.

paxys|6 years ago

IMO if you set up a profile on LinkedIn there's a pretty clear expectation that your bosses will be able to see it.

pjc50|6 years ago

Doing so in Europe is a clear GDPR violation.

I think that's a reasonable balance - you can scrape data, but not personal data without consent of the scraped person.

undefined3840|6 years ago

I recently learned from a recruiter that one license for one recruiter for LinkedIn is $10k a year, so that is what they are protecting.

phs318u|6 years ago

I’m a very active user of LinkedIn, effectively cultivating my “professional brand” on it. I’ve been contracting for years and use my network to find gigs. While I don’t have an issue with the business that HiQ are in (informing businesses of employee flight risk), I do believe there’s a qualitative difference between data that I publish for consumption by human eyeballs for free (a use of my data that I’ve authorised), and someone harvesting such data en masse for commercial purposes that I have not authorised. HiQ have not asked for my permission to use my data, and they have not made any commitments about how they will and will not use my data. Given that they have access to my contact details (even via LI itself), they are capable of contacting me to request permission to use my data.

paxys|6 years ago

The difference isn't as clear as you are making it out to be.

If you have a public LinkedIn profile, should an employer be able to look at it without your explicit consent and reach out to you for job opportunities (or disqualify you from one)?

Should the employer be able to pay someone else (say a recruiting agency) to look at LinkedIn profiles on their behalf?

Should the recruiting agency be able to use automated tools (which scrape public profiles) that make things easier for them?

CosmicShadow|6 years ago

What HiQ did was scrape public data, so if you have your LI profile set to public, then anyone can access it and do what they will with it, just like if you posted a printout of it on a bulletin board in a mall. It's in the open and is fair game for whatever. You can make your entire profile or just aspects of it private, meaning people need to log in to LI to see your stuff, which then protects you under the TOS.

I think profiles were default public so you could be found on Google and for SEO purposes for both you and LI.

You'd be hard pressed to find a public profile accessible anymore on LI anyway, even with public settings, you'll hit an authwall 9 out of 10 times.

danielrhodes|6 years ago

LinkedIn has played a very poor strategy here. The value of the service should be in the network, which is quite defensible. Instead, they’ve made the value in the profiles, which is not defensible. Few people curate their network on LinkedIn because you can't see profiles unless you are closely connected, so you are incentivized to add as many people as possible, thus devaluing the entire network. Then they go and sell unlimited access to profiles to recruiters and sales people. Thus, when other services come around and scrape their data, which LinkedIn needs to make somewhat publicly available for SEO juice, it becomes an existential threat.

If you look at Facebook, there is some limited profile data publicly available, but they will go to the wall to prevent people from seeing how those people are connected. In addition, they started from a very walled-off position, so they didn't become reliant on SEO traffic.

crazygringo|6 years ago

Question:

This seems to mean LinkedIn can't sue to prevent scraping.

I assume it's still legal for them to implement technological anti-scraping measures? So the two companies can play cat-and-mouse if they wish with rate-limiting, IP addresses, etc...
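
For anyone unfamiliar with what per-IP rate-limiting looks like in practice, here is a minimal sketch of a sliding-window limiter. It is an illustration only and says nothing about how LinkedIn or any other site actually does it; the class name, the limits, and the IP addresses in the usage note are all made up.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most max_requests per client within window_seconds."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._hits = defaultdict(deque)  # client key -> timestamps of recent requests

    def allow(self, client_key):
        """Return True if this request is within the limit, False otherwise."""
        now = self.clock()
        hits = self._hits[client_key]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

A server would call something like `limiter.allow(client_ip)` on each request and reject the request when it returns False. Since the key is just the client IP, rotating addresses sidesteps it, which is why this tends to turn into cat-and-mouse.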

thomascgalvin|6 years ago

An earlier ruling actually ordered LinkedIn to stop attempting to block the scraping using technological measures, too.

r_singh|6 years ago

Not too hard to surpass those with things like residential proxies, randomised user agents, headless browsers, etc. Bring on the anti scraping measures...

55555|6 years ago

> This seems to mean LinkedIn can't sue to prevent scraping.

Zillow and similar companies have shut down numerous startups which relied on scraping their data.

How is this different?

lr4444lr|6 years ago

What cracks me up about this is how these massive companies go to such lengths to call themselves mere platforms in order to avoid liability for content, and then when someone actually takes the content in this case they cry, "Foul! That's ours!" Can't have it both ways.

genidoi|6 years ago

LinkedIn tried to argue that if they put data behind a login wall, then it no longer falls under the wide umbrella of "public data" and so it's "theirs". Previous cases already established that if a crawler can see the data without any session cookies then it's okay. This ruling extended that to any data that can reasonably be accessed by any member of the public.

There will probably be more cases like this testing the upper bound of what "public data" means. At what point does publicly aggregated data stop being public data? And do attempts that companies make to prevent that data from being captured (IP limiting, captchas, login walls) count as immoral/illegal, since they are restricting the public from accessing a public good?

giancarlostoro|6 years ago

What's scarier is when they editorialize their platforms (also read censorship), therefore becoming content producers themselves. Today it's whoever you disagree with being censored, tomorrow it's your own voice.

Domenic_S|6 years ago

Who's responsible for privacy then? That's another situation where you can't have it both ways - can't tell the platform they don't own the data and simultaneously hold them to GDPR.

playing_colours|6 years ago

I do not like the hide-and-seek game with the "who viewed your profile" functionality: upgrade to a paid subscription to see who viewed, upgrade to another tier to hide that you looked at someone.

It looks like a lack of imagination or business prowess to come up with more advanced, valuable, and less annoying ways of monetisation. If only they could make it easier to connect people with matching mutual interests, in a way more flexible than a plain traditional job board and a database of CVs.

gnicholas|6 years ago

You don’t have to pay to hide that you viewed someone’s profile. Maybe if you want to see who viewed yours, but also keep your browsing private — but it seems more reasonable to charge for that sort of functionality.

xupybd|6 years ago

After finding this https://github.com/Greenwolf/social_mapper, I strongly recommend against having a profile photo on linkedin. It has caused me to be far more careful about my presence on the internet.

In the post privacy age I don't want my personal opinions to come back and haunt me. I grow as a person but the internet remembers all. If I make a dumb mistake and it's published online that's not a problem for me in 10 years if that fades away. But people are collecting and correlating info now. I don't like it one bit. It means someone you've never met, in a country you've never been to could extort you. It's getting very scary.

vesche|6 years ago

You can make it so your picture on LinkedIn is only viewable by people who are connected with you. I do agree that people should be cautious about what they post/share online however.

gist|6 years ago

I think also what most people don't realize is that linkedin's current model makes it difficult to access someone's profile without them knowing (if they pay for it and have the option on their account) to see who is looking at their profile. As such the user wanting to look at a person's profile has no privacy that they have done so. There could be many reasons someone looks at someone else's profile (even just some kind of curiosity or mistake) so this to me is an issue in itself.

Sure there are ways around this (you can make up a fake profile and some info is public but normally what I run into is a request to login to linkedin to view something that I am interested in).

scarface74|6 years ago

There is a setting that lets you see other people’s profile without them being notified. You can do it with free accounts. But you also can’t see who viewed your profile.

If you pay, you can keep your viewing private while seeing other people’s profile.

ChrisMarshallNY|6 years ago

Personally, this doesn't bother me too much. I use LinkedIn specifically because it is public. I'm an "open kimono" type of person. Not particularly interested in hiding stuff.

However, the general principle of "Data Scraping as a Business Model" bothers me. This is by no means the only company that does it (I suspect that MS does it with their access to LinkedIn).

There are far more egregious instances, and many of them have ways to get users to voluntarily cede information (can you think of a rather obvious example?).

LinkedIn is a sandwich board. It's meant to be a public showcase. If you want private, I suspect there are much more focused (and probably valuable) venues that cater to particular communities.

alexandercrohde|6 years ago

> Not particularly interested in hiding stuff.

Well, so the company, HiQ, is basically scraping every time you update your LinkedIn, to tell your employer you might be about to leave.

Now maybe that's cool with you. But it seems super sketchy to me, and one reason I deleted my LinkedIn altogether.

hooloovoo_zoo|6 years ago

What if LinkedIn adds a visibility option in addition to public/private profile that says "I want LinkedIn to prevent robots from scraping my profile."? What if LinkedIn enables that mode by default? Can they then continue preventing scrapers?

alt_f4|6 years ago

I think they can, but they won't because robots includes search engines and blacklisting search engines from user profiles will very negatively impact their metrics.
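
As a rough illustration of the distinction being drawn here, a site can publish crawler rules that treat search engines differently from other robots. The sketch below uses Python's standard urllib.robotparser on a made-up robots.txt; the rules, the /in/ path, the example.com URL, and the "ExampleScraper" agent name are assumptions for illustration, not LinkedIn's actual configuration. Note that robots.txt is only advisory: it tells well-behaved crawlers what the site prefers and does not technically block anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one search engine crawler is allowed everywhere,
# every other robot is asked to stay away from profile pages under /in/.
ROBOTS_TXT = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /in/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

profile_url = "https://www.example.com/in/some-user"

# Ask the parsed rules whether each (hypothetical) agent may fetch the profile.
search_engine_ok = parser.can_fetch("Googlebot", profile_url)
other_robot_ok = parser.can_fetch("ExampleScraper", profile_url)
print(search_engine_ok, other_robot_ok)
```

Under these made-up rules the search engine is permitted to fetch the profile while the other robot is not, so in principle a "no robots" option would not have to exclude search engines.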

conjectures|6 years ago

IP aside, anyone else concerned about the business of HiQ?

I presume what they are doing is:

* Scrape profiles.

* Calculate time delta in jobs.

* 'Predict' churn rate for (prospective) employee.

With respect to prospective employees in particular this seems likely to entail lots of risks. Average job time delta is going to be a massively overdetermined variable, and noisy wrt 'next job delta'. I'm worried how they're going to sell that to employers.
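
The steps guessed at above can be sketched as follows. This only illustrates the "time delta" idea; it is not hiQ's actual method, which nobody in this thread has described. The job history is invented and the function name is hypothetical.

```python
from datetime import date
from statistics import mean


def tenures_in_years(jobs):
    """Return the length of each job in years, given (start, end) date pairs."""
    return [(end - start).days / 365.25 for start, end in jobs]


# Hypothetical job history, invented for illustration only.
history = [
    (date(2010, 1, 1), date(2012, 1, 1)),
    (date(2012, 1, 1), date(2013, 1, 1)),
    (date(2013, 1, 1), date(2019, 1, 1)),
]

durations = tenures_in_years(history)
average_tenure = mean(durations)
print(durations, average_tenure)
```

Here the average comes out at roughly three years, even though the individual stints range from about one year to about six. A single average hides that spread, which is the kind of noise the worry above is about.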

spider-mario|6 years ago

> “And as to the publicly available profiles, the users quite evidently intend them to be accessed by others”

How is it evident that the users intend them to be accessed by scrapers and not just humans? Since the ToS forbid scraping, it seems very reasonable to me to imagine users making their profiles public because of that assumption that scraping is not tolerated.

alkonaut|6 years ago

What is the limit for what is "user provided"? My entire facebook profile, including my social graph is "user provided".

Does this mean that it would likely be possible for a competing network to have a "click here to import your friend list" for example?

brushfoot|6 years ago

This is great news. The data is public; it shouldn't matter whether you hire humans to parse it or develop a bot. LinkedIn was trying to have its cake and eat it too.

Causality1|6 years ago

Would it really be that difficult for LinkedIn to require users to be logged in before viewing profiles and include anti-automation rules in the EULA?

donohoe|6 years ago

In case it's not clear, this is from September.

mherdeg|6 years ago

Hmm, how does this compare versus the Craigslist/3Taps/Radpad litigation? Are these similar issues?

EGreg|6 years ago

It sounded like this was going to be an opinion piece about how LinkedIn is losing its appeal to users.

atombender|6 years ago

Anyone versed in U.S. law who can comment on whether the judgement in this case sets a precedent?

gnicholas|6 years ago

Yes, in the 9th Circuit (western US) this is binding precedent. Elsewhere it can be cited but is not binding.

mminer237|6 years ago

Technically not. This was just a preliminary injunction. The case itself still has to be decided. But assuming this was indicative of how the court will rule, it will then be binding precedent in the Ninth Circuit.

Barrin92|6 years ago

As expected, a lot of people here are talking about public data and whatnot, but this is a horrible decision.

"Circuit Judge Marsha Berzon said hiQ, which makes software to help employers determine whether employees will stay or quit, showed it faced irreparable harm absent an injunction because it might go out of business without access.[...]

“LinkedIn has no protected property interest in the data contributed by its users, as the users retain ownership over their profiles,” Berzon wrote. “And as to the publicly available profiles, the users quite evidently intend them to be accessed by others,” including prospective employers."

This isn't some sort of empowerment of the public, it's surveillance capitalism. No end-user in their right mind publishes data on LinkedIn with the expectation that the information is bought up by a third party, analysed, and then sold back to your employer in a way that exposes your personal intent and may even threaten your job. The only thing this accomplishes is enabling shady business models that feed off a sort of internet voyeurism, and at the end of the day it'll lead to people turning their profiles private and making LinkedIn more difficult to use if you're someone who is looking for information in good faith.

themacguffinman|6 years ago

> No end-user in their right mind publishes data on LinkedIn with the expectation that the information is bought up by a third party, analysed, and then sold back to your employer in a way that exposes your personal intent and may even threaten your job.

Yes they do. Do you think people who are afraid of their employer finding out about something would show it on their public LinkedIn profile in the first place? If a manager or colleague who they've likely already "connected" with simply opens your LinkedIn profile in their web browser and sees the same info that hiQ sees, then it's game over. If you don't want your employer to know, don't publish it on your public profile. It's absurd to suggest that some minimal manual effort to load a few profiles is a serious privacy defense.

jakeogh|6 years ago

Your argument is to let corporations effectively make law.

alkonaut|6 years ago

> No end-user in their right mind publishes data on LinkedIn with the expectation that the information is bought up by a third party, analysed, and then sold back to your employer in a way that exposes your personal intent and may even threaten your job.

Doesn't that get covered by laws such as GDPR (where applicable)? Just because I can scrape your profile doesn't mean I can publish it, sell it etc (or even keep it). I can do it with your consent, and LinkedIn can't complain, isn't that it?

onetimemanytime|6 years ago

>>that required LinkedIn, a Microsoft Corp unit with more than 645 million members, to give hiQ Labs Inc access to publicly available member profiles.

Not sure this is a win for the web. Sure, it's user submitted, but the users agreed that LinkedIn owns that after they submit.

rgross1|6 years ago

Are there any useful bots for scraping LI profiles out there?

buboard|6 years ago

OK, how is that going to work for Facebook?

NKosmatos|6 years ago

This whole situation with public data, personal information, data scraping, GDPR and us putting our own info on various sites displaying them publicly and then complaining if someone collects them and uses them, has gotten out of hand :-( I think I’ll have to side with hiQ on this.

pkilgore|6 years ago

> September 9, 2019 / 1:34 PM / a month ago