The Craigslist Lawsuit

tomasien|10 years ago

Greg Kidd, the founder of 3taps, did not have to keep fighting this fight - AT ALL. He is one of the top execs at Ripple Labs, was in the first round of Twitter (and Square), and doesn't have his net OR self worth tied up in 3taps. He kept at it because he believed it was right - and I, for one, thank him for it.

briandear|10 years ago

Right for repurposing someone else's data without permission? Nothing right about that.

a-dub|10 years ago

Back when the first bubble burst in late 2001, I scraped a bunch of historical craigslist data from a secondary archive and built an interactive gnuplot webpage of post-traffic by category over time. At the time, it got slashdotted, a couple hundred thousand people looked at it and it was all fun and good.

So I thought afterwards, hey, the economy is still kinda sketchy and looking at this stuff sure is neat... I should build a real tool that robustly and respectfully logs daily post totals for more locales, and maybe build out a cool little graph portal. Maybe I can even do a little NLP to make it smarter. Hey, it's craigslist, they're community-minded.. they thank me when I post, they won't mind. They give pencils to teachers even.

So I email them, and Craig responds in a cc'd message with a 'hey cool, can this guy use our RSS feeds?' At which point the assholes who worked there started inventing every excuse under the sun as to why doing so would totally damage their infrastructure (because, you know, polling RSS every half an hour is total abuse).
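Half-hourly RSS polling can be made lighter still on the server side. A minimal Python sketch, assuming a generic RSS or RDF feed URL you supply: it sends HTTP conditional-GET headers (`If-None-Match`, `If-Modified-Since`) so an unchanged feed should come back as a bare 304, and it counts the items in whatever did change. The User-Agent string and all function names are hypothetical, nothing here reflects craigslist's actual feeds or policies, and an item count per fetch is only a rough proxy for daily post totals.

```python
import time
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

POLL_INTERVAL_SECONDS = 30 * 60  # every half an hour


def conditional_headers(etag=None, last_modified=None):
    """Headers for an HTTP conditional GET; an unchanged feed should yield 304."""
    headers = {"User-Agent": "post-count-logger/0.1 (hypothetical; add contact info)"}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def count_items(feed_bytes):
    """Count <item> elements in RSS 2.0 or namespaced RSS 1.0 (RDF) feeds."""
    root = ET.fromstring(feed_bytes)
    # Strip any "{namespace}" prefix so both feed flavours match.
    return sum(1 for el in root.iter() if el.tag.rsplit("}", 1)[-1] == "item")


def fetch_if_changed(url, etag=None, last_modified=None):
    """Return (body or None, etag, last_modified); body is None on 304 Not Modified."""
    request = urllib.request.Request(url, headers=conditional_headers(etag, last_modified))
    try:
        with urllib.request.urlopen(request, timeout=30) as resp:
            return resp.read(), resp.headers.get("ETag"), resp.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # nothing new since the last poll
            return None, etag, last_modified
        raise


def poll_forever(url):
    """Log a timestamped item count whenever the feed changes, then sleep."""
    etag = last_modified = None
    while True:
        body, etag, last_modified = fetch_if_changed(url, etag, last_modified)
        if body is not None:
            print(time.strftime("%Y-%m-%d %H:%M"), count_items(body))
        time.sleep(POLL_INTERVAL_SECONDS)
```

Calling `poll_forever(...)` with a feed URL you are permitted to poll would run it; the conditional headers mean most half-hourly checks transfer no feed body at all when nothing has changed.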

Anyway, that's when I realized that all the hippie-dippie stuff was just window dressing and that I really truly was dealing with a really special species of asshole.

I put the project down and walked away. The end.

Rapidwire|10 years ago

Hey, can you post some quotes from those at Craigslist who were against you using their data for various graphs?

tptacek|10 years ago

> the same statute that led to the demise of Aaron Swartz

For fuck's sake.

CFAA criminal sentencing guidelines may very well have contributed to Swartz's suicide. They incentivized prosecutors to create complex, showy indictments cross-linking multiple felony charges (because exploiting unauthorized access in furtherance of other felonies is an accelerator in the CFAA). CFAA may be broken in several ways.

But CFAA is also the sole federal statute governing unauthorized access. In civil litigation, CFAA is the only statute that provides a civil cause of action relating to unauthorized access to computers of any sort.

People like to write about civil CFAA as if it was some sort of nuclear option. But civil and criminal cases are worlds apart. If you're going to sue someone for misusing your computer systems, or even just violating your terms of use, CFAA is merely the statute that enables that. That has nothing whatsoever to do with overzealous prosecution.

Invoking Aaron Swartz in an argument over who's allowed to show apartment ads where is manipulative and grotesque.

gojomo|10 years ago

As the EFF argues in the linked brief, there shouldn't be any "civil cause of action related to unauthorized access" when the data in question is made publicly available on the internet.

Craigslist was abusing the CFAA with an expansive interpretation – treating unapproved use as if it were the same thing as unauthorized access – similar to that of overzealous federal prosecutors. Craigslist's argument, if embraced by the courts, would make other cases imposing penalties on the reuse of otherwise-public data easier.

The reference is a fair way to make these points to a mass audience, although a bit macabre.

rawdisk|10 years ago

"... or even just violating your terms of use, CFAA is merely the statute that enables that."

Are you sure? The Court in the 4/29/13 Order says violating terms of use would not be enough to sustain a CFAA claim. See page 6.

It is interesting how the Plaintiff changed the TOU after the "unauthorized access" and how the copyright claims were dismissed early.

The Defendants made a mistake by ignoring the C&D letter - that opened up the potential for CFAA liability. But I'm not sure they made a mistake in believing they could copy and serve the same classifieds. It appears they could if they obtained them through a third party.

x5n1|10 years ago

We really need a non-profit organization that provides a data store with an API for common things like classified listings, SMS messages, pictures, likes, etc.

That can help us move away from this sort of chicken and egg problem with user generated data. These companies are basically hogging it because they were able to build the user base.

If we can get the data into a non-profit store with a licensing scheme that basically says you must, as part of using this data, add any user-generated data submitted to your website back to this store so other developers can build products on top of it, we could really innovate in classifieds and social networks.

Perhaps something like that could be funded by the EFF or a related organization... because then we could potentially apply governance to that user-generated data, which has not been possible with private companies.

The chicken and egg problem can be solved if big non-profit tech and civil rights brands like the ACLU, EFF, Wikipedia, etc. all get behind this and market it.

aleem|10 years ago

> These companies are basically hogging it because they were able to build the user base

I am all for liberating data and letting startups drink out of the firehose but I have some cognitive dissonance from reading this news.

I know that OLX spends tens of millions of dollars in India and nearby regions to solve the marketplace problem: get a critical mass of buyers and sellers to achieve escape velocity and enjoy growth through network effects[1]. So it's not just that these companies "happened" to build these user bases, they spent money and took early gambles.

This could very well spell the beginning of the end for much of Craigslist's real estate listings (followed by other categories inevitably) unless they have some grand plan to overhaul their UI/UX entirely. Kijiji, OLX, Gumtree are also vulnerable. Maybe even Twitter since it has a habit of shutting down startups built around its feeds.

What should one do if they are at the helm of CL or one of these other companies?

    [1]: https://en.wikipedia.org/wiki/Network_effect

striking|10 years ago

> non-profit

Yeah, no. Unless this is done as an institution like Telegram is (it's made by the VK guy, and he's not charging for any part of it) or it's paid for by tax dollars (and then it would only work for some inhabitants of our planet), nothing will happen. Also, that licensing scheme idea is nifty, but it creates a chicken-and-egg problem. I don't think a single company wants to open its silo because of the advantage that it gets, and because they pay money to establish themselves socially (ads and whatnot), so allowing crappier competitors space on your platform makes you suck.

Erwin|10 years ago

I was thinking about some kind of user-owned database service to ensure the data is forever free. Imagine a discussion board run by any company where you authenticate with your external Bring-Your-Own-Data credentials. Through some SQL-like interface, that site is able to create tables and add data to the database while associating ownership of each piece of data with your identity.

At any time you can revoke your permission for their access to this data, or share it with others. Any modification to that data is version controlled so a hostile site cannot just modify/delete the data you created on it -- you'd still be able to gain access to any old version.
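The ownership, revocation, and version-history idea can be sketched in memory. A minimal Python sketch, assuming a single process and hypothetical names (`UserOwnedStore`, `grant`, `revoke`); a real service would also need authentication, persistence, and the SQL-like interface described here, none of which is shown. Writes append versions rather than overwrite, and a delete is only a tombstone, so the owner can recover earlier content even after revoking a site's access.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    owner: str
    versions: list = field(default_factory=list)  # append-only history; None marks a deletion


class UserOwnedStore:
    """Hypothetical in-memory store: each record belongs to one user, sites need
    that user's grant, and writes append new versions instead of overwriting."""

    def __init__(self):
        self._records = {}    # key -> Record
        self._grants = set()  # (owner, site) pairs the owner has approved

    def grant(self, owner, site):
        self._grants.add((owner, site))

    def revoke(self, owner, site):
        # Revoking cuts the site off but leaves the owner's history intact.
        self._grants.discard((owner, site))

    def _require_grant(self, owner, site):
        if (owner, site) not in self._grants:
            raise PermissionError(f"{site} has no grant from {owner}")

    def write(self, site, owner, key, value):
        self._require_grant(owner, site)
        record = self._records.setdefault(key, Record(owner))
        if record.owner != owner:
            raise PermissionError(f"{key} belongs to {record.owner}")
        record.versions.append(value)

    def delete(self, site, owner, key):
        # A "delete" is just a tombstone version; older versions remain recoverable.
        self.write(site, owner, key, None)

    def read(self, site, key):
        record = self._records[key]
        self._require_grant(record.owner, site)
        return record.versions[-1]

    def history(self, owner, key):
        record = self._records[key]
        if record.owner != owner:
            raise PermissionError(f"only {record.owner} can read the history of {key}")
        return list(record.versions)
```

For example, if a site granted access by `"alice"` writes a post and then deletes it, `read` returns `None`, but `history("alice", key)` still returns the original text, and after `revoke` the site can no longer read it at all.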

From a developer POV I think the key would be a SQL-like interface with appropriate caching/conflict resolution so you essentially just connect to a locally running proxy. Perhaps the advantage for the developer is some kind of tiered storage (e.g. your 100 GB database of posts is mostly stored in this external database, with a local cache holding the hottest 5%).

From a user's POV, you know no one can take your data and hoard it. Moving the data from one service to a similar one seems like it would be easier than hoping someone writes a good API for export/import. E.g. consider if you could write:

      INSERT INTO feedly.reader (feedname, feed_url) SELECT feedname, feed_url FROM google.reader;

to migrate YOUR data from Google Reader to Feedly -- not just in a 3-month sunset period while Reader shuts down, but forever and ever.

If you want to be paranoid, the database could be federated, so rather than it being central, multiple providers can compete for your data.

All this could certainly make compliance with the European data protection act easier.

blatherard|10 years ago

At least in the US, I think satisfying the 'purposes' requirement for non-profit status would be difficult. Just providing a free service isn't enough. Here's the IRS's brief description:

"The exempt purposes set forth in section 501(c)(3) are charitable, religious, educational, scientific, literary, testing for public safety, fostering national or international amateur sports competition, and preventing cruelty to children or animals."

source: http://www.irs.gov/Charities-&-Non-Profits/Charitable-Organi...

UPDATE: I confused non-profit and charitable organizations. Disregard.

BallinBige|10 years ago

sez it all: 'because they were able to build the user base....'

TTPrograms|10 years ago

Put it on the blockchain! </half sarcasm>

brownbat|10 years ago

Excellent update on one of the hard cases EFF has been fighting.

There's a link to an interesting law review article on how the CFAA can make it a criminal act for arbitrarily banned users to even browse to a public webpage: http://digitalcommons.law.umaryland.edu/cgi/viewcontent.cgi?...

It's an absurd result and frustratingly unaddressed by the courts.

jxm262|10 years ago

> and will make its API source code, the settlement agreement, and other legal filings and public policy resources available.

This is interesting to me. A couple of years ago, being young and naive, I received a cease and desist order from the craigslist legal team demanding I remove my craigslist scraper from GitHub. It was largely a toy project to play around with an HTML parser library I wanted to learn and thought it could be useful. Of course I now understand it was against their ToS, and that from an ethical standpoint I should avoid scraping anything without getting permission, but at the time I was terrified I'd be sued for a ton of money. It felt incredibly aggressive to go after me, a student at the time.

So I'm curious.. is it illegal to scrape but ok to release the source code? Where is the line drawn?

chmike|10 years ago

That's why I would be tempted to create a craigslist competitor with really free access to the data.

tsycho|10 years ago

I don't understand.

>> The Court has ruled that users—not craigslist—own the copyrights in their postings.

>> ... Craigslist finally conceded in Court that no such harm or impairment ever occurred.

>> Craigslist completely rewrote its Terms of Use, removing many of the most abusive clauses.

Everything above seems to be against Craigslist. Then why does 3taps have to agree to a settlement to pay Craigslist $1 million?

And if there are other parts of the court ruling that went against 3taps which this blog post doesn't mention, then how can Craigslist be forced to forward that money to EFF?

mjn|10 years ago

Wikipedia's summary of the case doesn't make it sound as positive for 3taps, especially as the status of the case (their motion to dismiss the case was denied) is not what they wanted: https://en.wikipedia.org/wiki/Craigslist_Inc._v._3Taps_Inc.#...

Two significant problems for them: the Court sided with Craigslist's view that 3taps knew its authorization to access the website was revoked when it received Craigslist's cease-and-desist letter, so scraping past that date might constitute unauthorized access; and Craigslist's change to its ToS on July 16, 2012 to claim copyright on posts was valid, so reuse of their material after that date could constitute a copyright violation subject to statutory damages.

melvinram|10 years ago

It's not clear why they are shutting down if "the Court has ruled that users—not craigslist—own the copyrights in their postings."

Maybe "3taps lacks the resources to continue the fight" implies that the lawsuit has drained their bank accounts and they are out of money.

noir_lord|10 years ago

It does say that 3taps has to pay $1M to craigslist; that might have wiped out their on-hand cash.

thinkcomp|10 years ago

The actual lawsuit docket is here:

http://www.plainsite.org/dockets/k5ulex5l/california-norther...

guelo|10 years ago

Wow that's quite a spin on the fact that they lost the lawsuit and had to fork over $1 million.

Buge|10 years ago

So 3taps has to pay craigslist $1M, and craigslist then has to pay that to the EFF. That seems pretty odd.

fencepost|10 years ago

3taps may not have been able to keep fighting it, but they were able to keep Craigslist from profiting by driving them under. I wouldn't be surprised if there was some element of "agree to this or we spend it all making you pay your lawyers then fold the company with all assets completely exhausted - we'll even sell the name and spend that against you."

I believe the term is pyrrhic victory.

And there may also be a (future) kneecapping element to it with the release of their scraping source code.

sschueller|10 years ago

And Craig Newmark tweets that he is donating $1M to the EFF and makes it sound like it's coming from his funds[1].

[1] https://imgur.com/lKE1Ak4

[2] https://www.techdirt.com/articles/20150701/14150431519/no-cr...

noir_lord|10 years ago

Seems like both sides were in violation; perhaps the judge had to rule against both but didn't really want to reward either side.

EFF wins though so that's nice.

pdabbadabba|10 years ago

It's interesting to ponder whether/how EFF could have ethically negotiated such a settlement on 3taps's behalf.

fadzlan|10 years ago

Not sure if I understand this correctly. Does this mean that if Instagram's or Twitter's terms do not allow scraping of their users' generated content, any developer can just go ahead and do it because the copyright holder of the post is the user?

Since the site does not hold the copyright (and rightly so), the site owner does not have the rights to say what can be done with the data. That belongs to the users that generates it.

In that case, how do we know if all the individual users consent to the scraping? If you scrape 10,000 posts and one user complains, would you be in trouble? And does the user have the right to know who is accessing the data outside of the normal use (since if they don't know, they can't object)?

jasimq|10 years ago

"3taps replied that it did not access craigslist and instead obtained the data from Google" What does this mean? How do they get that data from Google?

gsharma|10 years ago

Probably scraped Google's cache pages, so they would never touch Craigslist servers.

jister|10 years ago

Company A made a chocolate fountain for the "public" to see. People enjoyed it. Company B thought this was a great opportunity to make cakes out of it. Because the fountain was "public", they made it the source of their cake business. Company A complained, took down the fountain and....

Well you know the rest of the story. :)

hayksaakian|10 years ago

If only the chocolate fountain was infinitely copy-able :)

mtw|10 years ago

Interesting outcome. Do users own the copyright to their pictures and postings on Facebook? Twitter?

Can I build a Facebook scraper and redistribute the data to other sites?

tedunangst|10 years ago

You can read the various terms of service and user agreements to find out.

eli|10 years ago

Users almost always retain copyright to content they post on Facebook, Twitter, etc.

j_lev|10 years ago

Would a "cannot use for commercial purposes" clause have nipped this in the bud? I'm still on the fence about this one. I think Craigslist could have played it a lot better, but I find it hard to believe no one here can empathise with the founder.

shawnee_|10 years ago

3taps built a data exchange that aggregated user-generated data housed on various websites and then made that data available through an API to developers, including PadMapper and Lovely.

Craigslist discovered that it had become (has become) the "MLS" of rentals... and perhaps even more accurately -- it's a brokerage of _housing_ data -- both rentals and sales. So when property management companies (PMCs) discovered how darn easy it was, for example, to flood craigslist with multiple ads for the same unit, or to flood it with units that were never available to begin with and thus alter market perception -- certain people got exactly what they wanted: hyperinflation in rents, or the subsequent upward pressure on housing prices, or both.

As recently as 2010, craigslist welcomed innovative uses of the publicly available data ... Over the next two years, as innovators like PadMapper and AirBnB began to thrive, craigslist reversed course, and punished the innovators it previously welcomed to use the data. In February 2012, craigslist rewrote its Terms of Use, abandoning its long-articulated position that users own their own content which was freely available on the “public” part of craigslist's website.

As outraged as everybody was about this, it is exactly what the real MLS does when you decide to sell your house. You sign a contract promising to pay some Realtor's brokerage company 6 percent of whatever your house goes for -- in that contract you are essentially giving them the "copyright" of your house listing; they own it on the MLS and that is why you have to pay them the big bucks. Never mind that they do basically NOTHING other than simple photography and data entry to post on the MLS... but now they require you give them ~$66K of your equity for their 3 hours of work. (Source: http://www.mercurynews.com/business/ci_28512250/report-silic... Median price of "entry level" home in San Mateo County = $1.1M).
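The commission arithmetic above, worked in integer dollars and basis points (1 bp = 0.01%) to sidestep floating-point rounding. The 6% rate and $1.1M median price are the figures quoted in the comment, not a statement of what any particular sale costs; `commission` is a hypothetical helper.

```python
def commission(price_dollars: int, rate_basis_points: int) -> int:
    """Commission in whole dollars; 600 basis points = 6%."""
    return price_dollars * rate_basis_points // 10_000


median_entry_price = 1_100_000  # San Mateo County "entry level" median quoted in the comment
fee = commission(median_entry_price, 600)
print(fee)       # 66000
print(fee // 3)  # 22000, i.e. per hour if the work really takes 3 hours
```

At a lower rate such as 1.85%, `commission(1_100_000, 185)` gives 20350.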

Same thing is happening in rentals / property management co's (PMCs), but slightly different symptoms.

Nobody is attacking the problem the right way, though. 42Floors tried the experiment and found it to be a failure, too. (Source: https://news.ycombinator.com/item?id=9881213)

The market should be putting more pressure on brokers to compete with each other ... damn that 6 percent. (Right, but the NAR signed a non-compete agreement with itself so it gets to do that)

Hackers should stop building tools that make it easier and cheaper for the PMCs and real estate agents to steal everybody's equity.

jacquesm|10 years ago

> You sign a contract promising to pay some Realtor's brokerage company 6 percent of whatever your house goes for

For comparison: NL is roughly at 1.85 percent (negotiable).

briandear|10 years ago

People were 'outraged'? Really? The average person couldn't have cared less. There's nothing stopping anyone from building a new CL, marketing it and then getting people to post on it. The fact that there's a chicken-and-egg problem is irrelevant; that's the same problem faced by every social startup, yet the successful ones manage to overcome it.

thomasrossi|10 years ago

Do you think the ruling has an impact on other scraping scenarios, say scraping travel data?

trhway|10 years ago

Why did they take on CL instead of, say, FB? Or did they think it was better to start with an easier/smaller guy and ramp up to the bigger fish?

122 comments