Introducing the 4chan API

[+] afhof|13 years ago|reply

I feel kinda silly just having spent the long weekend writing a threaded 4chan scraper. This is a super welcome change though. Even if you don't visit 4chan regularly you can't ignore the VAST amount of content people upload there. I imagine some interesting statistics will come of this (I know I plan to).

[+] cs702|13 years ago|reply

"Even if you don't visit 4chan regularly you can't ignore the VAST amount of content people upload there. I imagine some interesting statistics will come of this."

That was my reaction too. Beyond statistics, this will make it easier to develop all sorts of user-facing and machine-to-machine applications -- for sharing, grouping, ranking, and linking items, and even for 'overlaying' the content on top of other social networks.

I'd expect the API to grow and mature over time, and am curious to see what comes out of this experiment.

[+] phogster|13 years ago|reply

I don't know. Getting an API to 4chan is like getting a parking pass to the local garbage dump. What statistics are you actually mining here?

[+] calinet6|13 years ago|reply

The first thing that jumps to mind with this is... "Oh $#!*." I don't know why, but it scares me what could come out of this.

[+] 54mf|13 years ago|reply

Came to post this. The only redeemable value in 4chan, in my opinion, is that the fact that posts aren't archived makes for a very interesting social experiment. An API firehose pretty much puts an end to that.

[+] unknown|13 years ago|reply

[deleted]

[+] Jagat|13 years ago|reply

I'm a new graduate student in an American university. As part of my Data Mining/NLP project, I'm wondering if I can do something cool with this fresh API. Any ideas?

[+] nyan_sandwich|13 years ago|reply

Create a Markov chain 4chan slang generator.

track usages of phrases over time. (thinking of the recent evolution of "rustled my jimmies" derivatives)

See what topics are trending

Fuck maybe I should build some of this...
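A rough sketch of the first idea, assuming you already have post bodies as plain strings. The names `build_chain`, `generate`, and `sample_posts` are made up for illustration, and the sample text is placeholder data, not real posts:

```python
import random
from collections import defaultdict


def build_chain(texts, order=2):
    """Map each run of `order` consecutive words to the words seen right after it."""
    chain = defaultdict(list)
    for text in texts:
        words = text.split()
        for i in range(len(words) - order):
            chain[tuple(words[i:i + order])].append(words[i + order])
    return chain


def generate(chain, length=20, seed=None):
    """Random-walk the chain from a randomly chosen starting state."""
    if not chain:
        return ""
    rng = random.Random(seed)
    state = rng.choice(list(chain))
    out = list(state)
    for _ in range(length - len(state)):
        followers = chain.get(state)
        if not followers:
            break  # dead end: this state was only ever seen at the end of a post
        out.append(rng.choice(followers))
        state = tuple(out[-len(state):])
    return " ".join(out)


# Placeholder input; in practice these would be post bodies pulled from thread JSON.
sample_posts = [
    "that really rustled my jimmies today",
    "my jimmies remain unrustled today",
]
print(generate(build_chain(sample_posts), length=12))
```

With only a couple of short posts the output mostly parrots the input; it only gets interesting with a lot of text.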

[+] bootload|13 years ago|reply

"... The decision to release an API was partially out of necessity, but also because I'm curious to see how people will use it. ..."

And who. The API just made a group of intelligence hackers very happy indeed.

[+] xefer|13 years ago|reply

It still requires scraping to discover the thread ids though, does it not?

[+] moot|13 years ago|reply

We'll have indexes and a catalog view soon.

[+] tarice|13 years ago|reply

What I'm taking away from the comments below is:

"Everything that could be done with this API has already been done using HTML parsing. This development will simply make those applications faster."

Truth?

[+] lnanek2|13 years ago|reply

Yeah, and there have been Python scripts anyone interested passes around and shares too, so you haven't even had to write it yourself...

[+] andyzweb|13 years ago|reply

true

[+] ddod|13 years ago|reply

Could someone explain to me how this could be leveraged (or if it could be) to gather a sort of stream of messages, a la the Twitter streaming API or reddit.com/r/all/comments.json?

I'd be interested in doing some language statistics and comparing them to the aforementioned networks.

[+] zevyoura|13 years ago|reply

Elsewhere in the comments here, moot said, "We'll have indexes and a catalog view soon." So for now, you need the thread id.
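Until then, something stream-like for a single known thread could be faked by polling its JSON and keeping only the posts you haven't seen. A minimal sketch, assuming the thread JSON has a top-level "posts" list and each post carries a numeric "no" field (check that against the actual output before relying on it). The helper names and the 30-second interval are made up:

```python
import json
import time
import urllib.request


def new_posts(thread, seen):
    """Return posts whose number is not yet in `seen`, and record them as seen."""
    fresh = [p for p in thread.get("posts", []) if p["no"] not in seen]
    seen.update(p["no"] for p in fresh)
    return fresh


def poll_thread(url, interval=30):
    """Yield posts from one thread as they show up. `url` is that thread's JSON URL."""
    seen = set()
    while True:
        with urllib.request.urlopen(url) as resp:
            thread = json.load(resp)
        for post in new_posts(thread, seen):
            yield post
        time.sleep(interval)  # be polite; the interval is an arbitrary choice
```

Usage would be `for post in poll_thread(thread_json_url): ...`. The loop has no handling for the thread being deleted, so the fetch will eventually raise an HTTP error that the caller has to catch.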

[+] terhechte|13 years ago|reply

Sadly read-only, though it's not much work parsing the HTML and faking a submit through a POST request. Good luck submitting a 4chan app to Apple's app store though :)

[+] Lockyy|13 years ago|reply

They have already existed. They all got pulled recently.

[+] volaski|13 years ago|reply

Forgive me if this is a noob question, but does 4chan restrict embedding of images.4chan.org images from external urls? I was just playing around with the API and it seems all the images are rendered as the placeholder image that says "4chan.org".

If this is true, I don't know how to utilize this API to make something valuable since all I can do is get the url or text. Somebody please enlighten me. Thanks!

[+] lnanek2|13 years ago|reply

This sort of protection is usually done by checking the Referer header, which is trivial to set when retrieving something programmatically or with standard tools like wget. Anyway, the API seems focused on reducing the processing costs of browser extensions that let the user view the page but add extra features to it. Those would probably still look like a normal browser view of the image to the site by default, even if browser plugins can't make the trivial client-sent header change (not sure if the browser plugin API exposes it).
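For the programmatic case, setting that header looks roughly like this with Python's standard library. Whether 4chan actually checks it is my guess, not something I've confirmed, and the function names and URLs are placeholders:

```python
import urllib.request


def build_image_request(image_url, referer):
    """Build a GET request that claims to come from the page at `referer`."""
    return urllib.request.Request(image_url, headers={"Referer": referer})


def fetch_image(image_url, referer):
    """Download the image bytes, sending the Referer header along."""
    with urllib.request.urlopen(build_image_request(image_url, referer)) as resp:
        return resp.read()
```

You'd pass the image URL from the thread and the URL of the thread page itself as `referer`.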

[+] a_bonobo|13 years ago|reply

Why would you hotlink to 4chan pictures? These get deleted with their thread once the thread hits page 10 anyway, which can happen in under 5 minutes (on the more active boards like /b/).

[+] unknown|13 years ago|reply

[deleted]

[+] astrojams|13 years ago|reply

This could transform 4chan as mobile and desktop clients are created. God I hate the web interface.

[+] unkoman|13 years ago|reply

The mobile adapted web interface is pretty good now.

[+] JD557|13 years ago|reply

There are already mobile clients for Android and, IIRC, there were clients for iOS, but they were banned from the App Store due to some kind of infringement (I think it was adult content).

So I don't think a lot of stuff is going to change, apart from a reduction in the server load caused by the old clients/extensions.

[+] kineticflow|13 years ago|reply

There already were plenty of ("native") mobile and desktop clients, although they worked by parsing HTML.

[+] 3143|13 years ago|reply

Can anyone repost the info for those of us who can't (or prefer not to) visit 4chan at work?

[+] jonny_eh|13 years ago|reply

Every thread will be available as JSON.

[+] evolve2k|13 years ago|reply

Regarding financial sustainability, have you thought about charging for the new API?

[+] joethompson|13 years ago|reply

I'm sure this is about trying to improve site performance, and charging for it would inevitably cause everyone to continue scraping the HTML, thus defeating the point.

[+] jasimq|13 years ago|reply

This is interesting. Would surely give it a try and integrate with our app

[+] angersock|13 years ago|reply

Something I just cobbled together:

   curl http://api.4chan.org/b/res/423418552.json | python -mjson.tool
   curl http://api.4chan.org/b/res/423418552.json | json_pp

Example for grabbing a thread and prettyprinting the JSON of it.
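Same thing from inside Python, if you'd rather not shell out. Just a sketch: `fetch_thread` and `pretty` are names I made up, and the URL is whatever thread JSON URL you'd hand to curl:

```python
import json
import urllib.request


def fetch_thread(url):
    """Download a thread's JSON as text."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def pretty(raw):
    """Re-indent a JSON string, roughly what json.tool does on the command line."""
    return json.dumps(json.loads(raw), indent=4)


# Usage (needs network access):
#   print(pretty(fetch_thread(thread_json_url)))
```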

Because, you know, we need more 4chan in the house.

(EDIT: brief skimming of the comments indicates it may be semi-offensive, so be warned. We're skimming /b/, after all.)
