
Hacker News API

1714 points | kevin | 11 years ago | blog.ycombinator.com

298 comments

[+] christiangenco|11 years ago|reply
Oh man you guys, patio11 has generated massive amounts of content: https://hacker-news.firebaseio.com/v0/user/patio11.json?prin...

I count 8,483 submissions. I'm sure there's something interesting to be done with all of this data. A word frequency chart?
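For anyone who wants to try, here's a minimal word-frequency sketch in Python. Note the user endpoint only lists item ids, so in practice each text would come from a follow-up request to /v0/item/&lt;id&gt;.json; the sample strings below are made-up stand-ins:

```python
import re
from collections import Counter

def word_frequencies(texts, top_n=5):
    """Count word occurrences across a list of comment/story texts."""
    words = []
    for text in texts:
        # Keep only runs of letters/apostrophes, lowercased.
        words.extend(re.findall(r"[a-z']+", text.lower()))
    return Counter(words).most_common(top_n)

# Stand-in strings; really you'd fetch each id in the user's "submitted" list.
sample = [
    "Charge more. No, really: charge more.",
    "A/B test the pricing page, then charge more.",
]
print(word_frequencies(sample, top_n=2))  # [('charge', 3), ('more', 3)]
```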

---

Edit: So apparently there's a Ruby gem that you can feed a body of text, and it generates pseudo-random phrases based on that text.

I present to you the patio11 impersonator: https://gist.github.com/christiangenco/e8d085e47479be0131e1

One of my favorites:

    A nice set of challenges -- kitty at a school with tens of thousands of bucks a year or less immediately.
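For the curious, the trick behind gems like that is usually just a Markov chain: map each word to the words that follow it in the corpus, then walk the map randomly. A rough Python sketch (the corpus string is a made-up stand-in, not patio11's actual comments):

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it (a bigram chain)."""
    chain = defaultdict(list)
    words = text.split()
    for a, b in zip(words, words[1:]):
        chain[a].append(b)
    return chain

def generate(chain, start, length=8, seed=0):
    """Walk the chain from `start`, picking a random follower each step."""
    random.seed(seed)  # seeded so the example is repeatable
    out = [start]
    for _ in range(length - 1):
        followers = chain.get(out[-1])
        if not followers:  # dead end: no word ever followed this one
            break
        out.append(random.choice(followers))
    return " ".join(out)

corpus = "charge more for your software and charge more for your time"
print(generate(build_chain(corpus), "charge"))
```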
---

Also, a word count on patio11's submissions: 1,052,351. For comparison, all 7 Harry Potter books total 1,084,170 words. patio11 has written the entire Harry Potter series worth of content on HN. Just... wow.

[+] patio11|11 years ago|reply
Thanks, I had been curious about that number for a while. The last time I checked it was 500k or so.

For folks who want to do interesting things with the API but don't want to be abusive to Firebase's servers, I whipped up a quick ruby script to cache a particular user's comments/submissions on disk: https://gist.github.com/patio11/1550cad3a02edd175049

It tries to rate limit itself by putting 200ms of sleep between requests, so downloading all of my comments would take ~30 minutes.

"I release this work unto the public domain." -- feel free to adapt it to your needs.

Usage is "ruby slurper.rb $USERNAME $MAX_COMMENTS_TO_FETCH."
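If you'd rather not run Ruby, the same idea (cache items on disk, sleep 200ms between live requests) is easy to sketch in Python. This is an illustration with names of my own choosing, not a port of the script above:

```python
import json
import time
import urllib.request
from pathlib import Path

API = "https://hacker-news.firebaseio.com/v0"

def fetch_item(item_id, cache_dir="hn_cache", delay=0.2):
    """Fetch one item, caching to disk; sleep only on live requests."""
    cache = Path(cache_dir) / f"{item_id}.json"
    if cache.exists():                      # cache hit: no request, no sleep
        return json.loads(cache.read_text())
    with urllib.request.urlopen(f"{API}/item/{item_id}.json") as resp:
        data = json.loads(resp.read())
    cache.parent.mkdir(exist_ok=True)
    cache.write_text(json.dumps(data))
    time.sleep(delay)                       # crude rate limit between requests
    return data
```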

[+] bcoates|11 years ago|reply
It checks out, I put in TempleOS and got out Genesis: "We have dreamed a dream, and there is no interpreter of it."
[+] jaredsohn|11 years ago|reply
Here are some line charts of his posting history: http://hnuser.herokuapp.com/user/patio11/

Click on the line chart to do an hnsearch for the time period.

Update: Site should be back up. It crashes occasionally (that's part of why I hadn't posted it yet).

[+] benohear|11 years ago|reply
How about feeding the top 20 into Bingo Card Creator?
[+] x0x0|11 years ago|reply
off the top of my head: highest value word patio11 writes is "more" or "raise"...
[+] airlocksoftware|11 years ago|reply
This... is cool, but also kinda sucks for me. I've invested dozens of hours into writing an extremely complicated scraper for my Android version of HN.

https://play.google.com/store/apps/details?id=com.airlocksof...

The newest version (still under development, probably a month or two from release) adds support for displaying polls, linking to subthreads, and full write support (voting, commenting, submitting, etc). I'm fine with switching to a new API (Square's Retrofit will make it super easy to switch), but without submitting, commenting, and upvote support I have to disable a bunch of features I worked really hard on. Also it would've been cool to know this was coming about 3 months ago so I didn't waste my time.

Anyways, quick question on how it works -- when I query for the list of top stories

https://hacker-news.firebaseio.com/v0/topstories.json?print=...

it just returns a list of ids. Do I have to make a separate request for each story

https://hacker-news.firebaseio.com/v0/item/8863.json?print=p...)

to assemble them into a list for the front page, or am I missing something?
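Concretely, the two-step flow I'm describing would be something like this (a sketch; `get_json` and the `get` hook are my own names, and the hook just makes the fetch swappable):

```python
import json
import urllib.request

API = "https://hacker-news.firebaseio.com/v0"

def get_json(path):
    """One HTTP request per call -- this is the cost in question."""
    with urllib.request.urlopen(f"{API}/{path}.json") as resp:
        return json.loads(resp.read())

def front_page(limit=30, get=get_json):
    """One request for the id list, then one request per story."""
    ids = get("topstories")[:limit]
    return [get(f"item/{i}") for i in ids]
```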

[+] dang|11 years ago|reply
I'm sorry you just invested a lot of time in scraping. I know from experience what a pain that is. We said several times that the API was coming, and I've made it clear to anyone who asked, but there's just no way to reach everybody. All: in the future, please get answers to questions like this by emailing [email protected].

Re write access and logged-in access, if that turns out to be how people want to use the API, that's the direction we'll go. But we think it's important to launch an initial release and develop it based on feedback. There are many other use cases for this data besides building a full-featured client: analyzing history, providing notifications, and so on. It will be fascinating to see what people build!

[+] dionidium|11 years ago|reply
This... is cool, but also kinda sucks for me. I've invested dozens of hours into writing an extremely complicated scraper for my Android version of HN.

This definitely does suck. I feel your pain. But it's also part of the package of scraping websites. You go in knowing that it could break at any time.

[+] kogir|11 years ago|reply
Yes. While with HTTP pipelining you can request them all over a single TCP connection using a single SSL session, you will need to make an HTTP request for each item you want.

If you're on a supported platform, the Firebase SDKs handle all this efficiently and can even provide real-time change notifications.

[+] thegeomaster|11 years ago|reply
I'm also currently writing a scraper[1] for the HN frontpage (for my WIP Hacker News redesign), and while there's a limited Algolia API available, it doesn't do much good if users can't post comments, upvote etc. Same goes for the official one now.

So, @anyone involved with the API project: can you give us an estimate of when the OAuth-based user-specific API will be rolled out? I'm fine with pausing my efforts until then, if it's going to be soon, in order to take a less complex and error-prone path.

[1]: https://github.com/geomaster/hnop/blob/master/backend/src/hn...

[+] sararob|11 years ago|reply
[Firebase Dev Advocate] @airlocksoftware - Yes, you should make separate requests for each story. You can attach a listener to the topstories node (https://www.firebase.com/docs/web/guide/retrieving-data.html...) and when that’s triggered, you can make a request for the data on each story. Using the Firebase SDK, each request will get made using the same connection. I'd recommend using our SDK instead of the REST API so you don't have to worry about managing your own connections and retries.
[+] tudborg|11 years ago|reply
Just wanted to drop a comment on the awesomeness of your app. Hacker News 2 is by far the best Hacker News app, not just on Android, but on all mobile platforms I've tried (so, iOS, Android, and Windows Phone). Awesome work you are doing.
[+] TheAlchemist|11 years ago|reply
I did use your app for learning purposes - I studied the code quite a lot when learning Android. Thanks for the good job!
[+] deft|11 years ago|reply
Yeah, I like it a lot, but I've put tons of time into my scraper for Reader YC (https://github.com/krruzic/Reader-YC). I support everything but polls currently. This api is nice but my scraper actually supports more... No option to get Show HN, Ask HN or New afaik. Still glad this is out!
[+] jkimmel|11 years ago|reply
just wanted to let you know that I love your application!
[+] cJ0th|11 years ago|reply
I, for one, was just thinking about writing a scraper...

Thanks very much guys!

[+] dimillian|11 years ago|reply
This is a big question for me too. It sounds like you need to fetch every id from the REST API. I need to test the iOS (and you Android) SDK.
[+] hokkos|11 years ago|reply
I remember it was announced a few months ago.
[+] kolev|11 years ago|reply
Nice app! Is login broken though?
[+] piyush_soni|11 years ago|reply
So why, in the first place, would I want another mobile app rather than just opening the fully functional website (which is pretty simple & basic already) on my mobile browser?
[+] Livven|11 years ago|reply
I've been working on a Hacker News client for Windows Phone over the past several weeks and am very close to an initial release, so I feel somewhat ambivalent about this.

On the one hand, of course it's great that HN is finally getting a proper API and also modernizing its markup (which is a mess even if you ignore all the tables – for example, the first paragraph in a comment usually isn't wrapped in <p> tags), but on the other hand this current v0 version is very lacking and impractical for a regular client application.

Since the top stories (limited to 100) and child comments are only available as a list of IDs a client app would have to make a separate HTTP request for every single item, which is obviously not something you'd want to do especially in a mobile environment. Other lists apart from the top stories (new, show, ask, best, active etc.) don't seem to be available at all right now.

Of course this is just the first version, and the documentation promises improvements over time – which I don't doubt at all – but there's no clear indication that the API will be at feature-parity with the current website, even excluding anything that requires authentication, by October 28. So this means that I – and other developers of client apps or unofficial APIs – will probably have to write new scraping code once the new rendering engine (which I assume refers to the website) arrives instead of being able to switch to the new API immediately.

Now I guess I might just be needlessly worried, especially since the blog post explicitly says that the new API "should hopefully make switching your apps fairly painless", but then why not wait until it's actually ready for that before making the announcement? Putting a half-baked API out there a few days/weeks (?) in advance before it's fully fleshed out doesn't seem all that helpful, at least to me.

[+] shill|11 years ago|reply
I wrote a stupid simple wrapper and pushed it to PyPI. My excuse is that I needed to learn how to use setuptools today.

    pip install hackernews-python
Usage:

    >>> from hackernews import HackerNews
    >>> hn = HackerNews()
    >>> hn.top_stories()
    [8422599, 8422087, 8422928, 8422581, 8423825...
    
    >>> hn.user('pg')
    {'delay': 2, 'id': 'pg', 'submitted': [7494555, 7494520, 749411...

    >>> hn.item(7494555)['title']
    'Hacker News API'

    >>> hn.max_item()
    8424314

    >>> hn.updates()
    {'items': [8423690, 8424315, 8424299...], 'profiles': ['exampleuser',...]}

https://github.com/abrinsmead/hackernews-python
[+] dstaley|11 years ago|reply
Decided to recreate the Hacker News homepage using Ember and the new API. I was really pleased with how easy it was! https://realtimehackernews.firebaseapp.com/
[+] ryanseys|11 years ago|reply
Very cool. Would be neat if changes were more pronounced when they happen. Will the posts change order in this demo if they change on the homepage?
[+] danyork|11 years ago|reply
Very cool! Congrats on the quick work!

(As I find myself pondering the idea of standing something like this up on a dual-stacked server purely so that I could access HN from my IPv6-only test network... hmmm...)

[+] andrewstuart2|11 years ago|reply
Definitely digging the live-update on the scores. Haven't seen top stories switch places yet, but I'm guessing that happens also.

Good work.

[+] dang|11 years ago|reply
Holy crap, that was fast. Impressive!
[+] dang|11 years ago|reply
To everyone asking about logged-in access and write access: this is just a first release! Where it goes from here will depend, in good iterative fashion, on what people want.
[+] minimaxir|11 years ago|reply
How does this differ from the Algolia HN API in terms of data access? (https://hn.algolia.com/api) I was able to download all HN data recently with ease using that endpoint. Authentication?

EDIT: After looking at the documentation there are two new aspects of the Firebase API not in the Algolia API:

1) Ability to see deleted/dead stories.

2) Endpoint for user data.

Question to kogir/dang: Has the "delay" field (Delay in minutes between a comment's creation and its visibility to other users) always been there?

[+] jamest|11 years ago|reply
[Firebase founder here] This is pretty exciting for us; we're glad kogir, dang, kevin and sctb chose to expose HN's data through Firebase. We've seen quite a few startups (and big companies like Nest) do this, since building, maintaining, and documenting a public API often isn't an easy task.
[+] nacs|11 years ago|reply
How does this API work with Firebase?

Is HN data already in Firebase (as its primary data store) or is content from HN's DB getting 'mirrored/cloned' on-demand to Firebase for the API?

[+] jcampbell1|11 years ago|reply
This makes it really easy to add average karma to the comment section for every user. For instance, you can paste the snippet below into the console, and it should add average karma data for each user.

    // For every user link on the page, fetch that user's profile via JSONP
    // and append their average karma (karma / submissions) to the username.
    Array.prototype.forEach.call(document.querySelectorAll('a[href^=user]'),
        function (v, k) {
            var s = document.createElement("script");
            s.src = '//hacker-news.firebaseio.com/v0/user/' + v.innerHTML +
                '.json?callback=ud_' + k;
            document.head.appendChild(s);
            // JSONP callback: compute and display this user's average karma.
            window['ud_' + k] = function (user_data) {
                var avg_karma = user_data.karma / user_data.submitted.length;
                v.innerHTML += ' (' + avg_karma.toFixed(1) + ')';
            };
        }
    );
[+] jaredsohn|11 years ago|reply
Here is something I built with the Algolia API awhile back and just haven't gotten around to cleaning it up to post here.

It lets you download all comments/stories for a user as a JSON or CSV file, breaks down karma between comments and stories, and plots comment/story counts, karma, etc. over time on a line chart (clicking will show you the details via an hnsearch).

Also I built some npm modules so you can get this information via the commandline.

http://hnuser.herokuapp.com/

Example: http://hnuser.herokuapp.com/user/tptacek/

The Chrome extension hasn't been updated for awhile (it just superimposes a small amount of this information on the user page).

[+] josephwegner|11 years ago|reply
I really appreciate the 3-week heads up before the move to a new frontend structure. It's a nice gesture, but I have this horrible feeling that there's only about a 10% chance that my Hacker News app gets updated in time.

I know you can't not iterate because people are scraping, but it does stink. At least this will make everything more future-proof going forward.

However, it may be nice to give a bit more heads up than 3 weeks. I know a lot of apps can take ~2 weeks to get through the review process for iOS.

[+] bennyg|11 years ago|reply
I've been waiting forever for an API from HN, but unfortunately I will not be using it for my app (https://github.com/bennyguitar/News-YC---iPhone).

I've built a library for iOS (https://github.com/bennyguitar/libHN) that handles scraping, commenting, submitting, voting, etc pretty well and allows me to make as few web calls as necessary to use HN. It looks like I'd have to drop functionality and completely change the networking scheme to match this API - something I'm not willing to do yet.

Correct me if I'm wrong here, but to get every comment on a post, I'd have to recursively get each item for each child. Instead, right now, I can make one network request and get all comments for a story. Granted, I have to parse the HTML (which I hate), but it's a much cleaner solution than going through every item, checking the children and then getting those items ad infinitum. Again, I just glanced over the documentation, but that seems untenable to me.
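To illustrate what I mean, fetching a full thread under v0 would look something like this (a sketch; the `get` callable stands in for one HTTP request per item, which is exactly the cost I'm describing):

```python
def fetch_thread(item_id, get):
    """Recursively fetch an item and all of its descendants via 'kids'."""
    item = get(item_id)  # one request per comment
    item["replies"] = [fetch_thread(k, get) for k in item.get("kids", [])]
    return item

def count_comments(thread):
    """Total descendants -- the comment count the API doesn't return."""
    return len(thread["replies"]) + sum(count_comments(r) for r in thread["replies"])
```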

[+] s9w|11 years ago|reply
I welcome the idea, but this barely qualifies as an API. The most useful part is the "current top stories" - but over what timeframe, exactly? It seems to cover at least 3 days and can't be customized. And even my test parsing of the 100 top stories took a good minute.

And that returns only the ids, nothing else. To get basic information like the score, title or url you have to lookup the ids individually. And even the story items do not contain such basic information as the number of comments. And you can't calculate it yourself since only the top comments are even returned (as ids of course). So you'll have to recursively dig through the comments to get the number.

This is even more curious as there is a very solid Algolia API where you can filter for submission time, story score, number of comments and even return a greater number of results + access page numbers to get even more.

To get the information of a single Algolia API call you will need hundreds or thousands (in the case of nested comments) of "official" API calls. Hoping for updates.

[+] eevilspock|11 years ago|reply
If up/down vote data were included in the API, much needed experimentation on collaborative filtering would be made possible! This is Hacker News after all.

Right now one team, Ycombinator, is trying to fix important issues in the ranking and moderation of posts and comments. Many of us are frustrated by the increasing domination of popularity (and hatred) over quality and relevance. A lot of good submissions and comments are simply buried, never to be found. There is too much muck to have to wade through. The timing of posts and comments plays a much larger role than quality. I could go on and on.

Imagine a Netflix Prize-like flowering of experiments and collaboration, leveraging the hacker community's collective smarts and enthusiasm. Many of us have ideas, but right now are unable to test them. What a shame if a great idea dies on a notepad.

There are two possible issues with opening up voting data: gaming and privacy. If having vote data allows someone to game the front page, then only include it with some delay (2 days?) so that it couldn't be used to game the front page. This will still allow experimentation with collaborative filtering algorithms and the like.

My take on the privacy issue is that anonymity isn’t that important for a site like Hacker News:

1. Startup culture is about straight talk, putting your money where your mouth is, and open critical feedback, both in the giving and receiving. There are precedents for exposing voting data (e.g. Quora, Facebook, Stack Exchange).

2. HN is not aimed at political discussions or other topics where anonymity can be paramount.

3. Pseudonymity is sufficient for those who don’t want their votes and comments tied back to their actual identity.

Thoughts?

I would love to hear from others who yearn to experiment with alternate algorithms and strategies for improving Hacker News.

[+] dang|11 years ago|reply
There are many legitimate views on this, but FWIW mine differs from yours. I believe that anonymity actually is important for a site like Hacker News, and the odds of us ever publishing the vote data—even pseudo-anonymized—are small. Sorry to disappoint.
[+] comeonnow|11 years ago|reply
I built a scraper around 3 years ago (been through a few usernames since then), and I've had to change it once 3 months ago because the HTML output added quotes around HTML attributes.

Even though it's read only, I'll continue to use my scraper rather than the API, simply because it's one request, whereas the API would require one request for the top IDs and then one call per story: 31 calls instead of just 1.

Unless I'm missing something, it seems fairly poorly designed for top stories, and nonexistent for new stories.

------

EDIT: Looks like I missed the text about updating to a new rendering system in 3 weeks' time, and iterating designs faster to allow mobile-friendly theming. Looks like I WILL be updating to use the API.

[+] tomw1808|11 years ago|reply
Yeah, I have the same problem here... and basically the same question as someone mentioned below: new stories through the API? Do we have to get the max id, then fetch everything below it and check whether each item is a story? Any other ideas?
[+] jxm262|11 years ago|reply
Yay! I've been wanting something like this to come out. I've been playing around with some new tech stacks and built a CSS reskin of Hacker News, but always wanted an actual API to make it easier.

http://jmaat.me/hn

There are a bunch of CSS restyles of Hacker News out there, but I couldn't find anything that aggregates them. This will make it a lot easier to extend and customize the site.

I'm not seeing any APIs for the jobs or show sections, though? Hopefully these might come in the future?

[+] ssorallen|11 years ago|reply
The Firebase JavaScript library makes this impressively straightforward to use. I built a clone using React.js and Firebase's library. Because v0 of the API requires a request for each news story, it's not possible to use Firebase's React mixin yet.

https://github.com/ssorallen/hackernews-react

[+] andrewstuart2|11 years ago|reply
I'm definitely excited about the API and the future possibilities with it. Looks like a great start. I do have a few questions and suggestions, though.

Is there any chance of getting more than just the top 100 stories returned? I think it would be a lot more useful for API consumers if you could use a query parameter to set the limit (within reason; 1,000 is a common cap) and a number of results to skip. For now, scraping is still more desirable to me since I can retrieve any number of results in their current order.

Better yet, but more complex: a number to skip and a certain timestamp so I don't see the same article on two pages due to natural upvoting, downvoting, or rank decay.

Also, if there's still any flexibility with property names, I'd suggest these changes for clearer semantics: "deleted" -> "hidden" (since they're obviously not deleted), "by" -> "author" (for more clarity), and "kids" -> "children" (the common convention).

[+] paulsutter|11 years ago|reply
Please do allow other sites to use HN logins. Then the community could develop useful sister services.

For example, a site where HN members can upvote and rate different development tools, libraries, IDEs, management tools, etc. All with backlinks to HN discussions. It's a great community and there are many ways we could share knowledge and experience.