Top Books on Amazon Based on Links in Hacker News Comments

1043 points| gkst | 10 years ago |ramiro.org | reply

181 comments

[+] _lpa_|10 years ago|reply
I did something pretty similar over Christmas, though I used named entity recognition to extract book titles rather than looking for Amazon links, and (so far) also limited it to specific "Ask HN" threads about books. You can find it here: http://www.hnreads.com/. It is interesting to see how little overlap there is between the two, though that may be due to my using far fewer (and also newer) threads!
[+] bitcointicker|10 years ago|reply
Surprised to see Permutation City in that list. Given that the book was written in 1994, Greg Egan displays admirable prescience about how computing would develop. Honestly, you would think it was written in the last 5 years or so. His vision of cloud computing is absolutely outstanding. It blew me away when I checked when the book was written after reading the first few chapters.

I'd read Schild's Ladder prior to reading Permutation City, which is also a good read. It does seem to get bogged down in the technical and descriptive side of things at times; however, it's a fantastic idea for a story. The main premise of the book would make a great movie.

Whilst I'm on the subject of good "Hard sci-fi" novels, Tau Zero is also worth reading.

Edit - I'll also throw this in: http://www.amazon.co.uk/gp/product/0814703259

Magic :-)

[+] temo4ka|10 years ago|reply
Many thanks for sharing this. One usability nitpick though: please consider fixing links so that they could be opened in new tabs/windows.
[+] erispoe|10 years ago|reply
Interesting that your list yields mainly fiction, while OP's list is mainly non-fiction.
[+] fratlas|10 years ago|reply
One thing that struck me about your site (apart from being a good list, well done!) is how blazingly fast (close to HN, which I find funny) the page loads. Could you fill us mere mortals in on how the fuck you got it so fast?
[+] rahimnathwani|10 years ago|reply
This is really cool. I started reading HPMoR after finding it.

I'm curious: after performing NER on the corpus, how did you filter to find books only? Did you just search Amazon's catalogue for exact matches, or was more tweaking required?

[+] jccalhoun|10 years ago|reply
As a non-programmer I find your list more interesting.
[+] wtracy|10 years ago|reply
On the system browser on my phone (a Kyocera Rise running Android 4.0.4) all I see on that page is the header and footer, no content. :-( I get the same result if I hit it with Firefox with Javascript disabled, but Javascript is very much enabled on my phone browser.
[+] gkst|10 years ago|reply
Cool project! Do you mind telling which library or service you used for the named entity recognition?
[+] Pietertje|10 years ago|reply
Thanks for sharing! This is the list I actually expected the OP's list to be, probably because I'm also more likely to view the Ask HN threads about books.
[+] ALee|10 years ago|reply
It would be great if you or OP could do the analysis again on HN Reads, but with the larger dataset.
[+] edpichler|10 years ago|reply
Good result. Like a curated list, but automated. I saved it to my reading list.
[+] SloopJon|10 years ago|reply
Here's a discussion of the original upload of Hacker News data to Google BigQuery:

https://news.ycombinator.com/item?id=10440502

At 4 GB, I'd just as soon query this locally, but this looks like a fun exercise.

I notice that there were 10,729 distinct ASINs out of 15,583 Amazon links in 8,399,417 comments. Since I don't generally (ever?) post Amazon links, I'd be interested in expanding on this in two ways.

First, I'd reduce/eliminate the weight of repeated links to the same book by the same commenter.

Second, I'd search for references to the linked books that aren't Amazon links. Someone links to Code Complete? Add it to the list. In a second pass, increment its count every time you see "Code Complete," whether it's in a link or not.
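
A minimal sketch of the first suggestion, assuming the comments are available as (author, text) pairs. The regular expression, the function name, and the sample data (including the placeholder ASINs) are illustrative assumptions, not the query used in the post.

```python
import re
from collections import Counter

# Assumed URL shape: a 10-character ASIN after /dp/ or /gp/product/,
# optionally preceded by a title slug. Real Amazon URLs have more variants.
ASIN_RE = re.compile(
    r"amazon\.[a-z.]+/(?:[^\s/]+/)?(?:dp|gp/product)/([A-Z0-9]{10})",
    re.IGNORECASE,
)

def count_unique_recommenders(comments):
    """Count each (author, ASIN) pair once, however often the author links it."""
    seen = set()
    for author, text in comments:
        for asin in ASIN_RE.findall(text):
            seen.add((author, asin.upper()))
    return Counter(asin for _, asin in seen)

# Made-up sample data; 1111111111 and 2222222222 are placeholder ASINs.
comments = [
    ("alice", "See http://www.amazon.com/dp/1111111111 and http://www.amazon.com/dp/1111111111"),
    ("alice", "Once more: http://www.amazon.com/Some-Title/dp/1111111111"),
    ("bob", "http://www.amazon.co.uk/gp/product/1111111111"),
    ("carol", "http://www.amazon.com/dp/2222222222"),
]
counts = count_unique_recommenders(comments)
print(counts.most_common())
```

Here alice's three links to the same book count as one recommendation, so the first placeholder ASIN scores 2 (alice and bob) rather than 4.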

[+] gkst|10 years ago|reply
Discounting multiple links by the same user is a good idea. Your second suggestion brings some rather complex problems. For example, if a comment goes like "Code Complete is the worst book I ever read", it is certainly not an endorsement, while linking to a book in most cases is. Also, a sentence like "programming perl is fun" does not necessarily refer to the book.

So this would require some form of sentiment analysis and also require book titles to be uniquely identifiable.
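
The ambiguity is easy to reproduce. A minimal sketch, assuming a small hand-made title list (the list and the function name are hypothetical): a naive case-insensitive phrase match treats both example sentences as mentions of a book.

```python
import re

# Illustrative title list; a real run would need a full catalogue.
TITLES = ["Code Complete", "Programming Perl"]

def naive_mentions(text, titles=TITLES):
    """Return the titles whose exact phrase occurs in text, ignoring case and sentiment."""
    return [
        title for title in titles
        if re.search(r"\b" + re.escape(title) + r"\b", text, re.IGNORECASE)
    ]

examples = [
    "Code Complete is the worst book I ever read",
    "programming perl is fun",
]
results = [naive_mentions(sentence) for sentence in examples]
print(results)
```

Both sentences register as a mention, although the first is negative and the second may not be about the book at all. Making the match case-sensitive would drop the second one, but it would also miss genuine lowercase mentions.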

[+] minimaxir|10 years ago|reply
> At 4 GB, I'd just as soon query this locally, but this looks like a fun exercise.

This requires scraping all the Hacker News data manually. I have a tool for that (https://github.com/minimaxir/get-all-hacker-news-submissions...), which I mentioned in the post you linked, but it still takes a significant amount of time to get and process the data, which is why the BigQuery dataset has a significant advantage.

[+] niuzeta|10 years ago|reply
The absence of SICP, I imagine, is because when people refer to SICP, they usually just link to the freely available online version of the book: https://mitpress.mit.edu/sicp/ .
[+] gkst|10 years ago|reply
Yes, that is probably the case. Quoting from the post

"Amazon is often the goto website for referring books, but many books have dedicated homepages as well as pages on their publisher's website. Moreover, many freely available books are referred frequently in comments, but are not considered in this ranking."

The approach used here has limitations; I hoped to make that clear by pointing them out and choosing titles and headlines accordingly.

[+] meadori|10 years ago|reply
Having owned and read through "Introduction to Algorithms" for years I agree that it is a good book. However, recently I have been feeling like it is recommended way too often without thought.

It is not the best when it comes to explaining things in an intuitive manner. It is a great reference book with lots of algorithms and proofs.

In recent years I have been drawn more towards Levitin's "Introduction to the Design and Analysis of Algorithms".

Anyone else have similar feelings about "Introduction to Algorithms"?

[+] a_bonobo|10 years ago|reply
How come "Darwin's Theorem" appears so often? It's quite unknown, with one review on Goodreads and 4 reviews on Amazon.

Is this a result of the author spamming his own work?

Edit: Looks like it, short skimming of "darwin's theorem site:news.ycombinator.com" shows that all links are from user tjradcliffe, who is the author. A case for manual curation of data.

[+] tagawa|10 years ago|reply
Or a case for counting a single author's multiple links of the same book as one vote.
[+] mattip|10 years ago|reply
Out of 8 million data points, the top book got around 50 references. I wonder how much significance should be attached to that; it looks to me to be down at the noise level.
[+] jacko0|10 years ago|reply
"Code: The Hidden Language of Computer Hardware and Software" by Charles Petzold. The best book I've ever read.
[+] DanielBMarkham|10 years ago|reply
Related: There are a ton of sites set up like this. Hopefully somebody will post a list. Lotta work by HN folks on various ways of slicing and dicing the data.

I wrote this curated site from HN several years ago. Got tired of people continuously asking for book recommendations. http://www.hn-books.com/

Couple points of note. This is 1) an example of a static site, 2) terrible UI, 3) contains live searches to comments on each book from all the major hacking sites, and 4) able to record a list of books that you can then share as a link, like so (which was my reason for making the site)

"My favorite programming books? Here they are: http://www.hn-books.com#B0=138&B1=15&B2=118&B3=20&B4=16&B5=1... "

I started writing reviews each month on the books, but because they were all awesome books, I got tired of so many superlatives!

Thanks for the site.

[+] tern|10 years ago|reply
I maintain a list of HN hacks here: https://www.are.na/morgan-sutherland/hacker-news. I've seen a couple other book projects over the years including: http://hn-books.com/ and http://hackershelf.com/browse/.
[+] nextos|10 years ago|reply
Is it possible that some books have been missed due to acronyms employed in comments?

E.g:

- SICP: Structure and Interpretation of Computer Programs

- CTM: Concepts, Techniques, and Models of Computer Programming

- TAOP: The Art of Prolog
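
One way to catch these, sketched under the assumption of a hand-maintained alias table (the table, function name, and sample comments are hypothetical): map known acronyms to full titles and count the comments that mention them.

```python
import re
from collections import Counter

# Hypothetical alias table, seeded with the acronyms listed above.
ALIASES = {
    "SICP": "Structure and Interpretation of Computer Programs",
    "CTM": "Concepts, Techniques, and Models of Computer Programming",
    "TAOP": "The Art of Prolog",
}

# Case-sensitive whole-word match, to limit accidental hits on ordinary words.
ACRONYM_RE = re.compile(r"\b(" + "|".join(map(re.escape, ALIASES)) + r")\b")

def count_acronym_mentions(comments):
    """Count, per full title, the number of comments mentioning its acronym."""
    counts = Counter()
    for text in comments:
        # set() so a comment that repeats an acronym still counts once
        for acronym in set(ACRONYM_RE.findall(text)):
            counts[ALIASES[acronym]] += 1
    return counts

# Made-up sample comments.
sample = [
    "SICP changed how I think. Read SICP!",
    "CTM is underrated",
    "I liked SICP and TAOP",
]
acronym_counts = count_acronym_mentions(sample)
print(acronym_counts.most_common())
```

These counts could then be added to the link-based counts for the same titles, with the caveat that a short acronym can collide with unrelated uses.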

[+] anc84|10 years ago|reply
Please share how much the affiliate tag generates.
[+] ryangittins|10 years ago|reply
"How DARE you make money from this thorough, thoughtful, and well-researched post at no cost to me?!"

I just never understand people's hatred for affiliate links in good pieces of content.

[+] artursapek|10 years ago|reply
I remember Jeff Atwood's 4k monitor review post [1]. Someone had calculated that he made thousands of dollars pimping that thing.

I have no issue with people doing this, as long as their posts are not solely motivated by wanting an excuse to post their affiliate link. I guess the more popular you get, the more likely that is to happen.

[1] http://blog.codinghorror.com/our-brave-new-world-of-4k-displ...

[+] gkst|10 years ago|reply
Do you accept any form of advertising on the web?
[+] myth_buster|10 years ago|reply
I believe people would just write the names of the really popular books like TAOCP, Hackers, Founders at Work, etc. rather than linking to them.

The list:

  "The Rent Is Too Damn High: What To Do About It, And Why It Matters More Than You Think" by Matthew Yglesias
  Publisher: Simon & Schuster
  
  "The Four Steps to the Epiphany: Successful Strategies for Products that Win" by Steven Gary Blank
  Publisher: Cafepress.com
  
  "Introduction to Algorithms, 3rd Edition" by Thomas H. Cormen
  Publisher: The MIT Press
  
  "Influence: The Psychology of Persuasion, Revised Edition" by Robert B. Cialdini
  Publisher: Harper Business
  
  "Peopleware: Productive Projects and Teams (Second Edition)" by Tom DeMarco
  Publisher: Dorset House Publishing Company, Incorporated
  
  "Code: The Hidden Language of Computer Hardware and Software" by Charles Petzold
  Publisher: Microsoft Press
  
  "Working Effectively with Legacy Code" by Michael Feathers
  Publisher: Prentice Hall
  
  "Three Felonies A Day: How the Feds Target the Innocent" by Harvey Silverglate
  Publisher: Encounter Books
  
  "JavaScript: The Good Parts" by Douglas Crockford
  Publisher: O'Reilly Media
  
  "The Little Schemer - 4th Edition" by Daniel P. Friedman
  Publisher: The MIT Press
  
  "The E-Myth Revisited: Why Most Small Businesses Don't Work and What to Do About It" by Michael E. Gerber
  Publisher: HarperCollins
  
  "Feeling Good: The New Mood Therapy" by David D. Burns
  Publisher: Harper
  
  "Programming Collective Intelligence: Building Smart Web 2.0 Applications" by Toby Segaran
  Publisher: O'Reilly Media
  
  "The Non-Designer's Design Book (3rd Edition)" by Robin Williams
  Publisher: Peachpit Press
  
  "The C Programming Language" by Brian W. Kernighan
  Publisher: Prentice Hall
  
  "The Design of Everyday Things" by Donald A. Norman
  Publisher: Basic Books
  
  "Cracking the Coding Interview: 150 Programming Questions and Solutions" by Gayle Laakmann McDowell
  Publisher: CareerCup
  
  "What Intelligence Tests Miss: The Psychology of Rational Thought" by Keith E. Stanovich
  Publisher: Yale University Press
  
  "On Writing Well, 30th Anniversary Edition: The Classic Guide to Writing Nonfiction" by William Zinsser
  Publisher: Harper Perennial
  
  "Darwin's Theorem" by TJ Radcliffe
  Publisher: Siduri Press
  
  "Knowing and Teaching Elementary Mathematics: Teachers' Understanding of Fundamental Mathematics in China and the United States (Studies in Mathematical Thinking and Learning Series)" by Liping Ma
  Publisher: Routledge
  
  "Don't Make Me Think: A Common Sense Approach to Web Usability, 2nd Edition" by Steve Krug
  Publisher: New Riders
  
  "Expert C Programming: Deep C Secrets" by Peter van der Linden
  Publisher: Prentice Hall
  
  "Clean Code: A Handbook of Agile Software Craftsmanship" by Robert C. Martin
  Publisher: Prentice Hall
  
  "The Elements of Computing Systems: Building a Modern Computer from First Principles" by Noam Nisan
  Publisher: The MIT Press
  
  "Code Complete: A Practical Handbook of Software Construction, Second Edition" by Steve McConnell
  Publisher: Microsoft Press
  
  "The Box: How the Shipping Container Made the World Smaller and the World Economy Bigger" by Marc Levinson
  Publisher: Princeton University Press
  
  "Software Estimation: Demystifying the Black Art (Developer Best Practices)" by Steve McConnell
  Publisher: Microsoft Press
  
  "Refactoring: Improving the Design of Existing Code" by Martin Fowler
  Publisher: Addison-Wesley Professional
  
  "Design for Hackers: Reverse Engineering Beauty" by David Kadavy
  Publisher: Wiley
[+] jraines|10 years ago|reply
Yeah -- I'd be willing to bet that "How To Win Friends and Influence People" is the most mentioned book here; maybe people just don't link to it.
[+] jlarocco|10 years ago|reply
Thanks for posting the list. The chart in the article makes it impossible to tell what the books are without hovering over each one to see the captions.
[+] busterarm|10 years ago|reply
Or using their ISBN...

SICP gets mentioned a lot too.

[+] nefitty|10 years ago|reply
Hard to read on mobile. Couldn't get past the first few. It is annoying to have to click a tiny thumbnail to read a bad, extracted synopsis from Amazon.
[+] corysama|10 years ago|reply
Interesting to see Influence so high, but Predictably Irrational not listed at all. I've heard Influence is a really great book, but from a quick skim it seems like Predictably Irrational covers the subject matter at least as well, if not better. I'd be happy to hear the opinion of someone who has actually read both.
[+] wbeckler|10 years ago|reply
I've read both and Influence is far more useful if you're trying to, well, influence someone. The art of influencing is complex and involves more than just a few behavioral economics insights. Influence is a total framework for understanding the psychology and emotions of selling.
[+] agentgt|10 years ago|reply
I was surprised not to see Dale Carnegie's book either (How to Win Friends and...), but I suppose it's rather dated and not as scientific. Carnegie's book had some of the greatest impacts on my personal and professional life.
[+] misiti3780|10 years ago|reply
Influence was a great book, but it is a bit outdated (in my opinion). Predictably Irrational and his other books were much more relevant. Thinking Fast and Slow was the best one of them all.
[+] noobie|10 years ago|reply
Sad I couldn't find any of the non-technical books on Audible. Any audiobook "readers" out there?