
Ask HN: What is your goto resource for learning about big data, ML, AI etc?

144 points| vijayr | 9 years ago | reply

40 comments

[+] curiousgal|9 years ago|reply
HN has a great, and I mean absolutely great, search feature via Algolia (https://hn.algolia.com). This particular question keeps coming up every now and then, yet no one seems to use the feature, despite the search bar being at the bottom of every page.

Edit: removed "inb4 downvotes".
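
For anyone who would rather script the search than use the box at the bottom of the page, here is a minimal sketch. It assumes the public HN Search API endpoint at `https://hn.algolia.com/api/v1/search` with its `query`, `tags`, and `hitsPerPage` parameters; the helper name `build_search_url` is hypothetical and only builds the URL, it does not send a request.

```python
from urllib.parse import urlencode

# Assumed endpoint of the public HN Search API (powered by Algolia).
HN_SEARCH_ENDPOINT = "https://hn.algolia.com/api/v1/search"


def build_search_url(query, tags="story", hits_per_page=20):
    """Build a search URL for the given query (hypothetical helper).

    tags="story" restricts results to stories rather than comments.
    """
    params = {"query": query, "tags": tags, "hitsPerPage": hits_per_page}
    return f"{HN_SEARCH_ENDPOINT}?{urlencode(params)}"


url = build_search_url("learning machine learning")
print(url)
```

Fetching that URL (for example with `urllib.request.urlopen`) should return JSON that can be parsed with the standard `json` module.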

[+] atom_enger|9 years ago|reply
I never knew it was at the bottom and I use this site all the time. Thanks for pointing that out. However, it does raise some questions about the UI in this case. Can't we put the search box up in the header where people expect it to be?
[+] matthudson|9 years ago|reply
HN search has been one of the most helpful resources (among many) for my personal and professional life. +1 for that alone. It sowed the seeds for a career path (I went from a replaceable yeoman scripter to a guy with a reliable paycheck who can comment with angst on HN). HN-algolia is snappy. I've been developing a habit where I default to searching HN before I search Google (for better or worse).

Also, I really apologize for this but, please don't say things like: "Time for me to get down voted to oblivion".

You've spoken your mind (and helpfully so) with the end of your comment (which is otherwise good).

Self-referencing how one expects comment voting to go is a behavior that I wish people would refrain from. It makes the comment "about" itself --- rather than the content. It's a primer that stems from perceptions about how it will be interpreted by the community, which in turn manipulates voting behavior about the comment. (<insert-discussion> voting systems on community forums. is voting itself a good system? </insert-discussion>).

[+] 0x54MUR41|9 years ago|reply
That's true.

If you want books as learning resources, I would recommend searching Hacker News Books [1]. That site collects books from the links shared in HN comments and ranks them.

[1]: http://hackernewsbooks.com/

[+] BrandonWatson|9 years ago|reply
Reading the responses here, I wonder what a revamped HN homepage would look like if there were a search bar at the top of the page.

The user story for search has been solved. What hasn't been solved, it sounds like, is feature discoverability.

[+] donretag|9 years ago|reply
I actually do not care for the Algolia search functionality. The previous search worked far better.

Algolia has built-in suggestion features that cannot be disabled (synonyms? autocorrect?), so it returns content that may not match what the user really wants when they want an exact search. This behavior matters especially to developers, since our terminology often does not match the English (the language of HN) vocabulary. Try searching for the product "logsene", which is simply one example. Quoting words, as you would on Google, does not work all the time.

[+] loader|9 years ago|reply
Whelp, I just learned HN has a search box!
[+] dhawalhs|9 years ago|reply
For complete newbies (but with programming experience), I would recommend this UW Coursera course to get introduced to ML Basics: https://www.coursera.org/learn/ml-foundations

Earlier this year Apple acquired Turi for $200 million. It was founded by Carlos Guestrin, one of the professors teaching the course.

We (Class Central) are also working on a six part Wirecutter style guide to learning Data Science online. Here is part 1: https://www.class-central.com/report/best-programming-course...

Feedback would be appreciated (on the format as well as content)!

[+] tgokh|9 years ago|reply
I'm a huge fan of the rest of this Coursera specialization (or was, until they started charging to submit assignments for it mid-specialization, but I digress...)

Carlos and Emily do a great job diving deeper than most other online courses into the math behind different algorithms without making the math too theoretical. I'm a grad student in engineering, so I wanted to understand not only how to run these algorithms but also how they work and these courses were great for learning in a mathematically rigorous but still approachable sort of way.

The only criticism I've heard of this series is that it uses Turi/Dato/Graphlab instead of SciKit-Learn. I did the courses that exist so far using GraphLab, but I'm starting to redo the assignments using SciKit now so that I learn that toolkit as well.

[+] geebee|9 years ago|reply
It depends on your focus, of course. Andrew Ng's Coursera course is famous, and it's ideal for someone who wants to get into the mathematics behind various ML algorithms. However, this class will take you into implementing algorithms and is less about applying them.

If you want to just try them out, I'd honestly recommend just going through the scikit-learn documentation. Almost all of the algorithms provide an example, and the API is pretty consistent across different ML algorithms, to the extent that it can be.
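
To illustrate the consistent API mentioned above, here is a minimal sketch, assuming scikit-learn is installed. The dataset (iris), the three models, and the hyperparameters are arbitrary choices for illustration, not recommendations from the documentation.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Small built-in dataset, split into training and held-out test sets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Three unrelated algorithms, all driven through the same fit/score calls.
models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "naive Bayes": GaussianNB(),
    "random forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                 # same training call for every model
    scores[name] = model.score(X_test, y_test)  # mean accuracy on the test set
    print(f"{name}: {scores[name]:.2f}")
```

Swapping in another classifier usually means changing only the constructor line; the `fit` and `score` calls stay the same.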

People learn differently: some prefer to get into the math right away, while others will never be interested in it. I'm interested, but I tend to be more motivated when I've used the algorithms first, started to learn how and why they perform well or poorly under various circumstances, and then dug into the mathematics specifically to find out why.

Also, I'm not going to be creating new ML algorithms. So, you know, that also influences my level of interest. I do care about the mathematics involved, because I do want to genuinely understand why some outputs are available for random forests but not naive Bayes or logistic regression, why performance and/or accuracy is great in some circumstances and not others, and I don't want to have to rely on too much hand waving. But if you want to actually develop and research novel ML algorithms, you'd need to get considerably deeper into the math.

[+] sremani|9 years ago|reply
Udacity has a free Introduction to Machine Learning course (which uses scikit-learn and Python). They also have nano-degrees, which are paid.
[+] BrandonBradley|9 years ago|reply
For big data, 'Big Data' by Nathan Marz was an excellent read. The conceptual chapters are top notch, and the implementation chapters give you a good look into the tools used for the field at the time of publishing.
[+] nborwankar|9 years ago|reply
Shameless plug: LearnDataScience http://learned.com is a git repo with Jupyter Notebooks, data and instructions. It's meant for programmers, assumes no math background and addresses data cleaning issues, which most classes ignore. Having said that, Andrew Ng's class on Coursera is gold.
[+] raju_bala|9 years ago|reply
Conferences like WWW, KDD, and ICML for the latest research, Coursera for the basics, and textbooks like Pattern Recognition and Machine Learning by Bishop.
[+] enthdegree|9 years ago|reply
A classic reference is Pattern Recognition and Machine Learning by Bishop.