Ask HN: What are the best resources to learn computer vision?

214 points| ameyades | 8 years ago | reply

Ideally, I would like to be good enough to get a job at an AI/robotics startup. I already have a CS degree, a decent math background, and am working as an embedded software developer for a large company.

57 comments

[+] rsp1984|8 years ago|reply
I think the question is a little too unspecific for there to be a good answer. The field is vast, and depending on which part of computer vision you want to tackle, the best learning paths may vary greatly. Just to give a bit of an overview:

Before the Deep Learning craze started in 2011, more classical Machine Learning techniques were used in CV: Support Vector Machines, Boosting, Decision Trees, etc.

These were (and still are!) used as a high level component in areas like recognition, retrieval, segmentation, object tracking.

But there's also a whole field of CV that doesn't require Machine Learning at all (although it can benefit from it in some cases). This is typically the area of geometrical CV, like SLAM, 3D reconstruction, Structure from Motion and (Multi-View) Stereo: generally anything where you can write a (differentiable) model of reality yourself using hand-coded formulas and heuristics and then use standard solvers to obtain the model parameters given the data.
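
To make the "write the model yourself, then solve for its parameters" idea concrete, here is a minimal sketch under strong simplifying assumptions. It uses a one-dimensional pinhole camera model, u = f * X / Z + cx, and recovers the focal length f and principal point cx from known points by ordinary least squares. The function name fit_pinhole_1d and the sample numbers are made up for illustration; real calibration also has to deal with lens distortion and full 3D poses.

```python
def fit_pinhole_1d(points, pixels):
    """Least-squares fit of the hand-written model u = f * (X / Z) + cx.

    points: list of (X, Z) coordinates in the camera frame (assumed known).
    pixels: list of observed horizontal pixel coordinates u.
    Returns the estimated (f, cx).
    """
    # Normalized image coordinates x = X / Z turn the model into a line in x.
    xs = [X / Z for X, Z in points]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_u = sum(pixels) / n
    # Closed-form solution of the normal equations for a straight-line fit.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxu = sum((x - mean_x) * (u - mean_u) for x, u in zip(xs, pixels))
    f = sxu / sxx
    cx = mean_u - f * mean_x
    return f, cx


# Synthetic, noise-free data generated from hypothetical "true" parameters.
true_f, true_cx = 500.0, 320.0
points = [(-1.0, 4.0), (0.5, 2.0), (1.0, 5.0), (2.0, 4.0)]
pixels = [true_f * X / Z + true_cx for X, Z in points]

f, cx = fit_pinhole_1d(points, pixels)
```

With noisy measurements the same fit returns the parameters that minimize the squared reprojection error, which is the basic pattern behind much larger problems like bundle adjustment.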

Whenever it's too hard to do that (for example trying to recognize many different things in images) you need a data-driven / machine learning approach where the computer comes up with the model itself after seeing lots of training examples.

As for resources the other answers are already giving a great overview. Use Karpathy's course for an intro to Deep Learning for CV but don't expect it to be comprehensive in terms of giving you an overview of CV.

Learn OpenCV for more low level, non-ML and generally more "old-school" Computer Vision.

A personal recommendation of mine is http://www.computervisionblog.com/ by Tomasz Malisiewicz. It's an excellent resource if you want to get an overview of what's happening in the field.

[+] AndrewKemendo|8 years ago|reply
Great points.

I would argue Kinetic or Geometrical Computer Vision problems (things like Tracking, Mapping, Reconstruction, Depth Estimation) are best suited to classical approaches like VO, SFM/MVS, SIFT/SURF, HOG, etc., and are a separate category of CV problems from object recognition/detection/segmentation, which are much more amenable to ML because dimensionality is reduced.

> But there's also a whole field of CV that doesn't require Machine Learning at all (although it can benefit from it in some cases).

In fact, Machine Learning has made almost no progress on most of what you mention, specifically SLAM and Multi-View Stereo. It takes completely rethinking how those are done when they are approached from the Deep Learning perspective.

[+] nojvek|8 years ago|reply
I absolutely love Adrian's blog. http://www.pyimagesearch.com/

He has articles on solving actual problems with OpenCV, dlib and tensorflow. I subscribe to the blog and try to do some of the tutorials myself.

Udacity is another great resource. Their self driving and robotics nanodegrees are great.

I am on the same path as you trying to pivot my career from full stack engineer and add CV + ML skills to it.

When we have decent robot hardware, I want to be the one programming them, not the one getting replaced by them.

[+] stared|8 years ago|reply
For a full course, nothing beats CS231n: Convolutional Neural Networks for Visual Recognition http://cs231n.stanford.edu/ by Andrej Karpathy et al.

Also, for a general and high-level introduction to neural networks, I wrote Learning Deep Learning in Keras http://p.migdal.pl/2017/04/30/teaching-deep-learning.html, focusing on visual tasks.

[+] deepGem|8 years ago|reply
I did this course, but couldn't finish all the assignments. Loved it. Please note that this is a convolutional neural networks course, not computer vision as such. From what I know, computer vision encompasses a variety of non-machine-learning algorithms, which are not covered in this course.
[+] wyc|8 years ago|reply
This is pretty old school, but I recommend Multiple View Geometry by Hartley and Zisserman (http://www.robots.ox.ac.uk/~vgg/hzbook/) to get through the fundamentals. It's really good for understanding the geometric foundations of the past 4 decades. Along the same lines, you have Introductory Techniques for 3-D Computer Vision by Trucco and Verri (https://www.amazon.com/Introductory-Techniques-3-D-Computer-...), which also goes over the geometry and the fundamental problems that computer vision algorithms try to solve. It often does come down to just applying simple geometry; getting good enough data to run that model is the challenging part.

If you just throw everything into a neural network, then you won't really understand the breadth of the problems you're solving, and you'll therefore be ignorant of the limitations of your hammer. While NNs are incredibly useful, I think a deep understanding of the core problems is essential to know how to use NNs effectively in a particular domain.

After getting a grip on those concepts, Szeliski's Computer Vision: Algorithms and Applications (http://szeliski.org/Book/) had some really amazing coverage of CV in practice. Mastering OpenCV (https://www.amazon.com/Mastering-OpenCV-Daniel-Lelis-Baggio/...) was very useful when actually implementing some algorithms.

[+] rjdagost|8 years ago|reply
A lot of real world computer vision is implemented on embedded devices with limited computational resources (ARMs, DSPs, etc.) so understanding how a lot of commonly used algorithms can be efficiently implemented in embedded systems is important. It is possibly a way for you to jump the gap from "embedded software developer" to "computer vision engineer". Also keep in mind that in many companies a "computer vision engineer" is fundamentally a different beast from a "software developer". A CV engineer creates software but the emphasis tends to be more on systems and is not 100% about software. This will vary a lot by company but if you're working with prototype hardware you will need to get at least a working knowledge of optics.

Fun and trendy though it may be, I would not focus on deep learning / convolutional neural networks to start off. Deep learning is a small subset of computer vision. I would focus more on understanding the basics of image processing, camera projection geometry, how to calibrate cameras, stereo vision, and machine learning in general (not just deep learning). Working with OpenCV is a good place to start for all of these topics. Set yourself a project with tangible goals and get to work.
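
As a small illustration of the stereo vision basics mentioned here, the sketch below uses the standard relation for a rectified stereo pair, Z = f * B / d, where f is the focal length in pixels, B the baseline between the two cameras, and d the disparity in pixels. The function name and the camera numbers are hypothetical, and the sketch assumes the images are already calibrated and rectified.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth in meters of a point seen by a rectified stereo pair.

    focal_px: focal length in pixels (from calibration).
    baseline_m: distance between the two camera centers in meters.
    disparity_px: horizontal shift of the point between left and right image.
    """
    if disparity_px <= 0:
        # Zero disparity means the point is at infinity; negative is invalid.
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


# Made-up rig: 700 px focal length, 12 cm baseline, 42 px disparity.
depth_m = depth_from_disparity(700.0, 0.12, 42.0)  # roughly 2 m
```

Because depth is inversely proportional to disparity, a one-pixel matching error matters far more for distant points than for near ones, which is one reason calibration quality is so important.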

[+] lightbyte|8 years ago|reply
Surprised nobody has posted http://course.fast.ai/ yet. I've been following along with it so far for the first 4 lessons and it has been extremely helpful in understanding how deep learning works from the perspective of someone who did not have much of any related baseline knowledge except how to program. Jeremy is an excellent practical teacher.
[+] thinkMOAR|8 years ago|reply
Happy it is interesting for you.

I too was referred to this URL by somebody, and I got excited after their extended intro on why and how their course is different from and better than any other.

Though after 5 videos, I know nothing more than I did before from any other ML/AI guide on the internet. 99% is only related to image classification, and I'm simply seeing too many guides for that.

If anybody has some good links/videos on ML/AI on structured data, please comment and I'll be thankful and happy to click 'em :)

[+] alexcnwy|8 years ago|reply
Seconded - can't recommend the course highly enough.

A lot of 'traditional' computer vision methods, e.g. the Hough detector, are simply inferior to deep learning approaches.

Plus, it's a lot easier than you'd think to get up and running, especially when you leverage pre-trained models...

[+] sphix0r|8 years ago|reply
OpenCV and http://www.pyimagesearch.com/

disclaimer: not related to any of these

[+] zionsrogue|8 years ago|reply
Adrian here, author of the PyImageSearch blog. Thank you for mentioning it, I appreciate it. If anyone has any questions about computer vision, deep learning, or OpenCV, please let me know.

In regard to OP's original question, I'm actually working on solving your very problem right now. About 1.5 years ago I created the PyImageSearch Gurus course (https://www.pyimagesearch.com/pyimagesearch-gurus/) with the aim of bridging academia with actual real-world computer vision problems. The course has helped readers in their academic careers, such as securing grants (http://www.pyimagesearch.com/2016/03/14/pyimagesearch-gurus-...), and has helped students become practitioners and land jobs in the CV startup space (http://www.pyimagesearch.com/2017/06/12/pyimagesearch-gurus-...).

Within the next month I'll be launching PyImageJobs which will connect PyImageSearch readers (especially the Gurus course graduates) with companies/startups that are looking to hire.

Finally, I'm also working on my upcoming "Deep Learning for Computer Vision with Python" book (https://www.pyimagesearch.com/deep-learning-computer-vision-...) which is now 100% outlined and I'm on to the writing phase.

Definitely take a look and if you have any questions, please let me know or use the contact form on my website if you want to talk in private.

[+] thebarknight|8 years ago|reply
2nd for pyimagesearch. The author (not me) is a prolific and dedicated blogger who really wants to share his knowledge. Dude has a 'bootcamp' as well
[+] mendeza|8 years ago|reply
Highly recommend PyImageSearch!
[+] maffydub|8 years ago|reply
I found Computer Vision: Algorithms and Applications really good. You can download it for free (for personal use) at http://szeliski.org/Book/.
[+] alok-g|8 years ago|reply
+1

This is the most comprehensive book I know of on Computer Vision. The diagrams in the book (including captions) themselves do a great job of explaining things.

[+] indescions_2017|8 years ago|reply
Grad-level CV courses, all recently offered:

Princeton CS598F Deep Learning for Graphics and Vision

https://www.cs.princeton.edu/courses/archive/spring17/cos598...

Stanford CS331B: Representation Learning in Computer Vision

http://web.stanford.edu/class/cs331b/

UVa CS 6501: Deep Learning for Computer Graphics

http://www.connellybarnes.com/work/class/2016/deep_learning_...

GaTech CS 7476 Advanced Computer Vision

http://www.cc.gatech.edu/~hays/7476/

Berkeley CS294 Understanding Deep Neural Networks

https://bcourses.berkeley.edu/courses/1453965

Washington CSE 590V: Computer vision seminar

https://courses.cs.washington.edu/courses/cse590v/16au/

UT Austin CS 395T - Deep learning seminar

http://www.philkr.net/CS395T/

Berkeley CS294-43: Visual Object and Activity Recognition

https://sites.google.com/site/ucbcs29443/

UT Austin CS381V: Visual Recognition

http://vision.cs.utexas.edu/381V-fall2016/

And best of luck to you!

[+] fest|8 years ago|reply
I started by getting a webcam or two and trying out various projects: marker tracking (made an optical IR pass filter and tracked an IR LED with two cameras), object segmentation (e.g. measuring the geometry of certain-colored objects).

Measure the speed or count the number of cars passing by on your street. Try to implement OCR for a utility meter. There are lots of applications you can train yourself on, and I guarantee that you will learn a ton from each and every one of them.
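
For the "measure geometry of certain-colored objects" kind of project, a first pass can be plain color thresholding. The sketch below is a toy version in pure Python on a tiny hand-made RGB grid, so it runs without a camera or any library. The helper names (is_reddish, measure_region) and the threshold values are made up for illustration; with real webcam frames you would more likely use OpenCV and tune the thresholds for your lighting.

```python
def is_reddish(pixel, min_red=150, max_other=100):
    """Crude color test: strong red channel, weak green and blue."""
    r, g, b = pixel
    return r >= min_red and g <= max_other and b <= max_other


def measure_region(image, predicate):
    """Return area, bounding box and centroid of all pixels passing predicate.

    image: list of rows, each row a list of (r, g, b) tuples.
    Returns None if no pixel matches.
    """
    coords = [(row, col)
              for row, line in enumerate(image)
              for col, pixel in enumerate(line)
              if predicate(pixel)]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return {
        "area": len(coords),  # number of matching pixels
        "bbox": (min(rows), min(cols), max(rows), max(cols)),
        "centroid": (sum(rows) / len(coords), sum(cols) / len(coords)),
    }


RED, GRAY = (200, 30, 30), (90, 90, 90)
image = [
    [GRAY, GRAY, GRAY, GRAY, GRAY],
    [GRAY, RED, RED, RED, GRAY],
    [GRAY, RED, RED, RED, GRAY],
    [GRAY, GRAY, GRAY, GRAY, GRAY],
]
stats = measure_region(image, is_reddish)
```

This treats every matching pixel as one object; separating several objects of the same color would additionally need connected-component labeling.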

[+] mattfrommars|8 years ago|reply
Does anyone know if tech like OpenCV is used at companies developing their own "computer vision" product, maybe at Tesla? Or do they build their own technology from scratch, which isn't available to the public? Or do they, say, fork OpenCV, build upon it, and heavily modify it, since OpenCV could be seen as 'outdated' technology?

Disclaimer: Never worked with any technology related to Computer Vision, just a bloody beginner Python programmer.

[+] chrinic726|8 years ago|reply
Used to work on Tesla Vision / Autopilot Vision. They used Caffe, were switching to Tensorflow when I left, but might be moving to Caffe2 now.

Usually no OpenCV on successful products. Facebook Ads has dedicated research engineers implementing their real time photo analysis algorithms.

[+] trwoway|8 years ago|reply
cs231n by Andrej Karpathy : http://cs231n.github.io
[+] lunpe|8 years ago|reply
This is a great resource. I give it to people who need to learn about convolutional neural networks.

However, let's keep in mind that the field of computer vision is much vaster than that. Deep learning approaches have been very successful at solving problems in computer vision, but not all of them and not without drawbacks. I believe any course on classic computer vision will give the OP more insight into what challenges computer vision aims to solve, how, and what approach might solve what problem.

[+] visarga|8 years ago|reply
You don't specialize in surgery before learning biology. Similarly, you don't specialize in CV before learning basic ML and DL. The fundamental concepts are the same no matter if the modality is text, image or video (for example: regularization, loss, cross validation, bias, variance, activation functions, KL divergence, embeddings, sparsity - all are non-trivial concepts that can't be grasped in a few minutes, and are not specific to CV alone).
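
Of the concepts listed, cross validation is one that is easy to sketch in a few lines. The following is a minimal, library-free version of k-fold index splitting, assuming no shuffling or stratification. The function name k_fold_indices is hypothetical; in practice a library such as scikit-learn is normally used for this.

```python
def k_fold_indices(n_samples, k):
    """Split range(n_samples) into k (train, validation) index pairs.

    Each sample lands in exactly one validation fold; fold sizes differ
    by at most one when n_samples is not divisible by k.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # The first (n_samples % k) folds get one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0)
             for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        stop = start + size
        validation = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        folds.append((train, validation))
        start = stop
    return folds


folds = k_fold_indices(10, 3)
for train, validation in folds:
    # A model would be fit on `train` and scored on `validation` here.
    pass
```

Averaging the validation score over the folds gives a less optimistic estimate than scoring on the training data, and the same procedure applies whether the samples are text, images or video.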
[+] zelon88|8 years ago|reply
PyImageSearch by Adrian Rosebrock. http://www.pyimagesearch.com/
[+] zionsrogue|8 years ago|reply
Adrian here, author of PyImageSearch. Thanks for mentioning the blog. If anyone has any questions regarding learning computer vision, please see my reply to "sphix0r" below.
[+] gmiller123456|8 years ago|reply
It's a really broad field, so don't expect to get up to speed very quickly. A lot of people have recommended a lot of books already, and I could add to that list. One thing you might think about is Safari Books Online. You'll notice a lot of the recommended books are there, and even though it's a bit pricey, I think you'll find you'd save money by the time you get enough of the books that seem useful to you. You'll also lose nothing by jumping from book to book because they're too advanced/not advanced enough until you find one that's at your level.

I would recommend starting with one of the many OpenCV tutorial books, and maybe working your way through a few of those. Then move on to books that cover more of the algorithms behind the library, like "Multiple View Geometry" by Hartley and Zisserman and "Machine Vision" by Davies, among many others.

[+] lauritz|8 years ago|reply
I learned OpenCV using the O'Reilly book by Bradski and Kaehler (back when it was OpenCV 2). I found it well-structured and it worked for me. They have an updated version for OpenCV 3.

However, I can't tell you if OpenCV is still the framework of choice and/or widely used in the field you want to go into.