
Introduction to Applied Linear Algebra: Vectors, Matrices, and Least Squares

938 points | yarapavan | 7 years ago | vmls-book.stanford.edu | reply

123 comments

[+] muhneesh|7 years ago|reply
I'm e-learning Linear Algebra right now to have a good math foundation for Machine Learning.

I was a History and Sociology major in college - so I didn't take any math.

If you are like me, and working off an initial base of high school math, I would recommend the following (all free):

Linear Algebra Foundations to Frontiers (UT Austin) Course: https://www.edx.org/course/linear-algebra-foundations-to-fro... Comments: This was a great starting place for me. Good interactive HW exercises, very clear instruction and time-efficient.

Linear Algebra (MIT OpenCourseWare) Course: https://ocw.mit.edu/courses/mathematics/18-06-linear-algebra... Comments: This course is apparently the holy grail course for Intro Linear Algebra. One of my colleagues, who did an MS in EE at MIT, said Gilbert Strang was the best teacher he had. I started off with this but had to rewind to the UT class because I didn't have some of the fundamentals (e.g., how to calculate a dot product; see the sketch after this list). I'm personally 15% through this, but enjoying it.

Linear Algebra Review PDF (Stanford CS229) Link: http://cs229.stanford.edu/section/cs229-linalg.pdf Comments: This is the set of Linear Algebra review materials they go over at the beginning of Stanford's machine learning class (CS229). This is my benchmark for checking that I'm tracking toward the right set of knowledge, and so far the courses have done a great job of covering it.
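Aside: the dot product I mentioned above is just the sum of elementwise products. A minimal Python sketch (my own illustration, not from any of these courses):

    def dot(a, b):
        # Dot product: sum of elementwise products of two equal-length vectors.
        assert len(a) == len(b)
        return sum(x * y for x, y in zip(a, b))

    print(dot([1, 2, 3], [4, 5, 6]))  # 1*4 + 2*5 + 3*6 = 32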

[+] flor1s|7 years ago|reply
Don't forget to review calculus as well. Khan Academy is a good start for learning about single-variable calculus (http://www.khanacademy.org), but their content on multivariable calculus is a bit lacking (neural networks / deep learning rely heavily on derivatives and the gradient). A good supplement for multivariable calculus would be Terence Parr and Jeremy Howard's article on "All the matrix calculus you need for deep learning": https://explained.ai/matrix-calculus/index.html
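As a taste of the kind of identity that material drills (standard results, written here in my own notation, not quoted from the article):

    \nabla_x \left( \tfrac{1}{2} \lVert Ax - b \rVert^2 \right) = A^{\mathsf T}(Ax - b),
    \qquad
    \nabla_x \left( x^{\mathsf T} A x \right) = (A + A^{\mathsf T})\, x .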
[+] earthicus|7 years ago|reply
> This course [Strang] is apparently the holy grail course for Intro Linear Algebra.

I haven't watched his lectures, but I TA'd a linear algebra course that used his textbook, and I strongly disliked his presentation. I've heard that's a fairly common reaction, actually - it's one of those love-it-or-hate-it books. I'm bringing it up because if you (or someone else reading this) turn out to be in the group that doesn't love it, you should not give up on loving linear algebra! You are definitely still allowed to have a different 'holy grail course'!

[+] ivansavz|7 years ago|reply
That's a pretty good list; here are some things I'd add.

Amazing js visualizations/manipulatives for many LA concepts: http://immersivemath.com/ila/index.html

LA Concept map: https://minireference.com/static/tutorials/conceptmap.pdf#pa... (so you'll know what there is to learn)

Condensed 4-page tutorial: https://minireference.com/static/tutorials/linear_algebra_in... (in case you're short on time)

And here is an excerpt from my book: https://minireference.com/static/excerpts/noBSguide2LA_previ... (won't post a link to it here, but check on amazon if interested)

[+] espeed|7 years ago|reply
Good recommendations. In addition to the UT, MIT and Stanford courses you recommend above, for developing your visual intuition, 3Blue1Brown's Essence of Linear Algebra video series is second to none. [0]

Another good one is MathTheBeautiful [1] by MIT alum Pavel Grinfeld [2]. He approaches Linear Algebra from a geometric perspective as well, but with more emphasis on the mechanics of solving equations. He has a ton of videos organized into several courses, ranging from in-depth Intro to Linear Algebra courses to more advanced courses on PDEs and Tensor Calculus.

Especially note his video on Legendre polynomials [3] and Why {1,x,x²} Is a Terrible Basis: https://www.youtube.com/watch?v=pYoGYQOXqTk&index=14&list=PL....

Gilbert Strang was Grinfeld's PhD advisor: https://dspace.mit.edu/handle/1721.1/29345. Pavel has a clear and precise teaching style like Strang's, and he makes reference to Prof. Strang and his MIT course from time to time.

NB: Prof. Strang has a new book, Linear Algebra and Learning from Data, that just went to press and will be available in print by mid-January 2019. A few chapters are available online now, and the video lectures from the new MIT course should be on YouTube in a few weeks. [4]

[0] Essence of Linear Algebra video series (3Blue1Brown): https://www.youtube.com/channel/UCYO_jab_esuFRV4b17AJtAw/pla...

[1] MathTheBeautiful https://www.youtube.com/watch?v=pYoGYQOXqTk&index=14&list=PL

[2] https://en.wikipedia.org/wiki/Pavel_Grinfeld

[3] https://en.wikipedia.org/wiki/Legendre_polynomials

[4] MIT Linear Algebra and Learning from Data (2018) http://math.mit.edu/~gs/learningfromdata/

[+] peteretep|7 years ago|reply
A bit weird to add a negative review, but here goes:

https://www.coursera.org/learn/linear-algebra-machine-learni...

Is _not_ a good introduction. The instructors are all over the damn place, and you will spend much of your time finding better explanations from other sources. I wish I hadn't started with this. On the plus side, you will get a certificate at the end.

[+] k__|7 years ago|reply
Somehow I found linear algebra easier than calculus, but I don't know why.

I did both at the same time in university, but failed calculus 3 times and aced linear algebra on the first try.

I'd expect someone to be either good or bad at math, not both at the same time.

[+] impendia|7 years ago|reply
Math professor here ---

Quality of teaching might have something to do with it.

But, also, calculus is much harder to understand at a rigorous, formal level than at an informal level.

On one level you can try to understand what the main concepts are about, be able to compute derivatives and integrals, solve optimization and related rates problems, and so on. I'd recommend Silvanus Thompson's Calculus Made Easy over any mainstream calculus book for this. In my opinion, the book succeeds amazingly at fulfilling the promise of its title.

But suppose you really try to read any mainstream calculus book, and understand everything. For example:

- Why are limits defined the way they are (with epsilons and deltas)?

- The book will probably touch lightly upon the Mean Value Theorem -- why is this important? What's the point?

- Why is the chain rule true? It reads dy/dx = (dy/du) (du/dx). Yay! This is just cancelling fractions, right? Any "respectable" calculus book will insist that it's not, but most students will cheerfully ignore this, still get correct answers to the homework problems, and sleep fine at night.

- Consider the function e^x. How is it defined? The informal way is to say e = 2.71828... and we define exponents "as usual". Most students are perfectly happy with this. But does this really make sense if x is irrational? Your calculus book might bend over backwards to define everything properly (e^x is the inverse of ln(x), which is defined as a definite integral), and it takes a lot of work to appreciate why.
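For the curious, the rigorous route runs roughly like this (my sketch of the usual textbook construction):

    \ln x := \int_1^x \frac{dt}{t} \quad (x > 0), \qquad
    \exp := \ln^{-1}, \qquad
    e := \exp(1) \approx 2.71828, \qquad
    e^x := \exp(x) \ \text{for all real } x .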

In my experience, these sorts of issues mostly don't pop up in linear algebra, where the proofs tend to parallel the handwavy heuristics. I wonder if this had anything to do with your experience?

[+] jcranmer|7 years ago|reply
> Somehow I found linear algebra easier than calculus, but I don't know why.

I suspect the answer is that your calculus course was a lot heavier on crank-grinding: having to readily apply integration and differentiation to a wide panoply of functions, some of which you're not really familiar with (such as arccos). If you're weak on trigonometry or some algebraic manipulations, that's going to shut out the ability to do a lot of the crank-grinding without really impacting your ability to understand the concepts.

By contrast, the crank-grinding in linear algebra is a lot less involved. The most complex algebra is going to be solving polynomial equations to find the eigenvalues of a matrix, but those are generally going to be mostly quadratic equations, since asking anyone to solve more complex equations by hand is asking for trouble. Otherwise, it's largely plugging and chugging numbers into stock formulas. Gram-Schmidt orthonormalization? Pick a vector, normalize it, project the remaining vectors onto it and subtract those projections off, and repeat until you've done all of them (sketched below).
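To make that concrete, here's a minimal NumPy sketch of classical Gram-Schmidt (my own illustration; numerically fragile on ill-conditioned inputs, but fine for hand-sized examples):

    import numpy as np

    def gram_schmidt(vectors):
        # Orthonormalize a list of linearly independent vectors
        # (classical Gram-Schmidt).
        basis = []
        for v in vectors:
            w = v.astype(float)
            # Subtract the projection of v onto each basis vector found so far.
            for q in basis:
                w = w - np.dot(q, v) * q
            norm = np.linalg.norm(w)
            if norm < 1e-12:
                raise ValueError("vectors are linearly dependent")
            basis.append(w / norm)
        return np.array(basis)

    Q = gram_schmidt([np.array([1.0, 1.0, 0.0]), np.array([1.0, 0.0, 1.0])])
    print(Q @ Q.T)  # ~ 2x2 identity: the rows are orthonormal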

[+] edflsafoiewq|7 years ago|reply
Linear algebra should be easier than calculus, shouldn't it? The whole program of differential calculus is basically that we already know how to solve problems in linear algebra, so let's solve other problems by reducing them to questions of linear algebra in the tangent space.
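Concretely, that reduction is just the first-order Taylor approximation (standard notation, my gloss):

    f(x + h) \approx f(x) + Df(x)\, h ,

where Df(x) is the Jacobian: a linear map on the tangent space, so questions about f near x (optima, invertibility, sensitivity) become linear algebra questions about Df(x).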
[+] mipmap04|7 years ago|reply
Linear algebra is probably my favorite part of math from a practicality standpoint. I'm not in a math-heavy field, but knowing how to use matrices to solve optimization problems has been very helpful.
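For example, fitting a line by least squares, the central optimization problem in this book, comes down to a single matrix solve. A hedged NumPy sketch (my own, with made-up data):

    import numpy as np

    # Fit y ~ c0 + c1*x by least squares: minimize ||A c - y||^2.
    x = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([1.1, 1.9, 3.2, 3.9])
    A = np.column_stack([np.ones_like(x), x])  # design matrix: [1, x]
    c, *_ = np.linalg.lstsq(A, y, rcond=None)
    print(c)  # roughly [1.07, 0.97]: intercept and slope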
[+] gxs|7 years ago|reply
Sort of off topic - but I was a math major in college and I always had a broad, casual categorization of the types of classes I took.

Linear Algebra - felt like it required you to be able to hold really long trains of thought in your head

Probability - felt like you had to be clever

Analysis - felt like you just had to think critically and approach things from all angles

I always preferred Algebra - felt like I was writing essays not doing math

[+] VikingCoder|7 years ago|reply
I'm going to complain about this every chance I get.

A 2D vector, we generally store as [x, y, 0]. What's the extra 0? The homogeneous coordinate.

A 2D point, we generally store as [x, y, 1]. That extra 1 is the homogeneous coordinate, and since it's there, it means "and apply translations!"

If I have a 2D transform, I put the translation component in the last row or column, depending on whether you pre-multiply or post-multiply (I can never remember which).

When I transform a vector by that matrix, the 0 in the homogeneous coordinate means translation doesn't apply.

Perfect!

But what if I have a 3D vector? Well... I end up with [x, y, z, 0], right?

Ugh.

If instead, we stored the homogeneous coordinate in the FIRST position, [0, x, y] for 2D, and [0, x, y, z] for 3D, etc. then it's just a sparse vector! Set the values you want to! [0] is the 0-vector in any number of dimensions!

[1] is the origin point in any number of dimensions!

Why did we put the homogeneous coordinate last in all our internal representations? It was so dumb!
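To spell out the convention I'm complaining about, a minimal NumPy sketch (homogeneous coordinate last, column vectors, translation in the last column):

    import numpy as np

    # 2D translation by (tx, ty) as a 3x3 matrix acting on column vectors.
    tx, ty = 5.0, 7.0
    T = np.array([[1.0, 0.0, tx],
                  [0.0, 1.0, ty],
                  [0.0, 0.0, 1.0]])

    point  = np.array([2.0, 3.0, 1.0])  # homogeneous 1: translation applies
    vector = np.array([2.0, 3.0, 0.0])  # homogeneous 0: translation ignored

    print(T @ point)   # [ 7. 10.  1.] -- the point moved
    print(T @ vector)  # [ 2.  3.  0.] -- the vector didn't

The complaint is purely about where that 0/1 slot lives in the layout.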

[+] twtw|7 years ago|reply
I don't follow. What do we gain by moving the homogeneous coordinate from last position to first?

I don't understand this:

> then it's just a sparse vector! Set the values you want to!

Or this:

> [1] is the origin point in any number of dimensions.

Could you clarify?

Also, I don't think this book even discusses homogeneous coordinates. It would be sort of unusual for this type of general text, and the only mention of "homogeneous" in the index is "homogeneous equation."

[+] starmole|7 years ago|reply
I hope you are just trolling. Homogeneous coordinates are for projection, not for selecting translation.
[+] chengiz|7 years ago|reply
With due respect, who is "we" and what are you talking about? The book does not so much as mention a homogeneous coordinate and uses 2-arrays for 2D vectors.
[+] thrwmlaccnt|7 years ago|reply
Is theoretical linear algebra at the level of Axler helpful for machine learning? If so, in what ways?
[+] TobiasA|7 years ago|reply
I'm a self taught programmer with a very weak maths background. What's the best learning path for me if I want to be able to understand and create ML based applications?
[+] hackermailman|7 years ago|reply
There's a practical course for this: http://www.datasciencecourse.org/lectures/ For anything you don't know, like linear algebra, look up the topics here for a 1-2 hr crash course: https://www.youtube.com/playlist?list=PLm3J0oaFux3aafQm568bl...

There's a playlist for a math background in ML for anybody who wants to try a more rigorous ML course: https://www.youtube.com/playlist?list=PL7y-1rk2cCsA339crwXMW... More information, including recommended texts: https://canvas.cmu.edu/courses/603/assignments/syllabus but don't let that list of prereqs discourage you; you can easily look them up directly. You don't have to understand all of linear algebra to do matrix multiplication. There are plenty of ML books, papers, and playlists on YouTube for a full course in ML from dozens of universities: https://www.cs.cmu.edu/~roni/10601/ (click on 2017 lectures)

Note: never trust YouTube or any other resource to be around forever. Make sure you archive everything before you start a course, as lectures tend to disappear (then seed them for others ^^).

If you have a really weak background go through this free book, refuse to not be able to complete it https://infinitedescent.xyz/

There are no answers because the author credits a grad course in evidence-based teaching, where he claims the only way to really know something and remember it is to figure it out for yourself. Math Stack Exchange can help too.

[+] nyc111|7 years ago|reply
> 2-vector (x1,x2) can represent a location or a displacement in 2-D...

Isn't this fundamentally faulty? The same notation describes both a point and a displacement. From this, we may conclude that a point and a displacement are the same thing because they are described by the same notation. Shouldn't mathematics be free of such contextual interpretation?

[+] earthicus|7 years ago|reply
There's no issue with the notation; I think you've misunderstood the mathematical idea. Consider a more familiar algebraic object, a real number, x. This can model a length, an area, volume, time, time interval, temperature, weight, speed, physical constant, geometric ratio, fractional dimension, etc...

In mathematics, we abstract by forgetting about what the things are, and retain information about how they behave, and about what abstract properties they satisfy. The insight is that 2d locations and 2d displacements have the same abstract properties, which are modeled by a certain algebraic object: 2-vectors.

[+] Koshkin|7 years ago|reply
My advice would be not to get stuck on these "philosophical" questions, if your goal is to actually learn math, and instead just press on and keep learning and solving real problems. Eventually the fog will dissolve by itself, and these kinds of questions will seem to you either naive or devoid of any real substance, or just uninteresting compared to everything else that you have learned.
[+] mmmmpancakes|7 years ago|reply
A book on applied linear algebra with a focus on regression and no mention anywhere of the singular value decomposition??
[+] IgniteTheSun|7 years ago|reply
This book has a lot of very interesting applications and seems to cover information not normally found in first books on linear algebra (e.g., it makes use of calculus, Taylor series, etc.), and the authors are EEs, not mathematicians. It doesn't, however, cover several topics normally covered in the first year of linear algebra (e.g., vector spaces, subspaces, nullspace, eigenvalues, singular values; see pp. 461-462). As with most engineering books, no solutions are provided.

An excellent supplement to other linear algebra textbooks. Given its focus on applications, it will hold the interest of engineers and other technical folk, but it may not be loved by mathematicians, who may prefer a more rigorous approach.

[+] herostratus101|7 years ago|reply
He does cover SVD in EE263, and all of the lectures for that course are freely available online.
[+] syntaxing|7 years ago|reply
Applied linear algebra is such a great idea. Linear algebra is relatively easy to understand and used everywhere. But the material is so damn boring, since it's a lot of arithmetic. Even the homework problems are boring, since they have no specific purpose.
[+] cultus|7 years ago|reply
Typical LA courses in math departments have a bizarre focus on being able to do Gaussian elimination by hand and stuff like that. It's not particularly useful or even mathematically interesting. LA courses would be so much more useful if they just stuck to theory and only had computer applications.
[+] chobytes|7 years ago|reply
Just finished my intro linear algebra class yesterday.

The class was a bit more abstract in nature, so some of the chapters in this look like they could be nice application-oriented follow-ups to it!

[+] wpmoradi|7 years ago|reply
Great resource! But a good primer for linear algebra would be Gilbert Strang's course on MIT OCW.
[+] pylus|7 years ago|reply
I just flipped through some pages and saw that it is a great book. I wish my school had offered this.
[+] graycat|7 years ago|reply
I went through the slides. Super fun material! I've seen all the methods long ago, and much deeper than in the slides, and I've published on some of the most advanced material, and much more, but, still, it was fun material because of the many examples and really good graphs.

From their other books, clearly they are real experts. The slides, then, are a careful path where minimal theory gives a LOT of nice applications. The theory they give is nearly always so simple that they are able, in just a few lines, to give essentially the proofs.

E.g., I never saw any mention of convexity, and these two guys are right at the top of experts on the theory and applications of convexity, so it is clear that they tried hard to get lots of applications from minimal theory.

They did next to nothing on numerical stability -- some mention might have been good.

There's a still easier derivation of the least squares normal equations, based on perpendicular projections -- they might have included that. That is, if you drop a golf ball to the floor, the shortest path from the ball to the floor is the line perpendicular to the floor. This fact generalizes.
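In symbols (my gloss, not from the slides): the residual of the best x-hat must be perpendicular to the columns of A, which gives the normal equations directly:

    A^{\mathsf T} (A \hat{x} - b) = 0
    \quad\Longleftrightarrow\quad
    A^{\mathsf T} A \, \hat{x} = A^{\mathsf T} b .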

They have illustrated a nice, general lesson: you can do such applications with just finite dimensions and/or discreteness. You can do more theory with continuous instead of discrete values and infinite instead of finite dimensions. But then, even with the extra theory, often challenging, the computing commonly comes back to discreteness and finiteness. So, just omit the more advanced theory and stay discrete and finite throughout -- that's one of the themes in the slides.

With this theme, the slides manage to extract at least something interesting and potentially valuable from stacks of texts in pure and applied math, statistics, and more, with just a few slides, simple math, nice graphs, and a few words. Nice.

E.g., they did a lot of applied statistics without mentioning probability theory! How'd they do that? They just stayed with the data and omitted describing the probabilistic context from which the data arises as samples or estimates. Cute. But, readers, be warned -- the probabilistic context should not be neglected; you should eventually learn that, too.

Another cute omission of theory: vector subspaces and, really, the axioms of a vector space. E.g., that "floor" I mentioned above is such a subspace. How'd they do that? They just stayed with the basic example vector spaces they had in mind and managed to avoid talking about subspaces.

At one point they touched on determinants for the 2 x 2 case, mentioned that the result is important (should be remembered, or some such), and that there is a more general approach one doesn't have to remember!!! Determinants have some value here and there, e.g., they show some continuity results right away and have some nice connections with volumes, but they are tricky to explain and CAN be omitted!!!

Uh, there is an easier proof of the Schwarz inequality based on Bessel's inequality. Since they did enough with orthogonality to do Bessel's inequality, they could have used that approach to the Schwarz inequality -- I first saw it in P. Halmos.
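Sketch (as I recall the Halmos route): for nonzero y, apply Bessel's inequality with the single orthonormal vector e = y / ||y||:

    |\langle x, e \rangle|^2 \le \lVert x \rVert^2
    \quad\Longrightarrow\quad
    |\langle x, y \rangle| \le \lVert x \rVert \, \lVert y \rVert .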

They didn't make clear the close connections among inner products, covariance, and correlation -- maybe some readers will see those connections from what is in the slides.

They did the QR decomposition -- nice -- that is, for a square matrix A, we can write A = QR where Q is orthogonal and R is triangular. They used that to solve systems of linear equations but omitted Gauss elimination and the associated approaches to numerical stability. For the Q, they emphasized the Gram-Schmidt process but neglected to mention that it's numerically unstable -- no wonder, since it commonly subtracts large numbers whose difference is small, the basic sin in numerical analysis.
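A quick numerical illustration of that instability, a sketch of my own comparing classical Gram-Schmidt against NumPy's Householder-based QR on nearly dependent columns:

    import numpy as np

    def cgs(A):
        # Classical Gram-Schmidt on the columns of A.
        Q = np.zeros_like(A)
        for j in range(A.shape[1]):
            w = A[:, j] - Q[:, :j] @ (Q[:, :j].T @ A[:, j])
            Q[:, j] = w / np.linalg.norm(w)
        return Q

    rng = np.random.default_rng(0)
    A = rng.standard_normal((50, 10))
    A[:, -1] = A[:, 0] + 1e-9 * rng.standard_normal(50)  # nearly dependent column

    for Q in (cgs(A), np.linalg.qr(A)[0]):
        print(np.linalg.norm(Q.T @ Q - np.eye(10)))  # deviation from orthogonality

Classical Gram-Schmidt loses orthogonality badly here, while Householder QR stays near machine precision.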

Of course, the authors are EE profs. So it is interesting that another theme in the slides is getting close to much of the work in what computer science calls machine learning. E.g., their few slides on using classification to recognize the handwritten digits 0-9 are really cute, especially the graph that shows the sizes of the coefficients on top of the square holding the handwritten-digit input data, so that you can see which parts of the input data are most relevant to the calculation. Cute.

Of course, there's much more to those fields that they omitted than included, but that's true also for even the best 5 star hotel luncheon buffet!!!

More fun stuff at

https://news.ycombinator.com/item?id=18648999