Computational Linear Algebra

537 points | julianj | 8 years ago | fast.ai

55 comments

[+] thearn4|8 years ago|reply
Looks like a reasonable overview of applications of dense linear algebra operations, with very specific applications in mind.

I feel like iterative Krylov subspace methods should be around somewhere, but to be honest I'm not sure what the applications space for linear inverse problems looks like in the deep learning domain. So maybe that wouldn't quite fit (despite me finding it to be pretty cool, and these methods sort of eating the lunch of most other inverse methods).

I'd round out the course maybe with a read through of Trefethen-Bau, “Numerical Linear Algebra” for those new to the topic area.

[+] jph00|8 years ago|reply
Krylov subspace methods are heavily covered in the course, and Trefethen and Bau is the main text we recommend.
[+] chris_st|8 years ago|reply
So, what applications are iterative Krylov subspace methods good for? Can you give us a sense of where they're useful, and what (in general) inverse methods in this space are?
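To give a concrete sense of where Krylov methods fit: they shine on large sparse systems where you can afford matrix-vector products but not a dense factorization. A minimal sketch using SciPy's GMRES (the tridiagonal matrix and sizes below are invented for illustration):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import gmres

# A large, sparse, non-symmetric system. GMRES only ever needs
# matrix-vector products A @ v -- it never factors or densifies A.
n = 1000
A = diags([-1.0, 2.5, -1.2], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

x, info = gmres(A, b)   # info == 0 means the iteration converged
print(info, np.linalg.norm(A @ x - b))
```

The same call works if `A` is replaced by a `LinearOperator` that only knows how to apply the matrix, which is exactly the setting of large inverse problems.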
[+] jjtheblunt|8 years ago|reply
That book is superlative! We taught from it at Urbana-Champaign even 20 years ago in grad school... it's still fantastic.
[+] rabreu08|8 years ago|reply
Looks like a good course. I think it would benefit from a module on implementing some basic linear-system solvers, like gradient descent or steepest descent, or even GMRES/MINRES. The amount of knowledge I gained from trying to implement these was remarkable.
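For anyone tempted to try this, a steepest-descent solver for a symmetric positive-definite system really is only a few lines. A sketch (the 2x2 matrix is a made-up SPD example, not from the course):

```python
import numpy as np

def steepest_descent(A, b, tol=1e-10, max_iter=10_000):
    """Solve Ax = b for symmetric positive-definite A.

    Each step moves along the residual r = b - Ax with the exact
    line-search step length alpha = (r.r) / (r.Ar).
    """
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x
    for _ in range(max_iter):
        rr = r @ r
        if np.sqrt(rr) < tol:
            break
        Ar = A @ r
        alpha = rr / (r @ Ar)
        x = x + alpha * r
        r = r - alpha * Ar   # cheaper than recomputing b - A @ x
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = steepest_descent(A, b)
```

Implementing this makes the role of the condition number visceral: the worse-conditioned `A` is, the more the iterates zig-zag.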
[+] jph00|8 years ago|reply
Gradient descent is covered. We even show a pytorch implementation :)
[+] thearn4|8 years ago|reply
GMRES/Krylov spaces seem like they ought to have more of a place in deep learning. But maybe that's the mathematical part of me having a hammer and wanting to see everything as a nail.

One possibility that I haven't seen: if you have a neural network graph with a small subgraph that is not 100% explicit (i.e. nodes with cyclic dependencies that have to be solved collectively via a Newton-type method), the gradient across this subgraph can be computed at the converged state by solving a standard matrix-vector linear inverse problem applied to the gradients, without needing the AD engine to follow every operation through the iterates of the nonlinear Newton solver.

For sparsity and conditioning reasons, in my current work we do something like this for graph-based specification of engineering systems using GMRES to find the gradients across these subsystems. We're not doing deep learning, but the underlying machinery is basically the same.
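The trick described above is the implicit function theorem in action. A one-dimensional sketch (the residual and numbers are invented for illustration): if y(x) is defined by R(x, y) = 0 and found by a Newton solve, then at the converged point dy/dx = -(dR/dy)^{-1} (dR/dx), with no need to differentiate through the Newton iterations:

```python
import numpy as np

# y(x) is defined implicitly by R(x, y) = y**3 + y - x = 0.

def solve_y(x, tol=1e-12):
    y = 0.0
    for _ in range(50):              # Newton iterations
        R = y**3 + y - x
        if abs(R) < tol:
            break
        y -= R / (3 * y**2 + 1)      # dR/dy = 3y^2 + 1
    return y

x = 2.0
y = solve_y(x)
dy_dx = 1.0 / (3 * y**2 + 1)         # -(dR/dy)^{-1} * dR/dx, with dR/dx = -1

# sanity check against a finite difference through the full solve
h = 1e-6
fd = (solve_y(x + h) - solve_y(x - h)) / (2 * h)
```

In higher dimensions dR/dy becomes a Jacobian and the inverse becomes a linear solve, which is exactly where GMRES on the converged state comes in.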

[+] fdej|8 years ago|reply
> Locality: traditional runtime computations focus on Big O, the number of operations computed. However, for modern computing, moving data around in memory can be very time-consuming

I need to nitpick here... Big O notation is a way to describe growth rates of functions. You can count data movements (or anything else) with Big O.

[+] vog|8 years ago|reply
Moreover, "moving data around in memory can be very time-consuming" means that in the end, it is still about time, not about memory.

So the correct way would be to translate memory access to time, but that means modelling the memory hierarchy, modelling caches in general, and finally perhaps modelling the specifically used caching strategies.
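To illustrate the point being nitpicked: the two loops below perform the same number of additions, but on a row-major array the column-wise version walks memory with a large stride and reuses caches poorly. A sketch (array size invented; timings vary by machine):

```python
import time
import numpy as np

a = np.random.rand(1000, 1000)   # C order: rows are contiguous in memory

def sum_by_rows(m):
    # walks memory contiguously, one cache line after another
    return sum(m[i, :].sum() for i in range(m.shape[0]))

def sum_by_cols(m):
    # same operation count, but each column access is strided
    return sum(m[:, j].sum() for j in range(m.shape[1]))

for f in (sum_by_rows, sum_by_cols):
    t0 = time.perf_counter()
    f(a)
    print(f.__name__, time.perf_counter() - t0)
```

Both have identical Big O operation counts; any timing gap is pure data movement.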

[+] halflings|8 years ago|reply
Candid question: isn't the growth rate of a function the derivative of its Big O complexity? O(1) does not mean that the growth rate is constant; it means that the number of operations is constant, and since its derivative is zero, the growth is nil.
[+] lemming|8 years ago|reply
For someone with an ancient undergrad math background and only "interested observer" level of machine learning knowledge, would it be better to do this course before tackling the deep learning one?
[+] wodenokoto|8 years ago|reply
No, start with fast.ai's deep learning course. They have a top down approach so you start with connecting layers to solve deep learning problems and then dig down into the theory behind the thing.

If you want a bottom up approach, go look at Ng's coursera course.

http://course.fast.ai

[+] will_pseudonym|8 years ago|reply
If I wanted to learn linear algebra, I would reinvent PageRank.
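Reinventing PageRank is indeed a nice linear algebra exercise: the ranks are the principal eigenvector of a damped, column-stochastic link matrix, found by power iteration. A toy sketch (the four-page "web" below is invented):

```python
import numpy as np

# links[j] lists the pages that page j links to
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}
n, d = 4, 0.85                      # page count, damping factor

M = np.zeros((n, n))                # M[i, j] = P(surfer goes j -> i)
for j, outs in links.items():
    for i in outs:
        M[i, j] = 1.0 / len(outs)

r = np.full(n, 1.0 / n)             # start uniform
for _ in range(100):                # power iteration
    r = d * (M @ r) + (1 - d) / n   # follow a link, or teleport
```

The damping term guarantees a unique stationary vector and a convergence rate bounded by d per iteration.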
[+] colorincorrect|8 years ago|reply
As a casual onlooker: no. This is a very practical tutorial, and you should be reading other things depending on what in ML you are interested in.
[+] flor1s|8 years ago|reply
Thanks for sharing this, it seems like a lot of interesting material is being discussed. The audience seems to be more the Hacker News visitor than the average student, though, as it feels like little hand-holding is provided.

I've just started lecture 1 but I already felt some minor frustrations:

- One of the links in the first lecture is to a notebook about intro to convolutions but that notebook is just a big code dump.

- After executing the exercises, you lose the expected answer. It might be better if the answers were included as a comment in the code fragment.

- Sometimes the given answers are not actually the answer, but just a computation performed on the way to the answer. E.g., for the matrix-matrix products section in lecture 1 the suggested answer is just the matrix resulting from the matrix product, but according to the question in the text the answer should be the actual cheapest shop.

- Is this a USF course or a fast.ai course?

I don't know if the author is planning on improving the material, because right now it feels a bit like a beta version.

[+] ceyhunkazel|8 years ago|reply
A top-down approach is the best way to teach and learn. Well done!
[+] will_pseudonym|8 years ago|reply
I'm hooked by the title and excited by teaching people about the linear algebra gospel. Powering search engines forever.
[+] unityByFreedom|8 years ago|reply
I'm excited to do this after I get through as much of the DL course as I can. Maybe that's a bit backwards but whatever.

Thanks for your hard work, Rachel! Really curious what you two will get up to next.

[+] MarkMMullin|8 years ago|reply
Nice - I was taken by the "acceptable accuracy" comment, much less stomach acid inducing than "numerical stability" :-)
[+] taw55|8 years ago|reply
Would one get much out of this course with only minimal ML background?
[+] jph00|8 years ago|reply
Yes. It doesn't directly cover ML much. It does assume some basic linear algebra, but provides links for where to learn that quickly.
[+] kyrre|8 years ago|reply
Why 'computational' and not 'numerical'?
[+] dom0|8 years ago|reply
Probably to emphasize that it is a practical course. Numerical courses, at least those I've seen or completed, tend to focus on algorithms and their properties, not implementation.

> Jeremy and I developed this material for a numerical linear algebra course we taught in the University of San Francisco’s Masters of Analytics program, and it is the first ever numerical linear algebra course, to our knowledge, to be completely centered around practical applications and to use cutting edge algorithms and tools,

[+] stuartaxelowen|8 years ago|reply
... What part of linear algebra isn't computational?
[+] geokon|8 years ago|reply
Like... most of it? You can't do linear algebra "safely" without doing error analysis. Lots of decompositions and operations are very useful for proofs and finding bounds, but can't be used directly to compute values. It's why a good fraction of what people do with linear algebra is numerically garbage.
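A classic illustration of this point: classical Gram-Schmidt is a perfectly good proof device, but in floating point it can lose orthogonality badly on an ill-conditioned matrix, while Householder-based QR (what `np.linalg.qr` uses) stays backward stable. A sketch using the Hilbert matrix as a deliberately nasty input:

```python
import numpy as np

def classical_gram_schmidt(A):
    """Textbook orthogonalization: fine on paper, fragile in floating point."""
    m, n = A.shape
    Q = np.zeros((m, n))
    for j in range(n):
        v = A[:, j].copy()
        for k in range(j):
            v -= (Q[:, k] @ A[:, j]) * Q[:, k]   # subtract projections
        Q[:, j] = v / np.linalg.norm(v)
    return Q

# The 10x10 Hilbert matrix: condition number around 1e13
n = 10
H = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1)

Q_cgs = classical_gram_schmidt(H)
Q_hh, _ = np.linalg.qr(H)                        # Householder-based

err_cgs = np.linalg.norm(Q_cgs.T @ Q_cgs - np.eye(n))
err_hh = np.linalg.norm(Q_hh.T @ Q_hh - np.eye(n))
print(err_cgs, err_hh)
```

Same mathematical object, wildly different numerical behavior, which is exactly what error analysis is for.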
[+] mmmmmmmike|8 years ago|reply
Many theorems let you know that something exists but don't tell you how to compute it efficiently (an orthogonal basis, eigenvalues, inverses, etc.)

Also, sometimes algorithms can be invented and empirically shown to have good complexity properties before there are proofs.

[+] zodiac|8 years ago|reply
I found the parts that use Zorn's Lemma pretty un-computational :) It's equivalent to the axiom of choice and is used to show that every vector space has a basis.
[+] fizixer|8 years ago|reply
Depends on what you mean by 'computational'.

- Computational as in 'using programming and computers'

- Computational as in 'solving a math problem' (as opposed to 'theoretical': definitions/theorems/proofs/lemmas).

My guess is that the posted link means the first kind, hence almost all linear algebra texts are non-computational, i.e., you can become an expert in linear algebra without knowing how to program, and without knowing a single programming language.

For the second kind, most of beginner and intermediate linear algebra is computational, but not all. There's plenty of theory of linear algebra, and it has connections with representation theory, abstract algebra, as well as analysis, topology, and geometry. Study of infinite-dimensional vector spaces is purely non-computational.

[+] ianai|8 years ago|reply
I've been wanting to do a math refresher in linear or modern algebra for a while...tempting.