> As soft prerequisites, we assume basic comfortability with linear algebra/matrix calc [...]
>
That's a bit of an understatement. I think anyone interested in learning ML should invest the time needed to deeply understand Linear Algebra: vectors, linear transformations, representations, vector spaces, matrix methods, etc. Linear algebra knowledge and intuition is key to all things ML, probably even more important than calculus.
> I think anyone interested in learning ML should invest the time needed to deeply understand Linear Algebra: vectors, linear transformations, representations, vector spaces, matrix methods, etc. Linear algebra knowledge and intuition is key to all things ML, probably even more important than calculus.
To play devil's advocate, (EDIT: an intuitive understanding of) probabilistic reasoning (probability theory, stochastic processes, Bayesian reasoning, graphical models, variational inference) might be equally if not more important.
The emphasis on linear algebra is an artifact of a certain computational mindset (and currently available hardware), and the recent breakthroughs with deep neural networks (tremendously exciting, but modest success, in the larger scheme of what we wish to accomplish with machine learning). Ideas from probabilistic reasoning might well be the blind spot that's holding back progress.
Further, for a lot of people doing "data science" (and not using neural networks out the wazoo) I think that they can abstract away several linear algebra based implementation details if they understand the probabilistic motivations -- which hints at the tremendous potential for the nascent area of "probabilistic programming".
Awesome. I had a really difficult time with math in HS, and never pursued it at all in college, so even though I'm a programmer my math skills are barely at a high school level.
I'd love to get into ML but the math keeps me at bay.
I think a lot of people need to start from the basics because they don't have a good foundation in math. The core problem is schools will push you along if you can somehow produce the correct answer for 70% of the problems on a test. Combine this with intense pressure not to fail and you will very likely end up in higher level math courses with many gaping holes in your foundational knowledge. You thus end up relying on tricks and memorization rather than useful understanding. Here is a TED talk where Sal Khan of Khan Academy talks about this: https://www.youtube.com/watch?v=-MTRxRO5SRA
After struggling to understand advanced math in a lot of different contexts I decided to go through the entire K-12 set of exercises on Khan Academy. I blazed through the truly elementary stuff like counting and addition in a few hours, but I was surprised at how quickly my progress started slowing down. I found I could not solve problems involving negative numbers with 100% accuracy. Like (5 + (-6) - 4). I would get them right probably 90% of the time, but the thing is Khan Academy doesn't grant you the mastery tag unless you get them right 100% of the time. I found most of my problems were due to sloppy mental models. Like, I didn't understand how division works -- if someone were to ask me what (3/4) / (5/6) even means conceptually I would not have been able to provide a coherent, accurate explanation. "Uh... it's like taking 5/6 of 3/4... wait no that's multiplication... you need to flip the second fraction over... for some reason..." It was around the 8th grade level that I found myself having to actually work hard. (What does Pi even mean?) And I've been through advanced Calculus courses at the university level.
> Like, I didn't understand how division works -- if someone were to ask me what (3/4) / (5/6) even means conceptually I would not have been able to provide a coherent, accurate explanation. "Uh... it's like taking 5/6 of 3/4... wait no that's multiplication... you need to flip the second fraction over... for some reason..."
In case you (or others reading this) still struggle to formalize division, a very nice way to conceptualize it is as the inverse of multiplication. This neatly sidesteps the problem of trying to figure out a clean analogue for what it means to multiply a fraction of something by another fraction of something, since the intuitive group-adding idea of multiplication sort of breaks down with ratios.
Addition is a straightforward operation, but subtraction is trickier. For all real x there exists an additive inverse -x satisfying x + (-x) = 0. So to subtract 3 from 4 we instead take the sum 4 + (-3) = 1.
Likewise to multiply 3 by 4 we add four groups of 3: 3 + 3 + 3 + 3 = 12. We accomplish division by using a multiplicative inverse: for all real x there exists a 1/x such that x(1/x) = 1.
So (3/4) / (5/6) is equal to (3 * 1/4) / (5 * 1/6). In other words, take the multiplicative inverse of 4 and 6 and multiply them by 3 and 5 respectively. Then multiply the first product by the inverse of the second product.
This is the axiomatic version of the story: subtraction is the sum of a number and another number's additive inverse, multiplication is repeated addition, and division is the product of a number and another number's multiplicative inverse. From this perspective you need not even understand division computationally if all you'll ever deal with are fractions and not decimals.
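As a concrete check, Python's `fractions` module makes the "multiply by the inverse" view executable (a minimal sketch, not from the original comment):

```python
from fractions import Fraction

a = Fraction(3, 4)
b = Fraction(5, 6)

# Dividing by b is the same as multiplying by b's multiplicative inverse.
b_inverse = Fraction(b.denominator, b.numerator)  # 6/5

print(a / b)          # 9/10
print(a * b_inverse)  # 9/10, the same result
```

Both lines print `9/10`, matching the "flip the second fraction over" trick from the parent comment.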
I had a similar Khan Academy experience. What caught my attention was how much more relevant everything felt: a wealth of work and life experience made the concepts applicable in ways they couldn't have been when I was in HS.
This is excellent. Thank you for taking the time to write it.
I don't know what it is about math -- especially when it involves manipulation of symbols as opposed to pictures or lay language -- that turns off so many people.
The fact that so many software developers "don't like math" is ironic, because they're perfectly happy to manipulate symbols such as "x", "file", or "user_id" that stand in for other things every day. The entirety of mathematical knowledge is very much like a gigantic computer language (a formal system) in which every object is and must be precisely defined in terms of other objects, using and reusing symbols like "x", "y", "+", etc. that stand in for other things.
Perhaps the issue is motivation? Many wonder, "why do I need to learn this hard stuff?" If so, the approach taken by Rachel Thomas and Jeremy Howard at fast.ai seems to be a good one: build things, and then fill the theoretical holes as needed, motivated by a genuine desire to understand.
> I don't know what it is about math -- especially when it involves manipulation of symbols as opposed to pictures or lay language -- that turns off so many people.
The biggest turn off about math is the way people are taught math.
Most people are taught math as if it's an infinite set of cold formulas to memorize and regurgitate. Most students in my statistics class didn't know where and when to use the formulas taught in real life; they only knew enough to pass the tests. Students who obtain As in Algebra 2 hardly know where the quadratic formula comes from (and what possibly useful algebraic manipulation could you do if you can't even rederive the quadratic formula?). It's not just math. I've been in a chemistry class where the TA was getting a masters in chemistry, and yet she taught everyone in my class a formula so wrong that, taken literally, it meant that every time a photon hits an atom, an electron will be ejected with the same energy and speed as the photon. This is obviously wrong, but when I pointed it out, everyone thought I was wrong because "that's not what it says in the professor's notes" (later, the professor corrected their notes). In my physics class, the people who struggled the most were the ones who tried the least to truly grasp where the formulas come from. I don't blame them; it's the way most schools teach.
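For reference, the derivation alluded to above is nothing more than completing the square:

```latex
% Completing the square on ax^2 + bx + c = 0, with a \neq 0:
\begin{align*}
ax^2 + bx + c &= 0 \\
x^2 + \tfrac{b}{a}x &= -\tfrac{c}{a} \\
\left(x + \tfrac{b}{2a}\right)^2 &= \tfrac{b^2 - 4ac}{4a^2} \\
x &= \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}
\end{align*}
```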
> build things, and then fill the theoretical holes as needed, motivated by a genuine desire to understand.
I totally agree.
Source:
My experience with tutoring people struggling with math for the past eight years. I used to like math; then I got to college, where 95% of people don't understand the math they're doing and thus can't be creative with it. This includes the professors who teach math as the rote memorization of formulas. Yeah, call me arrogant, but I have found it to be true in my experience. I strongly believe the inability to rederive or truly grasp where things come from destroys the ability to be creative and leads to a lack of true understanding. But everyone believes they understood the material because they got an A on the exam. I'll stop ranting on this now.
The impression I get is that many people want to be system designers, stringing together pieces to create systems that solve problems (part of the motivation might be that it is easier to extract economic value from such integrated solutions than from better-functioning pieces).
The problem is that in an immature field that's still evolving, the components are not yet well-understood or well-designed, so available abstractions are all leaky. However, modern software engineering is mostly built on the ability to abstract away enormous complexity behind libraries, so that a developer who is plumbing/composing them together can ignore a lot of details [1]. People with that background now expect similarly effective abstractions for machine learning, but the truth is that machine learning is simply NOT at that level of maturity, and might take decades to get there. It is the price you pay for the thrill of working in a nascent field doing something genuinely uncharted.
"Math in machine learning" is a bit of a red herring. We hear the same complaints about putting in effort to grok ideas in functional programming, thinking about hardware/physics details, understanding the effects of software on human systems [2], etc. Fundamentally, I think a lot of people have not developed the skill to fluidly move between different levels of abstraction, and a variety of approximately correct models. And to be fair, it seems like most of software engineering is basically blind to this, so one can't shift all the blame on individuals.
> I don't know what it is about math -- especially when it involves manipulation of symbols as opposed to pictures or lay language -- that turns off so many people.
I can tell you at least part of it, from my subjective perspective. I tend to "think" in a very verbal fashion and I instinctively try to sub-vocalize everything I read. So when I see math, as soon as I see a symbol that I can't "say" to myself (e.g., a Greek letter that I don't recognize, or any other unfamiliar notation) my brain just tries to short-circuit whatever is going on, and my eyes want to glaze over and jump to the stuff that is familiar.
OTOH, with written prose, I might see a word I don't recognize, but I can usually work out how to pronounce it (at least approximately) and I can often infer the meaning (at least approximately) from context. So I can read prose even when bits of it are unfamiliar.
There's also the issue that math is so linear in terms of dependencies, and it's - in my experience - very "use it or lose it" in terms of how quickly you forget bits of it if you aren't using it on day-in / day-out basis.
> The fact that so many software developers "don't like math" is ironic, because they're perfectly happy to manipulate symbols such as "x", "file", or "user_id" that stand in for other things every day. The entirety of mathematical knowledge is very much like a gigantic computer language (a formal system) in which every object is and must be precisely defined in terms of other objects, using and reusing symbols like "x", "y", "+", etc. that stand in for other things.
I don’t find it ironic, because I wouldn’t expect engineers to make good mathematicians implicitly (nor vice versa). There is some similarity between math and programming, but there is also a colossal amount of dissimilarity that makes them different things entirely.
For example, notation and terminology in mathematics is not actually rigorous. It’s highly context dependent and frequently overloaded (take the definition of “normal”, the notation of a vector versus a closure, or the notation of a sequence versus a collection of sets). As another example, consider that beyond the first few courses of undergraduate math you’re wading into a sea of abstraction which you can only reason about. There is no compiler flag to ensure your proof is correct in the general case, and you don’t have good, automatic feedback on whether or not the math works. In this sense, the entirety of mathematical knowledge is actually very much not like a formal computer language.
Beyond that, the ceiling of complexity for theoretical computer science or applied mathematics is far higher than programming. It’s not so much motivation (though that can be an issue too), it’s that learning the mathematics for certain things simply takes a vast amount of time. Meanwhile a professional programmer has to become good at things that mathematicians and scientists don’t have to care about, like version control or the idiosyncrasies of a specific language.
They're really orthogonal disciplines, for much the same reason that engineering isn't like computer science. There is a world of difference between proving the computational complexity of an algorithm and implementing an algorithm matching that complexity in the real world.
> The fact that so many software developers "don't like math" is ironic, because they're perfectly happy to manipulate symbols such as "x", "file", or "user_id" that stand in for other things every day.
`user_id` says what it is; something like `β` does not. It's more like reading minified JavaScript than literate programming. Math notation is frequently horribly overloaded and needlessly terse.
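To illustrate the contrast (a made-up toy example with hypothetical names, not from the thread), here is the same one-line model written both ways:

```python
# Math-style notation: compact, but the symbols carry no meaning on their own.
def f(x, b0, b1):
    return b0 + b1 * x

# Self-describing names: the code equivalent of readable notation.
def predict_price(square_feet, base_price, price_per_sqft):
    return base_price + price_per_sqft * square_feet

print(f(100, 50, 2))              # 250
print(predict_price(100, 50, 2))  # 250, the identical computation
```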
Math education from the undergraduate level on is fairly horrible and not communicated well. Just go read the typical calculus textbook and realize that they reference a lot of stuff that no pre-calculus student would typically know, such as proof by induction, lemmas and so on. The textbooks are written to the professors, not the actual students.
Various non-intuitive concepts are handwaved, the foundations skipped over and students then start struggling because they don't understand the foundation of what they are trying to learn. Reading from the textbook is fairly useless and it ends up being used as a problem set source.
I argued to a few math professors that teaching courses like calculus from textbooks that reference concepts not actually taught until five classes later is a bad idea.
In return I got a shrug of indifference telling me that's just the status quo and the status quo is OK.
Well, I think the difference between developer symbols and math symbols is that developer symbols are a lot more googleable. I can google a line of code, but it's hard to google a crap ton of Greek letters and summations. Even if I did manage to parse it into a google search somehow, I probably wouldn't get any meaningful results.
Also, for me personally, it's just such a drag to learn all the notation. After the fact, I've always thought, "Wow, that's all this means?" but while I'm learning, I feel helpless. It doesn't feel like I have any way to google it. My professors never actually want to sit down and explain it to me. All the pages of math equations always look so intimidating. It's just such a drag.
I don’t enjoy math and I simply don’t have the intuition for it. Every time I attempt to do math in my head my brain groans and says “It’s the 21st century, jackass. Use a calculator.”
I do, however, have a talent for language.
The reason I am a good developer is because I can communicate with different machines through different programming languages in the same way I can communicate with different people through different human languages.
I have tried as of late to learn math in an attempt to contextualize it as language - the language of the universe, really - but it is far more of an uphill climb for me than JavaScript or Chinese.
I don't think we "don't like math", but in my case, I just need an accelerated version of the "math" that I need without the deep-dive.
Here's a crazy idea: machine learning might one day help software engineers understand algorithms and data structures.
You write some code to traverse a list and do some naive sorting, or maybe your everyday way of doing some operations on your lists is inefficient. I want some cool machine learning tool where I can submit my code and it does the analysis.
I'm kinda curious why so many people think that Linear Algebra Done Right is an introductory book for beginners who have math anxiety. Don't get me wrong, the book is great and I enjoyed working through it. It was a magical experience when I saw how simple it was to prove some seemingly hard theorems by just linking the right definitions and theorems. That said, the book does require a certain level of math maturity, as it achieves its elegance by staying at a certain level of abstraction and its style is quite formal -- so much so that a person who can use this book as their first linear algebra textbook shouldn't have math anxiety at all.
Speaking as one of the people who recommended it in this thread: I don't think math anxiety is the right focus for which textbook to choose. More precisely, I don't think you should try to solve that problem by getting a different linear algebra textbook. To put it bluntly, someone with math anxiety probably just doesn't have the mathematical maturity for linear algebra yet. In that case they'd be doing themselves a disservice by attempting the material using some sort of "more accessible" book; instead, they should focus on resolving that anxiety through developing a solid foundation in the prerequisite material.
Linear Algebra is typically the first course in which students have to transition from predominantly rote computation to proof-based theory. Axler's Linear Algebra Done Right is very often the textbook used for that course because it (mostly [1]) lives up to its name. This isn't Math 55: compared to Rudin and Halmos, Axler is a very accessible introduction to linear algebra for those who are ready for linear algebra. The floor for understanding this subject doesn't get much lower than Axler (and in my opinion, it doesn't get much better at the undergraduate level either).
It's unfortunate that so many people want to skip to math they're not ready for, because there's no shame in building up to it. A lot of frustration can be eliminated by figuring out what you're actually prepared for and starting from there. If that means reviewing high school algebra then so be it; better to review "easy" material than to bounce around a dozen resources for advanced material you're not ready for.
For me, it really was the first time math had clicked. I had a non-proof based linear algebra course before going through the book, but it made very little sense to me. After doing LADR, I understood the subject intimately, lost my math anxiety, and performed better in every class I took afterwards than I would have otherwise.
As a teenager I thought I was bad at math and even went to get a film studies/communications bachelors (this week I defended my master's thesis in computational mathematical physics; this after an undergraduate degree in econ with lots of math, of course).
The thing is, I couldn't write the damn matrices well lined up and made mistakes when doing calculations. This was really a (de)formative experience. In college, Linear Algebra for econ was 40% Gaussian elimination, 40% eigenvalues and 20% linear programming. I mean, I still can't do Gaussian elimination by hand right.
I started crawling out of it when I started seeing (in self-study) a book on linear algebra that takes the linear transform/vector space-first approach.
Here's my bullet list. It might be too ambitious and theory-focused, but this is what I used from my physics background.
Learn some:
Calc up to 3 (you can skip some of the divergence and curl stuff)
Linear algebra (no need for Jordan change of basis)
Real analysis
Intermediate probability theory (MLE, MAP, conjugate priors minus the measure theory stuff)
A little bit of differential geometry (at least geodesics. This is for dimension reduction)
Discrete math (know counting and sums really well)
Learn a little bit of Physics (at least know Lagrangians and Hamiltonians)
A little bit of complex analysis (to know contour integration and fourier/laplace transforms)
Some differential equations (up to Frobenius and wave equations)
Some graph theory (my weak spot, but I have used the matrix representations a few times)
After all that, read some Kevin Murphy and Peter Norvig.
Congrats, now you can read most machine learning papers. The above will also give you the toolkit to learn things as they come up like Robbins-Monro.
OP's article is much better if you are trying to be a ML developer/practitioner. Like I said, this list might be too theory focused, but it lets me read lots of applied math papers that aren't ML focused.
For those who don't know, please check out 3blue1brown videos on youtube for a better understanding of concepts like Linear algebra and other things required for machine learning. Thank me later.
This is something we're striving hard to do at the startup I'm involved with (end-to-end resources for learning machine learning, with just high school math background assumed).
In our Data Scientist Track (https://www.dataquest.io/path/data-scientist?), I specifically focused on teaching K-nearest neighbors first b/c it has minimal math but you can still teach ML concepts like cross-validation, and then I wrote Linear Algebra and Calculus courses before diving into Linear Regression.
I recommend the No Bullshit books for anyone with no real math background past trig to get their feet wet, and/or anyone who hasn't done any serious math study for years.
Thanks for posting this!! Was actually searching for this the other day here on HN and found a link to https://github.com/mml-book/mml-book.github.io. Haven't checked it out yet, but the links in the OP look solid.
Re: PCA vs. tSNE. I don't know much about tSNE, but if it is a "manifold learning method" as the sklearn docs say, you could try something like LTSA instead:
It's not difficult to understand what a manifold is, but it took me a number of attempts to get it, and I only did when studying them formally with Spivak (1963). Now the concept of a manifold seems patently obvious to me and not really in need of much formalization, but...
Just wondering if someone had a similar experience: I absolutely loved Math in school, zipped through the classes, always one of the best.
Then things changed at university (studying computer science) and I completely lost interest. Not sure why (bad teacher, going from being best in class to being average, the math at uni different from school).
Now, much later, I regret not having followed through and miss the beauty of Math. I'm re-discovering it and wondering how I could use more of it in my work.
>A student’s mindset, as opposed to innate ability, is the primary predictor of one’s ability to learn math (as shown by recent studies).
The article seems good overall, but I only skimmed the rest after seeing a citation of a 5-year-old Atlantic article describing disputed and at minimum highly exaggerated findings presented as 'shown in recent studies'.
I really want a shallow-dive into machine learning and I know I need linear-algebra as a foundation. I would love an interactive course in linear algebra where we could input matrices and see some visual stuff with animations.
Does anyone have suggestions on learning resources for matrix calculus? I'm trying to come up to speed with the topic and could use pointers to worked examples, video lectures, etc.
If you don’t care about accreditation and are patient, sit down with Axler’s Linear Algebra Done Right and Hoffman & Kunze’s Linear Algebra, in that order.
I would caution you against trying to learn linear algebra using a “take what you need” approach. A random walk approach to learning the material is faster than an accumulation approach, but it’s more brittle and prone to confusion. A lot of things which appear to be irrelevant or unnecessary for machine learning (computation or research) can be imperative for understanding or implementing much more complex concepts later on.
I like "Coding the Matrix" by Philip Klein of Brown delivered via Coursera. It's a deep content intro to linear algebra (and more), with a focus on applications in computer science. The course is accompanied by a textbook written by Klein, which makes the course material better organized and more in-depth than slides and videos alone would allow.
It is within the context of distributions or generalized functions (https://en.wikipedia.org/wiki/Distribution_(mathematics)) but people are often loose on the terminology and tend to just use the term "functions". It's a wonderful topic, with a lot of interesting applications in differential equations and physics.
I just found a quick explanation by Terence Tao about why people are generally loose in this case, meaning that some properties transition nicely from the smooth (here, differentiable) to the rough categories by passing to the limit and density arguments: http://www.math.ucla.edu/~tao/preprints/distribution.pdf
It’s differentiable everywhere except at x=0. At x=0 it actually has a subdifferential—think of it as the set of slopes of lines that are tangent at that point.
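As a concrete instance (the thread doesn't name the function here, so f(x) = |x| is assumed), g is a subgradient of f at 0 exactly when f(y) >= f(0) + g*y for all y, which for |x| means g lies in [-1, 1]:

```python
def is_subgradient(g, points):
    # g is a subgradient of f(x) = |x| at x = 0
    # iff |y| >= f(0) + g*y = g*y for every test point y.
    return all(abs(y) >= g * y for y in points)

points = [x / 10 for x in range(-50, 51)]
print(is_subgradient(0.5, points))  # True: 0.5 is in [-1, 1]
print(is_subgradient(1.5, points))  # False: the line overshoots |x| for y > 0
```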
There, machine learning (ML) is basically a lot of empirical curve fitting. The context is usually a lot of data: thousands of variables; millions or billions of data points; observations, i.e., pairs of the values of thousands of independent variables and the value of the corresponding dependent variable. The work is all a larger, more-data version of: You have a high school style X-Y coordinate system and some points plotted there. So, you want to find values for coefficients a and b so the line
y = ax + b
fits the points as well as possible. But, you can do variations, try to fit, say,
log(y) = a sin(x) + b
Or replace log or sin with any functions you want and try again.
The logic, rational support, is essentially as follows: So, take, say, 1000 x-y pairs. Partition these into 500 training data and 500 test data. Find the best fit you can, using whatever fits, to the training data. Then take the equation and see how well it fits the test data. If the fit of the test data is also good, then that is your model.
Now you want to apply the model in practice, that is, apply the model to data it did not see in the given 1000 points. So in the application you will be given a value of x, plug it into the equation, and get the corresponding value of y. That's what you want -- maybe the value of y gives you Y|N for ad targeting, Y|N cancer, what MSFT will be selling for next month, what the revenue will be for next year, etc.
The rational, logical justification here is an assumption (which should have some justification from somewhere) that the x you are given, and the y you want for that value of x, are sufficiently like the x-y values you had in the original 1000 points.
Okay. Empirical curve fitting to a lot of data to make a predictive model, that is found with training data, tested with test data, and applied where the given data in the application is like the data used in the fitting.
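The procedure above can be sketched in a few lines (synthetic data, with numpy's least squares standing in for "whatever fits"; an illustration, not the commenter's code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: y = 2x + 1 plus noise, 1000 x-y pairs.
x = rng.uniform(-5, 5, 1000)
y = 2 * x + 1 + rng.normal(0, 0.5, 1000)

# Partition into 500 training points and 500 test points.
x_train, x_test = x[:500], x[500:]
y_train, y_test = y[:500], y[500:]

# Fit y = ax + b to the training data by least squares.
A = np.column_stack([x_train, np.ones_like(x_train)])
(a, b), *_ = np.linalg.lstsq(A, y_train, rcond=None)

# Check the fit on the held-out test data.
test_error = np.mean((a * x_test + b - y_test) ** 2)
print(a, b, test_error)  # a near 2, b near 1, small test error
```

If the test error is also small, that equation becomes the model applied to genuinely new values of x.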
The OP mentions that some people believe that to make progress toward real machine intelligence, one needs more math than what I outlined.
My guess is that to make that intended progress, for all but some tiny niche cases, we first need much more powerful and quite different ideas and techniques than in the curve fitting ML I outlined.
Yes, there is a chance that with lots of data from working brains and lots of such empirical fitting we will be able to find some fits that will uncover some of the workings of the brain crucial for real intelligence. Uh, that's a definite maybe!
But there is a lot more to what can be done to build predictive models than such curve fitting, empirical or otherwise. I outlined some such in the thread that I referenced above.
So, for the question in the OP, what math? Well, if you want to pursue directions other than the empirical curve fitting in the Bloomberg course I referenced above, my experience is -- quite a lot. For the education, start with a good undergraduate major in pure math. So, cover the usual topics: calculus, abstract algebra, linear algebra, differential equations, advanced calculus, probability, statistics. Then continue with more in algebra, analysis, and geometry.
Book plug: I wrote the "No Bullshit Guide to Linear Algebra" which is a compact little brick that reviews high school math (for anyone who is "rusty" on the basics), covers all the standard LA topics, and also introduces dozens of applications. Check the extended preview here https://minireference.com/static/excerpts/noBSguide2LA_previ... and the amazon reviews https://www.amazon.com/dp/0992001021/noBSLA#customerReviews
ssivark|7 years ago
To play devil's advocate, (EDIT: an intuitive understanding of) probabilistic reasoning (probability theory, stochastic processes, Bayesian reasoning, graphical models, variational inference) might be equally if not more important.
The emphasis on linear algebra is an artifact of a certain computational mindset (and currently available hardware), and the recent breakthroughs with deep neural networks (tremendously exciting, but modest success, in the larger scheme of what we wish to accomplish with machine learning). Ideas from probabilistic reasoning might well be the blind spot that's holding back progress.
Further, for a lot of people doing "data science" (and not using neural networks out the wazoo) I think that they can abstract away several linear algebra based implementation details if they understand the probabilistic motivations -- which hints at the tremendous potential for the nascent area of "probabilistic programming".
tenaciousDaniel|7 years ago
I'd love to get into ML but the math keeps me at bay.
durpleDrank|7 years ago
jimmy1|7 years ago
JesseAldridge|7 years ago
After struggling to understand advanced math in a lot different contexts I decided to go through the entire K-12 set of exercises on Khan Academy. I blazed through the truly elementary stuff like counting and addition in a few hours, but I was suprised at how quickly my progress started slowing down. I found I could not solve problems involving negative numbers with 100% accuracy. Like (5 + (-6) - 4). I would get them right probably 90% of the time but the thing is Khan Academy doesn't grant you the mastery tag unless you get them right 100% of the time. I found most of my problems were due to sloppy mental models. Like, I didn't understand how division works -- if someone were to ask me what (3/4) / (5/6) even means conceptually I would not have been able to provide a coherent, accurate explanation. "Uh... it's like taking 5/6 of 3/4... wait no that's multiplication... you need to flip the second fraction over... for some reason..." It was around the 8th grade level that I found myself having to actually work hard. (What does Pi even mean?) And I've been through advanced Calculus courses at the university level.
throwawaymath|7 years ago
In case you (or others reading this) still struggle to formalize division, a very nice way to conceptualize it is as the inverse of multiplication. This neatly sidesteps the problem of trying to figure out a clean analogue for what it means to multiply a fraction of something by another fraction of something, since the intuitive group-adding idea of multiplication sort of breaks down with ratios.
Addition is a straightforward operation, but subtraction is trickier. For all real x there exists an additive inverse -x satisfying x + (-x) = 0. So to subtract 3 from 4 we instead take the sum 4 + (-3) = 1.
Likewise to multiply 3 by 4 we add four groups of 3: 3 + 3 + 3 + 3 = 12. We accomplish division by using a multiplicative inverse: for all real x there exists a 1/x such that x(1/x) = 1.
So (3/4) / (5/6) is equal to (3 * 1/4) / (5 * 1/6). In other words, take the multiplicative inverse of 4 and 6 and multiply them by 3 and 5 respectively. Then multiply the first product by the inverse of the second product.
This is the axiomatic counterpart of the "repeated subtraction" picture of division: subtraction is the sum of a number and another number's additive inverse, and multiplication is repeated addition. Division, then, is the product of a number and another number's multiplicative inverse. From this perspective you need not even understand division computationally if all you'll ever deal with are fractions and not decimals.
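A quick sketch using Python's standard fractions module makes the "multiply by the inverse" rule concrete (the particular numbers are just the ones from the parent comment):

```python
from fractions import Fraction

a = Fraction(3, 4)
b = Fraction(5, 6)

# Division defined as multiplication by the multiplicative inverse:
# 1/b is obtained by swapping numerator and denominator ("flipping").
inverse_of_b = Fraction(b.denominator, b.numerator)  # 6/5

assert a / b == a * inverse_of_b  # both give 9/10
print(a / b)  # 9/10
```

So "flip the second fraction over" is not an arbitrary trick; it is literally multiplying by the inverse.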
thanatropism|7 years ago
I applaud your counter-Dunning-Krugerish inquisitiveness about your own skills. I hope some of that rubs off on me.
shostack|7 years ago
cs702|7 years ago
I don't know what it is about math -- especially when it involves manipulation of symbols as opposed to pictures or lay language -- that turns off so many people.
The fact that so many software developers "don't like math" is ironic, because they're perfectly happy to manipulate symbols such as "x", "file", or "user_id" that stand in for other things every day. The entirety of mathematical knowledge is very much like a gigantic computer language (a formal system) in which every object is and must be precisely defined in terms of other objects, using and reusing symbols like "x", "y", "+", etc. that stand in for other things.
Perhaps the issue is motivation? Many wonder, "why do I need to learn this hard stuff?" If so, the approach taken by Rachel Thomas and Jeremy Howard at fast.ai seems to be a good one: build things, and then fill the theoretical holes as needed, motivated by a genuine desire to understand.
hmmm5|7 years ago
The biggest turn off about math is the way people are taught math.
Most people are taught math as if it's an infinite set of cold formulas to memorize and regurgitate. Most students in my statistics class didn't know where and when to use the formulas taught in real life; they only knew enough to pass the tests. Students who obtain As in Algebra 2 hardly know where the quadratic formula comes from (and what possibly useful algebraic manipulation could you do if you can't even rederive the quadratic formula?). It's not just math: I've been in a chemistry class where the TA was getting a master's in chemistry, and yet she taught everyone in my class a formula so wrong that, taken literally, it meant that every time a photon hits an atom, an electron will be ejected with the same energy and speed as the photon. This is obviously wrong, but when I pointed it out, everyone thought I was wrong because "that's not what it says in the professor's notes" (later, the professor corrected their notes). In my physics class, the people who struggled the most were the ones who tried the least to truly grasp where the formulas come from. I don't blame them; it's the way most schools teach.
> build things, and then fill the theoretical holes as needed, motivated by a genuine desire to understand.
I totally agree.
Source: My experience with tutoring people struggling with math for the past eight years. I used to like math; then I got to college, where 95% of people don't understand the math they're doing and thus can't be creative with it; this includes the professors who teach math as the rote memorization of formulas. Yeah, call me arrogant, but I have found it to be true in my experience. I strongly believe the inability to rederive or truly grasp where things come from destroys the ability to be creative and leads to a lack of true understanding. But everyone believes they understood the material because they got an A on the exam. I'll stop ranting on this now.
ssivark|7 years ago
The problem is that in an immature field that's still evolving, the components are not yet well-understood or well-designed, so available abstractions are all leaky. However, modern software engineering is mostly built on the ability to abstract away enormous complexity behind libraries, so that a developer who is plumbing/composing them together can ignore a lot of details [1]. People with that background now expect similarly effective abstractions for machine learning, but the truth is that machine learning is simply NOT at that level of maturity, and might take decades to get there. It is the price you pay for the thrill of working in a nascent field doing something genuinely uncharted.
"Math in machine learning" is a bit of a red herring. We hear the same complaints about putting in effort to grok ideas in functional programming, thinking about hardware/physics details, understanding the effects of software on human systems [2], etc. Fundamentally, I think a lot of people have not developed the skill to fluidly move between different levels of abstraction, and a variety of approximately correct models. And to be fair, it seems like most of software engineering is basically blind to this, so one can't shift all the blame on individuals.
[1] Why the MIT CS curriculum moved away from Scheme towards Python -- https://www.wisdomandwonder.com/link/2110/why-mit-switched-f...
[2] Building software through REPL-it-till-it-works leads to implicitly ignoring important factors (such as ethics) -- https://news.ycombinator.com/item?id=16431008
mindcrime|7 years ago
I can tell you at least part of it, from my subjective perspective. I tend to "think" in a very verbal fashion and I instinctively try to sub-vocalize everything I read. So when I see math, as soon as I see a symbol that I can't "say" to myself (eg, a greek letter that I don't recognize, or any other unfamiliar notation) my brain just tries to short-circuit whatever is going on, and my eyes want to glaze over and jump to the stuff that is familiar.
OTOH, with written prose, I might see a word I don't recognize, but I can usually work out how to pronounce it (at least approximately) and I can often infer the meaning (at least approximately) from context. So I can read prose even when bits of it are unfamiliar.
There's also the issue that math is so linear in terms of dependencies, and it's - in my experience - very "use it or lose it" in terms of how quickly you forget bits of it if you aren't using it on day-in / day-out basis.
throwawaymath|7 years ago
I don’t find it ironic, because I wouldn’t expect engineers to make good mathematicians implicitly (nor vice versa). There is some similarity between math and programming, but there is also a colossal amount of dissimilarity that makes them different things entirely.
For example, notation and terminology in mathematics is not actually rigorous. It’s highly context dependent and frequently overloaded (take the definition of “normal”, the notation of a vector versus a closure, or the notation of a sequence versus a collection of sets). As another example, consider that beyond the first few courses of undergraduate math you’re wading into a sea of abstraction which you can only reason about. There is no compiler flag to ensure your proof is correct in the general case, and you don’t have good, automatic feedback on whether or not the math works. In this sense, the entirety of mathematical knowledge is actually very much not like a formal computer language.
Beyond that, the ceiling of complexity for theoretical computer science or applied mathematics is far higher than programming. It’s not so much motivation (though that can be an issue too), it’s that learning the mathematics for certain things simply takes a vast amount of time. Meanwhile a professional programmer has to become good at things that mathematicians and scientists don’t have to care about, like version control or the idiosyncrasies of a specific language.
They're really orthogonal disciplines, for much the same reason that engineering isn't like computer science. There is a world of difference between proving the computational complexity of an algorithm and implementing an algorithm matching that complexity in the real world.
jdminhbg|7 years ago
`user_id` says what it is; something like `β` does not. It's more like reading minified JavaScript than literate programming. Math notation is frequently horribly overloaded and needlessly terse.
woolvalley|7 years ago
Various non-intuitive concepts are handwaved, the foundations skipped over and students then start struggling because they don't understand the foundation of what they are trying to learn. Reading from the textbook is fairly useless and it ends up being used as a problem set source.
I argued to a few math professors that teaching things like calculus with textbooks that reference concepts not actually taught until five classes later is a bad idea.
In return I got a shrug of indifference telling me that's just the status quo and the status quo is OK.
Thank god khan academy exists now.
jlelonm|7 years ago
Also, for me personally, it's just such a drag to learn all the notation. After the fact, I've always thought, "Wow, that's all this means?" but while I'm learning, I feel helpless. It doesn't feel like I have any way to google it. My professors never actually want to sit down and explain it to me. All the pages of math equations always look so intimidating. It's just such a drag.
rm_-rf_slash|7 years ago
I do, however, have a talent for language.
The reason I am a good developer is because I can communicate with different machines through different programming languages in the same way I can communicate with different people through different human languages.
I have tried as of late to learn math in an attempt to contextualize it as language - the language of the universe, really - but it is far more of an uphill climb for me than JavaScript or Chinese.
Bizarro|7 years ago
Here's a crazy idea that machine learning might one day help with software engineers understanding algorithms and data structures.
You write some code to traverse a list or something and do some naive sorting, or maybe your everyday way of doing some operations on your lists is inefficient. I want some cool machine learning where I can submit my code and it does analysis.
I think Microsoft is working on that. https://techcrunch.com/2018/05/07/microsofts-new-intellicode...
Let's take it a step further. Explain to the programmer why what they're doing is wrong.
I would pay big bucks for a "machine intelligence" IDE
g9yuayon|7 years ago
throwawaymath|7 years ago
Linear Algebra is typically the first course in which students have to transition from predominantly rote computation to proof-based theory. Axler's Linear Algebra Done Right is very often the textbook used for that course because it (mostly [1]) lives up to its name. This isn't Math 55: compared to Rudin and Halmos, Axler is a very accessible introduction to linear algebra for those who are ready for linear algebra. The floor for understanding this subject doesn't get much lower than Axler (and in my opinion, it doesn't get much better at the undergraduate level either).
It's unfortunate that so many people want to skip to math they're not ready for, because there's no shame in building up to it. A lot of frustration can be eliminated by figuring out what you're actually prepared for and starting from there. If that means reviewing high school algebra then so be it; better to review "easy" material than to bounce around a dozen resources for advanced material you're not ready for.
__________________
1. See Noam Elkies' commentary on where it could improve: http://www.math.harvard.edu/~elkies/M55a.10/index.html
tnecniv|7 years ago
thanatropism|7 years ago
The thing is, I couldn't write the damn matrices well lined up and made mistakes when doing calculations. This was really a (de)formative experience. In college, Linear Algebra for econ was 40% Gaussian elimination, 40% eigenvalues and 20% linear programming. I mean, I still can't do Gaussian elimination by hand right.
I started crawling out of it when I started seeing (in self-study) a book on linear algebra that takes the linear transform/vector space-first approach.
SpaceManNabs|7 years ago
Learn some:
Calc up to 3 (you can skip some of the divergence and curl stuff)
Linear algebra (no need for Jordan change of basis)
Real analysis
Intermediate probability theory (MLE, MAP, conjugate priors minus the measure theory stuff)
A little bit of differential geometry (at least geodesics. This is for dimension reduction)
Discrete math (know counting and sums really well)
Learn a little bit of Physics (at least know Lagrangians and Hamiltonians)
A little bit of complex analysis (to know contour integration and fourier/laplace transforms)
Some differential equations (up to Frobenius and wave equations)
Some graph theory (my weak spot, but I have used the matrix representations a few times)
After all that, read some Kevin Murphy and Peter Norvig.
Congrats, now you can read most machine learning papers. The above will also give you the toolkit to learn things as they come up like Robbins-Monro.
OP's article is much better if you are trying to be a ML developer/practitioner. Like I said, this list might be too theory focused, but it lets me read lots of applied math papers that aren't ML focused.
blt|7 years ago
imh|7 years ago
tntn|7 years ago
shashanoid|7 years ago
vincentschen|7 years ago
emit_time|7 years ago
skadamat|7 years ago
In our Data Scientist Track (https://www.dataquest.io/path/data-scientist?), I specifically focused on teaching K-nearest neighbors first b/c it has minimal math but you can still teach ML concepts like cross-validation, and then I wrote Linear Algebra and Calculus courses before diving into Linear Regression.
https://www.dropbox.com/s/lh23y44dsg96xpv/Screenshot%202018-...
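The "minimal math" point is easy to see in code. Here's a toy sketch of k-nearest neighbors with leave-one-out cross-validation -- to be clear, this is my own illustration, not Dataquest's material, and the dataset and names are invented:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict a label by majority vote among the k nearest training points.

    `train` is a list of ((x, y), label) pairs; distance is plain Euclidean.
    """
    neighbors = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

def loo_accuracy(data, k=3):
    """Leave-one-out cross-validation: predict each point from all the others."""
    correct = 0
    for i, (point, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        correct += knn_predict(rest, point, k) == label
    return correct / len(data)

# Tiny made-up dataset: two well-separated clusters.
data = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
        ((5, 5), "b"), ((5, 6), "b"), ((6, 5), "b")]

print(knn_predict(data, (0.5, 0.5)))  # a
print(knn_predict(data, (5.5, 5.5)))  # b
print(loo_accuracy(data))             # 1.0
```

The only math involved is a distance and a vote, yet the cross-validation loop already teaches the core ML habit of evaluating on data the model didn't see.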
xenihn|7 years ago
https://minireference.com/
ivansavz|7 years ago
MATH & PHYS book: https://minireference.com/static/excerpts/noBSguide_v5_previ...
LA book: https://minireference.com/static/excerpts/noBSguide2LA_previ... + free tutorial: https://minireference.com/static/tutorials/linear_algebra_in...
ultrasounder|7 years ago
vincentschen|7 years ago
rasmi|7 years ago
vincentschen|7 years ago
Bizarro|7 years ago
Maybe one of these days I'll complete it :)
I really like 3Blue1Brown for a wide range of math topics. He's just a great teacher.
https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2x...
Frankly, I find the UTAustin linear algebra class less than ideal or optimal, but it's free and lots of classmates, material, so...
thanatropism|7 years ago
e.g. http://www.aaai.org/ocs/index.php/aaai/aaai11/paper/download...
Then, it's not difficult to understand what a manifold is, but it took me a number of attempts to get it, and then I only did when studying them formally with Spivak 1963. Now the concept of manifold seems patently obvious to me and not really needing much formalization, but...
vincentschen|7 years ago
amorphous|7 years ago
Then things changed at university (studying computer science) and I completely lost interest. Not sure why (bad teacher, going from being best in class to being average, the math at uni different from school).
Now, much later, I regret not having followed through and miss the beauty of Math. I'm re-discovering it and wondering how I could use more of it in my work.
harias|7 years ago
Tenoke|7 years ago
The article seems good overall, but I only skimmed the rest after seeing a citation of a 5-year-old Atlantic article describing disputed and at minimum highly exaggerated findings presented as 'shown in recent studies'.
mkl|7 years ago
Bizarro|7 years ago
ivansavz|7 years ago
This is also really good for connecting LA concepts with visuals http://immersivemath.com/ila/index.html
dsiegel2275|7 years ago
iamaaditya|7 years ago
lordfoom|7 years ago
throwawaymath|7 years ago
If you don’t care about accreditation and are patient, sit down with Axler’s Linear Algebra Done Right and Hoffman & Kunze’s Linear Algebra, in that order.
I would caution you against trying to learn linear algebra using a “take what you need” approach. A random walk approach to learning the material is faster than an accumulation approach, but it’s more brittle and prone to confusion. A lot of things which appear to be irrelevant or unnecessary for machine learning (computation or research) can be imperative for understanding or implementing much more complex concepts later on.
ivansavz|7 years ago
He's an amazing teacher and conveys a lot of intuition + makes even complicated ideas look straightforward.
randcraw|7 years ago
http://codingthematrix.com/
saintPirelli|7 years ago
coherentpony|7 years ago
artwr|7 years ago
I just found a quick explanation by Terence Tao about why people are generally loose in this case, meaning that some properties transition nicely from the smooth (here, differentiable) to the rough categories by passing to the limit and density arguments: http://www.math.ucla.edu/~tao/preprints/distribution.pdf
Of course there are exceptions.
vqv|7 years ago
graycat|7 years ago
Foundations Machine Learning (bloomberg.github.io)
at
https://news.ycombinator.com/item?id=17519591
There machine learning (ML) is basically a lot of empirical curve fitting. The context is usually a lot of data: thousands of variables and millions or billions of observations, each pairing the values of thousands of independent variables with the value of the corresponding dependent variable. The work is all a larger, more-data version of: You have a high school style X-Y coordinate system and some points plotted there. So, you want to find values for coefficients a and b so the line
y = ax + b
fits the points as well as possible. But, you can do variations, try to fit, say,
log(y) = a sin(x) + b
Or replace log or sin with any functions you want and try again.
The logic, rational support, is essentially as follows: So, take, say, 1000 x-y pairs. Partition these into 500 training data and 500 test data. Find the best fit you can, using whatever fits, to the training data. Then take the equation and see how well it fits the test data. If the fit of the test data is also good, then that is your model.
Now you want to apply the model in practice: apply the model to data it did not see in the given 1000 points. So in the application, you will be given a value of x, plug it into the equation, and get the corresponding value of y. That's what you want -- maybe the value of y gives you Y|N for ad targeting, Y|N cancer, what MSFT will be selling for next month, what the revenue will be for next year, etc.
The rational, logical justification here is an assumption (which should have some justification from somewhere) that the x you are given and the y you want for that value of x is sufficiently like the x-y values you had in the original 1000 points.
Okay. Empirical curve fitting to a lot of data to make a predictive model, that is found with training data, tested with test data, and applied where the given data in the application is like the data used in the fitting.
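That whole recipe fits in a few lines of code. Here is a minimal sketch of the y = ax + b case with a train/test split -- the synthetic data and the 500/500 split are my own invention purely to mirror the description above:

```python
import random

random.seed(0)

# Synthetic data: y = 2x + 1 plus a little Gaussian noise.
points = [(x, 2 * x + 1 + random.gauss(0, 0.1)) for x in range(1000)]
random.shuffle(points)
train, test = points[:500], points[500:]

def least_squares(pairs):
    """Fit y = a*x + b to (x, y) pairs by ordinary least squares."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    a = (sum((x - mx) * (y - my) for x, y in pairs)
         / sum((x - mx) ** 2 for x, _ in pairs))
    return a, my - a * mx

# Fit on the training half, judge the fit on the held-out test half.
a, b = least_squares(train)
mse = sum((y - (a * x + b)) ** 2 for x, y in test) / len(test)
print(round(a, 3), round(b, 3), mse)
```

If the test-set error is also small, that's your model; everything past this point -- as the comment says -- rests on the assumption that future x values resemble the ones you fit on.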
The OP mentions that some people believe that to make progress toward real machine intelligence, one needs more math than what I outlined.
My guess is that to make that intended progress, for all but some tiny niche cases, we first need some much more powerful and quite different ideas, techniques, etc. than in the curve fitting ML I outlined.
Yes, there is a chance that with lots of data from working brains and lots of such empirical fitting we will be able to find some fits that will uncover some of the workings of the brain crucial for real intelligence. Uh, that's a definite maybe!
But there is a lot more to what can be done to build predictive models than such curve fitting, empirical or otherwise. I outlined some such in the thread that I referenced above.
So, for the question in the OP, what math? Well, if want to pursue directions other than the empirical curve fitting in the Bloomberg course I referenced above, my experience is -- quite a lot. For the education, start with a good undergraduate major in pure math. So, cover the usual topics, calculus, abstract algebra, linear algebra, differential equations, advanced calculus, probability, statistics. Then continue with more in algebra, analysis, and geometry.