
Mathematical Notation Is Awful

69 points | blackhole | 9 years ago | blackhole12.blogspot.com | reply

94 comments

[+] chilie|9 years ago|reply
These are just issues that someone unfamiliar with a field would face. None of them are problems for those of us in the field.

First, the expectation thing. He's using a special case, E(X), and complaining that the more general case doesn't follow it. It's like saying "Well, the plural of mouse is mice but the plural of house isn't hice!". The general definition of expectation (for a discrete distribution) is

E(f(X)) = sum_i f(x_i) * p(x_i)

If you start with this general definition, both E(X) and E(X^2) are perfectly natural. The author's error of starting with the special case in no way implies an issue with notation.
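As a sketch of this point, the general definition translates directly into a small program in which E(X) and E(X^2) are just two choices of f (the fair-die distribution below is a made-up example):

```python
# General discrete expectation: E[f(X)] = sum_i f(x_i) * p(x_i).
def expectation(f, pmf):
    """Expectation of f(X), with the pmf given as {value: probability}."""
    return sum(f(x) * p for x, p in pmf.items())

die = {x: 1 / 6 for x in range(1, 7)}  # fair six-sided die (example data)

e_x = expectation(lambda x: x, die)        # E[X] = 3.5
e_x2 = expectation(lambda x: x ** 2, die)  # E[X^2] = 91/6
```

Starting from the general form, neither case looks like a "substitution rule" being violated; they are two applications of the same functional.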

And how is the fact that Wikipedia is inconsistent between E(X) and E[X] in any way mathematical notation's fault? If you read a novel that starts using ' for quotes and switches to ", that's an issue with the novel (assuming it's not stylistic) and not an issue with typography in general.

[+] bjornsing|9 years ago|reply
> These are just issues that someone unfamiliar with a field would face. None of them are problems for those of us in the field.

True, but it raises a "barrier to entry" (on purpose, or by mistake), because it is almost impossible to enter the field without a supervisor/colleagues who provide the "semi-supervision" needed to learn the notation.

I can understand how those in the field think that's a good thing, but for the rest of humanity it probably isn't... Look, e.g., at what has happened in academic operating system research: innovation has moved from Berkeley and Bell Labs to the Linux kernel mailing list. Are academic OS researchers better off because of it? Probably not. Is the world better off? You bet!

[+] julian37|9 years ago|reply
> It's like saying "Well the plural of mouse is mice but the plural of house isn't hice!"

Yes, that's exactly what he is saying:

> Math is a language that is about as consistent as English, and that's on a good day.

[+] posterboy|9 years ago|reply
"Unfamiliar" is a weasel word; it's a No True Scotsman. I don't mean that it is entirely wrong, I am saying you are unjustly putting a limit on whom you deem worthy of the field. There is no need for imprecision other than moving fast and breaking things as you go.

> "Well the plural of mouse is mice but the plural of house isn't hice!"

Are irregular word forms necessary or essential? Probably, but I doubt you could explain why. Hence you are not qualified to ridicule anyone. It's a perfectly valid complaint, IMHO, but probably only loosely related to the math example, which I can't be bothered to follow at the moment.

[+] iopq|9 years ago|reply
I have taken ten years of math in school and then calculus, discrete math, linear algebra. That's all useless if I want to follow math in a simple research paper because they're using notation common in THAT field and it has nothing to do with the notation in another math field.

And it's all V hat superscript pi subscript h. It's not like code, where I get descriptive variable names. And you thought pi meant pi? No, it means policy in THIS context.

[+] kstenerud|9 years ago|reply
This sounds so much like something the greybeards used to say in the 80s: "If it was hard to write, it should be hard to understand"

Basically, it's not the fault of our systems; it's the user's fault. Once he learns the arcane incantations, he'll understand why our way is the better way.

Computer UX has finally progressed beyond this arrogance. Why not math?

[+] paxcoder|9 years ago|reply
So why are years of studying math insufficient for one to be "familiar" with it?
[+] pessimizer|9 years ago|reply
> "Well the plural of mouse is mice but the plural of house isn't hice!"

That's the strangest defense. English notation is awful.

[+] nawitus|9 years ago|reply
"These are just issues that someone unfamiliar with a field would face. None of them are problems for those of us in the field."

How is this argument different from defending code with bad variable naming by stating that it doesn't cause issues to anyone familiar with the code base?

[+] lifthrasiir|9 years ago|reply
In other words, E(...) has a hidden lambda there, and the fully consistent usage would be E(X -> X^2) (instead of E(X^2)) and so on. The covariance thing would be E((X,Y) -> X*Y), with an implied domain being the Cartesian product of the domains of X and Y. Of course we humans can easily infer the domains, and writing explicit domains every time is not efficient.
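A minimal sketch of this reading, with a made-up joint distribution: E takes the lambda explicitly, so E(X^2) becomes E applied to x -> x^2, and the covariance-style case takes a two-place function over the product domain:

```python
def E(f, pmf):
    """E[f] over a discrete distribution {outcome: probability}."""
    return sum(f(omega) * p for omega, p in pmf.items())

# Example joint pmf of (X, Y) on a Cartesian product of outcomes.
joint = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}

ex2 = E(lambda xy: xy[0] ** 2, joint)     # E(X -> X^2), here 0.5
exy = E(lambda xy: xy[0] * xy[1], joint)  # E((X,Y) -> X*Y), here 0.25
```

The usual notation simply elides the lambda and lets the reader infer it, which is exactly the convenience/ambiguity trade-off being discussed.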
[+] epicaricacy|9 years ago|reply
>It's like saying "Well the plural of mouse is mice but the plural of house isn't hice!".

Just because English sucks and doesn't make any sense doesn't help the case for mathematical notation.

[+] jamessb|9 years ago|reply
I think the fundamental source of confusion might be due to not understanding the concept of a random variable. In his example, X is a random variable; the expectation E[X] is a functional applied to its probability mass function. Given this, it should not be surprising if it seems to behave differently from "any other function, in math".

If you understand this, I think the notation is natural:

We have a random variable X, which takes value x_i with probability p(x_i). Thus, the random variable X^2 will take value (x_i)^2 with probability p(x_i).

Given that the expectation of the random variable X with p.m.f. p(x_i) is defined as E[X] = \sum x_i p(x_i), it should be clear that to obtain the expectation of any random variable we must sum over the product of (value) and (probability of that value). It should also be clear that this gives E[X^2] = \sum (x_i)^2 p(x_i).

I'm confused by his comment that:

> p(xi) isn't, because it doesn't make any sense in the first place. It should really be just PXi or something, because it's a discrete value, not a function!

The probability mass function is a function: for a given value, it gives the probability that the discrete random variable takes that value. To calculate the expectation we use the values obtained by evaluating the function at discrete points, but what else could we do?

[+] adrianratnapala|9 years ago|reply
You are right, and the deeper source of the author's beef seems to be that he wants mathematics to be about textual rules. Now that's a fine thing in a computer language -- and there are philosophers who say mathematical truth is no more than computation. But mathematical notation is for humans to talk about things that have meanings that humans can understand.

The fact is that E[X^2] is a natural way of expressing an important concept, whereas "\sum \sum (x_i)^2 p(x_i^2)" need mean nothing at all (especially if $p$ is not defined over all the $x_i^2$).

[+] markhkim|9 years ago|reply
If you think probability theory is bad...

The old joke that "differential geometry is the study of properties that are invariant under change of notation" is funny primarily because it is alarmingly close to the truth. Every geometer has his or her favorite system of notation, and while the systems are all in some sense formally isomorphic, the transformations required to get from one to another are often not at all obvious to the student. —John M. Lee, "Introduction to Smooth Manifolds"

[+] gnuvince|9 years ago|reply
I started doing a lot better in calculus when I started using longer notation (e.g. f = x -> x³ instead of f(x) = x³) and making sure that things "type checked". For instance, the tendency to use f(x) to refer to a function, rather than just f, was very confusing to me, because f(x) is an element of the co-domain while f is a function (typically from real to real in my undergrad classes). I had to figure this out by myself because both the textbook I was using and the prof went with the notation that wouldn't type check. When I finally realized that dy/dx should instead be (d/dx)(f), things started being a lot clearer to me: differentiation takes a function and returns a function, and f is a function, so everything checks out.
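That type-checking view can be sketched in code, with numerical differentiation standing in for the symbolic d/dx: the derivative operator takes a function and returns a function, and only applying the result to a point yields a number.

```python
from typing import Callable

def ddx(f: Callable[[float], float]) -> Callable[[float], float]:
    """(d/dx)(f): takes a function, returns a function.

    Approximated here by a central difference; a symbolic d/dx would
    have the same type signature.
    """
    h = 1e-6
    return lambda x: (f(x + h) - f(x - h)) / (2 * h)

cube = lambda x: x ** 3  # f : real -> real
slope = ddx(cube)        # (d/dx)(f), still a function
value = slope(2.0)       # an element of the co-domain: approximately 12
```

Writing `ddx(cube)(2.0)` makes the two distinct steps (operator application, then point evaluation) impossible to conflate, which is exactly what dy/dx hides.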
[+] MrManatee|9 years ago|reply
It's good to think of dy/dx as (d/dx)y. In addition, it is also possible to make some sense of dy/dx. Here's one very hand-wavy way of looking at it.

Let ε be something very small, and define the difference operator d so that (df)(x) = f(x + ε) - f(x). Usually we don't want to handle the functions dx and dy by themselves, because they are so small, and their exact values depend on ε. But when we divide dy by dx we get something that is no longer ε-sized, and doesn't (in a limit sense) depend on the value of ε.

And why think this way? When I learned the chain rule dy/dx = dy/du * du/dx, I was told that even though the du's appear to cancel out, this is just abuse of notation and basically a meaningless coincidence. I understand that the teachers just wanted students to be careful; they don't want people "simplifying" dx/dy to x/y. However, I was never really satisfied with this explanation. I finally realized that, by thinking about it using the difference operator above, it is not a meaningless coincidence: the du's actually do, in a sense, cancel out.
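The cancellation can be checked numerically; this is a hand-wavy sketch using y = sin(x²) as a made-up example, where dy and du are actual ε-sized differences and the du's cancel as ordinary real numbers:

```python
import math

eps = 1e-6
x = 1.3
u = lambda t: t ** 2          # u(x) = x^2
y = lambda t: math.sin(u(t))  # y(x) = sin(u(x))

du = u(x + eps) - u(x)  # the difference operator d applied to u
dy = y(x + eps) - y(x)  # and applied to y

chain = (dy / du) * (du / eps)  # dy/du * du/dx: the du's cancel...
direct = dy / eps               # ...leaving dy/dx, up to rounding
```

In the limit sense both quotients converge to the same derivative, so the "coincidence" is just arithmetic on differences.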

[+] mrottenkolber|9 years ago|reply
This a thousand times. I boycott math notation. My most memorable math moments are:

- giving a talk on notation in high school (I was not tasked to do this, I decided to do it on my own) because I was FREAKED OUT by how we were using tons of symbols nobody had ever explained or defined

- converting all Math I encountered in University to Common Lisp programs to get rid of the shit notation

- bursting into crazy laughter when after five algebra lectures the prof notices that the students parse his notation differently than he does

Give it names, use S-Expressions.

[+] nabla9|9 years ago|reply
I had a similar experience when I started to learn math, even the Common Lisp part. I knew how to program from a young age and I thought of math syntactically. From that mindset it was obviously painful.

But once you really start to understand math, you realize that mathematical notation is not very formal or rigorous. It's a shorthand visual aid to keep track of the actual mathematical objects behind the notation. S-expressions are perfect for formal definitions and programming. They are not so good for actually thinking about mathematics. Mathematical notation can be dense and contain lots of information.

Notation is problematic for students because the underlying concepts and notation are almost never explained well. For example, I don't remember anyone explaining what functionals or implicit functions are before using them heavily. I had to figure them out myself.

[+] Smaug123|9 years ago|reply
Out of interest, what subject did you study at University? I studied maths, and everything was defined rigorously at the start of every pure course; and you could always stop the lecturer and ask them what a given piece of notation meant.
[+] catnaroek|9 years ago|reply
Mathematical notation was designed to calculate by hand. If you actually tried to calculate anything by hand, you'd dread long variable names and not being able to use infix symbols with sensible precedence rules. But, of course, a programmer would rather die than calculate anything by hand.
[+] yorwba|9 years ago|reply
I had an exam in probability theory yesterday, so these topics are still quite fresh in my mind. The confusion already starts when he uses the equation

  E[X] = \sum_{i=1}^{\infty} x_i p(x_i)
for the expectation. In my class it was introduced as

  E[X] = \sum_{\omega \in \Omega} X(\omega) P(\omega)
       = \sum_{ x \in X(\Omega)} x p_X(x)
which totally makes sense if you know that X is a function that assigns a value to each possible outcome. In most cases we don't actually care about the outcomes, so there is the second description using p_X.

The subscript X is important to highlight that p is not just some arbitrary function, it is p_X, the probability mass function of X.

Now when you want to compute the expectation of X^2, you use

  E[X^2] = \sum_{\omega \in \Omega} X^2(\omega) P(\omega)
         = \sum_{x^2 \in X^2(\Omega)} x^2 p_{X^2}(x^2)
i.e. the substitution he wanted to do actually works when you make the dependency of p_X on X explicit.

Now p_{f(X)} is not that easy to compute from p_X in general, because you have to account for multiple possible ways to reach the same value, e.g. x^2 = (-x)^2. For f(x) = x^2 we have

  p_{X^2}(x^2) = p_X(x)             for x = 0
    and        = p_X(x) + p_X(-x)   otherwise 
If f is more complex, there is a third way using

  E[f(X)] = \sum_{x \in X(\Omega)} f(x) p_X(x)
which amounts to the same thing, but is usually easier to calculate.
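Both routes are easy to check with a toy pmf (the distribution below is made up): building p_{X^2} by summing over preimages, versus summing f(x) p_X(x) directly:

```python
from collections import defaultdict

p_X = {-1: 0.2, 0: 0.5, 1: 0.3}  # example pmf of X
f = lambda x: x ** 2

# Route 1: construct the pushforward pmf p_{f(X)}, accumulating the
# probability of every x that lands on the same value f(x).
p_fX = defaultdict(float)
for x, p in p_X.items():
    p_fX[f(x)] += p              # e.g. x = -1 and x = 1 both map to 1

e1 = sum(y * p for y, p in p_fX.items())

# Route 2: E[f(X)] = sum over x of f(x) p_X(x), no pushforward needed.
e2 = sum(f(x) * p for x, p in p_X.items())
```

Route 2 is the one usually taught (sometimes called the "law of the unconscious statistician"), precisely because it skips computing p_{f(X)}.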
[+] bjornsing|9 years ago|reply
One interesting observation that popped into my mind when I read the OP: mathematicians don't write mathematical notation in papers/books by hand anymore, they use a far more verbose language called LaTeX.

Wouldn't it be great if every time you saw a mathematical formula there was a little widget to push that would show you the "source code" in LaTeX++, and LaTeX++ was like LaTeX but made up of stringently defined mathematical operations (like '\element_wise_multiplication' instead of '\plus_sign_with_circle_around_it')? :D

[+] Smaug123|9 years ago|reply
It's good practice anyway in LaTeX to define your own aliases and use them for semantic information. Like in one of my recent projects, I define:

    \newcommand{\disjointunion}{\sqcup}
[+] stdbrouw|9 years ago|reply
Yeah, it's a bummer that LaTeX math notation is really just markup and has no indication of what belongs together and what the purpose of various symbols is.
[+] catnaroek|9 years ago|reply
LaTeX... now that is unreadable. Traditional mathematical notation is just fine.
[+] bsaul|9 years ago|reply
I remember being so outraged when I was first introduced to derivatives in high school... Seeing that they didn't use the same notation, nor the exact same definition, in my physics class and my maths class in the same year made me absolutely furious...
[+] billconan|9 years ago|reply
I have similar feelings about music notation too. We never applied our "user experience" standards to mathematical notation and music notation. If these things were invented today, we might come up with better ideas.

One painful experience I have reading math is telling which variables are "vectors" and which are "scalars". Another is the similar look of certain Greek and English characters, such as alpha and a.

[+] GnarfGnarf|9 years ago|reply
I was just going to mention music. I learned some piano as an adult. I was never able to accept that the same note positions on the treble staff had a different value on the bass staff. Yes, I understand why it's so, it's just terribly user-unfriendly.

Also, in keys other than C, I felt that it wouldn't kill them to mark every instance of the sharp and flat notes, instead of doing it once at the beginning and implying it everywhere else.

[+] zimbatm|9 years ago|reply
If I could fix one thing in maths, it would be to introduce an explicit import statement. Right now it's very hard to work out what the symbols mean in a specific context unless you're familiar with the field.

   from url/to/geometric-algebra.pdf import X;
I don't mind the overloading too much, and it would always be possible to alias symbols in case two or more fields are used together.
[+] ivansavz|9 years ago|reply
Very cool idea!

I think this would work both at the logical level (e.g., the concept of the azimuthal angle in an xyz-coordinate system) and at the stylesheet level (theta for physicists, and phi for mathematicians).

Such annotations would really help with browsing math content; you could see what prerequisite concepts are used in a paper from its import statements, without the need to read the whole thing. Also, you could browse from the other end (reverse-imports / uses), looking for docs that make use of a given concept.

[+] gerbilly|9 years ago|reply
Math notation is a language that evolved over centuries.

The English language isn't consistent but we all seem to be able to use it to communicate here.

Same thing with maths. Some notations are holdovers from earlier eras, but we still introduce them to students in case they run into them in an older book (dx/dy, for example).

And maths isn't just about computation. It's also about expressing ideas, and sometimes that is easier when the notation isn't rigidly 'executable' as some posters here would have it.

This article also reminds me of: http://knowyourmeme.com/photos/582861-reaction-images

[+] hgibbs|9 years ago|reply
The post is a pretty childish rant. One of the great facets of mathematical writing (and not learning, unfortunately) is that you can explicitly define your own notation, and then use said notation whenever you want. The author even notes this in his final paragraph, but doesn't seem to see it as an advantage of mathematical writing.

Prior to modern notation, mathematics was written out in English in full. What we have now is significantly better than what existed before modern mathematical shorthand.

Also, the expectation is an operator and not a 'function' (in the sense that it does not take values in one of the canonical scalar fields e.g. R, C). The notation makes perfect sense in this setting. For example, the expression E[x^2] should be interpreted as E acting on the function x -> x^2 and not on a number x.

[+] nshm|9 years ago|reply
As a developer I'm not quite happy with single-letter names. Of course it's OK for minor local things to be named 'x', but for more important values and functions there is no problem these days with having readable names, just like in software. Then you could actually read a paper without guessing and searching through those epsilons, lambdas, kappas and cryptic symbols. Use expectation(x) instead of e(x), use mean(y) instead of \hat y, and so on.
[+] stdbrouw|9 years ago|reply
It really depends. Within a given field certain concepts are so common that you really do want a way to express them in the most terse possible way, just as programmers use i for index, err for error and n for counts. This allows you to put more stuff in a smaller space which means you don't have to jump back and forth to understand something -- sort of like how it's nice to put different parts of an app into different files and even to split them up into separate libraries, but go too far and the flow of the application becomes opaque.

Have you ever had a chance to read some of those really old mathematical proofs that didn't use any mathematical symbols at all? They're a nightmare to try and understand. There's a tradeoff. No disagreement, though, that mathematicians tend not to be very great at finding the sweet spot where the trade-off balances :-)

(Also, the sample mean of y would be `\bar y`, hats are for estimators in general.)

[+] erdewit|9 years ago|reply
In my experience people that are very verbose, that talk or write at great length and with ease, tend to have a dislike for terse formulations. Sometimes to the point of being offended by it. Math is just about the ultimate in terse notation and perhaps the author is one of these verbose people.

The same thing can be seen with programmers: The verbose programmer writes a lot of lines of code and is proud of it, while the terse programmer is proud to remove or simplify code.

[+] htns|9 years ago|reply
This is just really weakly argued. Firstly, the notation does use capital X and lowercase x. You need to realize these are different. It's not substitution to go from X^2 to x^2. Capital letters are not real numbers, and thus it's a type error too. Secondly, if you are familiar with differentiation and integration, you are familiar with straight substitution not always being correct.
[+] amelius|9 years ago|reply
One of the cool things of Mathematica is that it solves these problems. But the result is, unfortunately, somewhat more verbose.
[+] k__|9 years ago|reply
Their naming isn't that good either.

All these words that are already used for other things.

Magma, ring, group, body, lens, optic...

[+] efz1005|9 years ago|reply
You should learn what the concepts mean instead of judging the names associated to them. We call something a Ring to avoid saying "a set with two operations which behave in a certain way [...]" every time we refer to it.
[+] jokoon|9 years ago|reply
Funny, I said the same about Andrew Ng's ML course.

Ultimately, if what you're teaching is going to end up in software, why use math at all? Use code or pseudo code. I don't think it's bad to just give the working algorithm without having to prove the math.

Really how many students will end up being computer scientists anyway, and research and write about new methods of doing AI, and do the actual math? So few. I guess that's a simple criticism of academics.

It's just easier to work with code than mathematical notation most of the time, in my view. You can't replace math, of course, but when things are simple enough, it could be avoided. It's a matter of making math accessible to the most people.

Code is amazing because a computer can check to see if it works. A computer doesn't understand math.

[+] nkozyra|9 years ago|reply
Understanding underlying math is what allows people to create improved algorithms. If it's just "implement scikit" then you barely even need a developer.

Understanding why things work is still important and why software companies still routinely test on algorithm design.