gyom | 8 years ago

It's cool to see that much dedication. It's useful when people take the time to summarize knowledge in a book to serve as reference.

But ... I have the feeling that the author, who is relatively new to the field (by his own admission), expanded a lot of formulas and made certain parts of the theory more complicated than they need to be.

Look around page 60. There are formulas with 6 summation signs in front of them, with all kinds of little indices floating around. How about page 37?

In a way, the whole point about the chain rule (and software libraries that implement it) is that you can stay in "math world" to do the reasoning, and not think about the job of managing the computation.
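A minimal sketch of what "staying in math world" means here (the functions are illustrative choices, not from any particular book): state the chain rule once symbolically, and let the code mirror it directly, with a finite-difference check that the reasoning matches the computation.

```python
import math

# Toy illustration: for h(x) = f(g(x)), the chain rule gives
# h'(x) = f'(g(x)) * g'(x). The concrete f and g below are arbitrary.
def g(x):  return x * x
def dg(x): return 2 * x
def f(u):  return math.sin(u)
def df(u): return math.cos(u)

def h(x):  return f(g(x))
def dh(x): return df(g(x)) * dg(x)   # chain rule, stated once in "math world"

# Central-difference check that the symbolic derivative matches.
x, eps = 1.3, 1e-6
numeric = (h(x + eps) - h(x - eps)) / (2 * eps)
assert abs(numeric - dh(x)) < 1e-6
```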

Same idea with expressing things as much as possible in terms of linear algebra primitives. Matrix multiplication is easier to understand when it's not broken apart into sums whose indices you have to track.
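To make the contrast concrete (assuming NumPy; the shapes are arbitrary): the same product written as explicit index sums versus a single primitive call.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))

# Index-level form: C[i, j] = sum_k A[i, k] * B[k, j]
C_indexed = np.zeros((3, 5))
for i in range(3):
    for j in range(5):
        for k in range(4):
            C_indexed[i, j] += A[i, k] * B[k, j]

# Primitive form: one call, no indices to track.
C_primitive = A @ B

assert np.allclose(C_indexed, C_primitive)
```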

ssivark | 8 years ago

See author's note excerpted below; that was an explicit goal of the project.

> This work has no benefit nor added value to the deep learning topic on its own. It is just the reformulation of ideas of brighter researchers to fit a peculiar mindset: the one of preferring formulas with ten indices but where one knows precisely what one is manipulating rather than (in my opinion sometimes opaque) matrix formulations where the dimension of the objects are rarely if ever specified.

-- I think that having those things written out explicitly is of great help to those not fully comfortable with formal manipulations. It is particularly useful when implementing those operations in low-level code. I say this even though I personally find the Einstein notation [1] most convenient.
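As an aside, that notation maps directly onto NumPy's `np.einsum`, where the index string is essentially the Einstein-convention formula (the arrays here are just illustrative):

```python
import numpy as np

# C_ij = A_ik B_kj: the repeated index k is summed implicitly,
# exactly as in Einstein notation.
A = np.arange(6.0).reshape(2, 3)
B = np.arange(12.0).reshape(3, 4)
C = np.einsum("ik,kj->ij", A, B)
assert np.allclose(C, A @ B)

# A batched contraction that would be clumsy as nested sums:
# out_bi = sum_j M_bij v_bj
M = np.arange(24.0).reshape(2, 3, 4)
v = np.arange(8.0).reshape(2, 4)
out = np.einsum("bij,bj->bi", M, v)
```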

[1]: https://en.wikipedia.org/wiki/Einstein_notation

gyom | 8 years ago

I had not caught that note from the author. Thanks for pointing it out.