antixk's comments
antixk | 1 year ago | on: Start presentations on the second slide
antixk | 3 years ago | on: PyTorch 2.0
[0] https://pytorch.org/tutorials/beginner/deeplabv3_on_ios.html
antixk | 3 years ago | on: A Master Perfumer's Reflections on Patchouli and Vetiver
antixk | 5 years ago | on: Deep learning model compression methods
antixk | 5 years ago | on: Microsoft Coffee
antixk | 5 years ago | on: Matrix multiplication inches closer to mythic goal
[0] https://nla-group.org/2020/07/21/numerical-behaviour-of-tens...
antixk | 5 years ago | on: An Elementary Introduction to Information Geometry [pdf]
Now, what's all this got to do with information? Information is usually represented in terms of statistical distributions, following Shannon's information theory. What the early founders of IG observed is that these statistical distributions can be represented as points on a curved space called a statistical manifold. All the familiar quantities of information theory can then be reinterpreted in geometric terms.
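As a toy illustration of "distributions as points" (my own example, not from the linked paper), the closeness of two Gaussian distributions can be measured with the KL divergence, using the standard closed-form expression for Gaussians:

```python
import math

def kl_gaussian(mu1, sigma1, mu2, sigma2):
    """KL divergence KL( N(mu1, sigma1^2) || N(mu2, sigma2^2) ), closed form."""
    return (math.log(sigma2 / sigma1)
            + (sigma1**2 + (mu1 - mu2)**2) / (2 * sigma2**2)
            - 0.5)

print(kl_gaussian(0.0, 1.0, 0.0, 1.0))  # 0.0: identical distributions
print(kl_gaussian(1.0, 1.0, 0.0, 1.0))  # 0.5: shifting the mean moves the "point"
```

Note that KL divergence is not symmetric, so it is not a true distance; one motivation for the geometric view is that the Fisher information metric gives the manifold a proper local notion of distance.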
So, why is it so exciting? Well, in deep learning people predominantly work with statistical distributions, some without even realising it. Our optimizations involve reducing the distance between statistical distributions, such as the distribution of the data and the distribution that the neural network is trying to model. It turns out that when you do this optimization in ordinary parameter space, you get the gradient descent that we all know and love. Gradient-based methods only use local approximations of the geometry: the gradient (local slope) and the Hessian (local quadratic approximation of the curvature). Optimization on the statistical manifold instead accounts for the curvature of the space of distributions itself, via the Fisher information metric, and can therefore be more efficient. This method is called Natural Gradient.
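Here is a minimal sketch of the natural gradient idea, assuming a 1-D Gaussian model with known sigma (a hypothetical example of mine, not from the paper): preconditioning the ordinary log-likelihood gradient by the inverse Fisher information turns one step into an exact jump to the maximum-likelihood estimate.

```python
# Natural gradient toy demo: estimate the mean mu of N(mu, sigma^2)
# from data, with sigma known (hypothetical example).
data = [1.2, 0.8, 1.5, 0.9, 1.1]
sigma = 1.0
mu = 0.0  # initial guess

n = len(data)
# Ordinary gradient of the log-likelihood with respect to mu:
grad = sum(x - mu for x in data) / sigma**2
# Fisher information for mu is n / sigma^2; the natural gradient
# preconditions the ordinary gradient by its inverse.
fisher = n / sigma**2
natural_grad = grad / fisher

# With step size 1, a single natural-gradient step lands exactly
# on the maximum-likelihood estimate (the sample mean).
mu = mu + 1.0 * natural_grad
print(mu)  # equals sum(data) / len(data)
```

In this toy case the Fisher matrix is a scalar; in a neural network it is a large matrix, which is why practical natural-gradient methods rely on approximations to it.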
Hope this helps.
antixk | 6 years ago | on: Show HN: Squirrel Curve Studio – A simple tool to design spline curves