SoerenL's comments

SoerenL | 7 years ago | on: Computing Higher Order Derivatives of Matrix and Tensor Expressions [pdf]

Yeah, I am referring to Ricci notation; TF and PyTorch don't use it. The new version already has relus. This version is targeted at standard formulas (it has abs as a non-differentiable function); the next version works for deep learning. I never wanted to work on this, but there is just no way around it.

SoerenL | 7 years ago | on: Computing Higher Order Derivatives of Matrix and Tensor Expressions [pdf]

True, for large-scale problems Hessian-vector products are often the way to go (or ignoring second-order information entirely). However, first computing a symbolic expression for the Hessian and then taking its product with a vector is still more efficient than using autodiff for Hessian-vector products. That comparison is not in the paper, though, and the gain is rather small (a factor of two or so). But true, computing full Hessians or Jacobians is only useful for problems with up to a few thousand parameters.
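The two routes can be sketched in a few lines of numpy (an illustrative example, not the tool's code; the finite difference of the gradient stands in for an autodiff Hessian-vector product):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)
v = rng.standard_normal(n)

def grad(x):
    # Gradient of f(x) = x'Ax is (A + A')x.
    return (A + A.T) @ x

# Route 1: symbolic Hessian expression A + A', then one product with v.
hvp_symbolic = (A + A.T) @ v

# Route 2: directional finite difference of the gradient
# (a stand-in here for an autodiff Hessian-vector product).
eps = 1e-6
hvp_fd = (grad(x + eps * v) - grad(x - eps * v)) / (2 * eps)

assert np.allclose(hvp_symbolic, hvp_fd, atol=1e-3)
```

For a quadratic the two agree up to rounding; the point in the thread is only that route 1 reuses a precomputed symbolic Hessian expression instead of rerunning differentiation.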

SoerenL | 7 years ago | on: Computing Higher Order Derivatives of Matrix and Tensor Expressions [pdf]

That's what is usually done: autodiff on the component-wise expression. We don't do that here. Instead, we work directly at the matrix and tensor level and compute derivatives there. A simple example to illustrate it: consider the function f(x) = x'Ax. Its Hessian is A + A'. That expression is what we compute, and evaluating it is orders of magnitude faster than what TF, PyTorch, etc. do, namely autodiff on the components. Hope this helps; if not, please let me know.
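The claim about the example can be checked numerically with a short numpy sketch (illustrative names, assuming a generic square A): the closed-form Hessian A + A' matches a finite-difference Hessian of f(x) = x'Ax.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
x = rng.standard_normal(n)

def f(x):
    return x @ A @ x  # f(x) = x'Ax

# Symbolic Hessian of x'Ax: A + A' (constant, independent of x).
H_symbolic = A + A.T

# Numerical Hessian via second-order central finite differences.
eps = 1e-4
H_numeric = np.zeros((n, n))
I = np.eye(n)
for i in range(n):
    for j in range(n):
        ei, ej = eps * I[i], eps * I[j]
        H_numeric[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                           - f(x - ei + ej) + f(x - ei - ej)) / (4 * eps**2)

assert np.allclose(H_symbolic, H_numeric, atol=1e-4)
```

Evaluating the precomputed expression A + A' costs one matrix addition, while component-wise autodiff has to rebuild the second derivative from scalar operations, which is where the speed gap comes from.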

SoerenL | 8 years ago | on: Matrix Calculus for Deep Learning

But how do you compute the derivative of x'Ax in Mathematica (x being a vector and A being a matrix)? What you have pointed out covers only scalar derivatives, if I am not mistaken.

SoerenL | 8 years ago | on: Matrix Calculus

XLA is good for the GPU only. On the CPU, MC is about 20-50% faster than TF on scalar-valued functions; for the GPU I don't know yet. It is true that for an augmented Lagrangian you only need scalar-valued functions, and this is really efficient on large-scale problems. On small-scale problems (up to 5000 variables), though, you really need interior point methods that solve the KKT conditions directly, as you point out; this is sometimes genuinely necessary. However, when you look at the algorithms, TF and MC do not differ too much. In the end, there is a limited number of ways of computing derivatives, and most of them are essentially the same (or boil down to two variants). Some of the claims made in the early autodiff literature about symbolic differentiation are just not true; the approaches are fairly similar in the end. But let's see how XLA performs.

SoerenL | 8 years ago | on: Matrix Calculus

I did not compare to TensorFlow XLA, but I did compare to TensorFlow. Of course, it depends on the problem; for instance, for evaluating the Hessian of x'Ax, MC is a factor of 100 faster than TF. But MC and TF have different objectives: TF focuses more on scalar-valued functions as needed for deep learning, MC on the general case, especially vector- and matrix-valued functions as needed for dealing with constraints. But I will give TF XLA a try.

SoerenL | 8 years ago | on: Matrix Calculus

As one of the authors of this tool I understand that you would like to have some way of trusting the output.

Internally we check it numerically, i.e., we generate random data for the given variables and verify the derivative by comparing it to a finite-difference approximation. We will ship this code (hopefully soon) with one of the next versions, so you can then check it yourself. Otherwise, as far as I know, no other tool computes matrix derivatives, so I understand it is hard to convince anyone of the correctness of the results. But I hope the numerical tests will be helpful.

SoerenL | 8 years ago | on: Matrix Calculus

It is not in the current online tool, but we will add it again soon. It still works the way you describe (pushing transposes down to the leaves, plus simplification rules). Btw: how are you doing, and where have you been? It would be nice to also add a link to you and your current site. GENO is also on its way.