yagyu | 2 years ago

I’d be interested in your thoughts on the case where the f_i are optimizable: f_i(t) = K(t, z_i), i = 1..m, with m << N. Like the representer theorem, but with far fewer terms than you have data points to fit. The points z_i are usually called inducing points and may themselves be optimized by gradient descent.
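
Concretely, a toy sketch of that setup (everything here is hypothetical: an RBF kernel, made-up data, and the inducing locations z and weights w treated as free parameters updated by plain gradient descent on squared error):

    import jax
    import jax.numpy as jnp

    def rbf(t, z, lengthscale=0.3):
        # K(t, z_i) for all pairs: an (N, m) matrix of kernel evaluations
        return jnp.exp(-0.5 * ((t[:, None] - z[None, :]) / lengthscale) ** 2)

    def predict(params, t):
        # f(t) = sum_i w_i K(t, z_i): only m basis functions, m << N
        return rbf(t, params["z"]) @ params["w"]

    def loss(params, t, y):
        return jnp.mean((predict(params, t) - y) ** 2)

    key = jax.random.PRNGKey(0)
    N, m = 200, 5                               # many data points, few inducing points
    t = jnp.linspace(-3.0, 3.0, N)
    y = jnp.sin(2.0 * t) + 0.1 * jax.random.normal(key, (N,))

    params = {"z": jnp.linspace(-3.0, 3.0, m),  # inducing locations, optimized like any weight
              "w": jnp.zeros(m)}

    grad_fn = jax.jit(jax.grad(loss))
    for _ in range(2000):
        g = grad_fn(params, t, y)
        params = jax.tree_util.tree_map(lambda p, dp: p - 0.05 * dp, params, g)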

There is literature on approximating exact GP inference with (something like) these objects when m << N (variational inference).

However, I’m not aware of anyone drawing a clear picture of the other direction, starting from the optimization picture and explaining it in terms of inference, similar to what TFA does.

In TFA the number of functions is large, so the system is underdetermined. In the variational-inference setting the system is overdetermined, and I wonder what inference, if any, gradient descent does there.
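
To make the contrast concrete (toy shapes only, same kind of RBF kernel as in the sketch above): for fixed centers the fit is a linear system K w = y in the weights, with K[n, i] = K(t_n, z_i). With m << N the system is overdetermined and squared-error gradient descent heads for the least-squares solution; with m >> N it is underdetermined, and gradient descent started at zero (with a small enough step) converges to the minimum-norm solution:

    import jax.numpy as jnp

    rbf = lambda t, z: jnp.exp(-0.5 * ((t[:, None] - z[None, :]) / 0.3) ** 2)

    t_many, t_few = jnp.linspace(-3.0, 3.0, 200), jnp.linspace(-3.0, 3.0, 5)
    y_many, y_few = jnp.sin(2.0 * t_many), jnp.sin(2.0 * t_few)

    # m << N: design matrix is (200, 5), overdetermined; least squares picks w.
    w_over, *_ = jnp.linalg.lstsq(rbf(t_many, t_few), y_many)

    # m >> N (TFA's regime): design matrix is (5, 200), underdetermined; lstsq
    # returns the minimum-norm w, which gradient descent from zero also reaches.
    w_under, *_ = jnp.linalg.lstsq(rbf(t_few, t_many), y_few)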

Caveat: 1am and a few drinks deep, so if I’m not making sense that’s ok.
