> would you happen to have the paper proving that somewhere?
blackbear_ | 5 years ago

Actually no, I don't, but here's the intuition. Consider what happens in the limit as the bandwidth goes to zero: the kernel collapses to a delta function, i.e. K(x_i, x_j) = 1 when i = j and 0 otherwise, so the kernel matrix approaches the identity. The optimal coefficients solving the quadratic program approach zero, and the SVM predicts zero almost everywhere except in an ever-shrinking neighborhood around each training point, where the prediction equals that point's label.
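The limit can be checked numerically: as the bandwidth shrinks, off-diagonal kernel entries vanish and the kernel matrix converges to the identity. A minimal numpy sketch, assuming the Gaussian form K(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)) (the comment doesn't pin down the exact parameterization):

```python
import numpy as np

def rbf_kernel(X, Y, bandwidth):
    """Gaussian RBF kernel: K(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dists = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=2)
    return np.exp(-sq_dists / (2 * bandwidth ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))  # 5 toy training points in 2D

# As the bandwidth goes to zero, K(X, X) approaches the identity matrix.
for bw in [1.0, 0.1, 0.001]:
    K = rbf_kernel(X, X, bw)
    print(f"bandwidth={bw}: max |K - I| = {np.max(np.abs(K - np.eye(5))):.2e}")
```

With the kernel matrix near the identity, each training point only "sees" itself in the decision function, which is why the fit degenerates to memorization.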
labelbias | 5 years ago

The RBF kernel implicitly maps vectors into an infinite-dimensional space, and that is what makes the dataset linearly separable: adding dimensions improves separability, and in the limit of infinitely many dimensions it always holds. The representation is never computed explicitly, but it is implicitly infinite-dimensional.

For example, with a polynomial kernel you do not explicitly add extra dimensions to the input vectors, but you could instead use a linear kernel on input vectors augmented with the quadratic terms and get the same separability.
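The polynomial-kernel example can be made concrete: evaluating the kernel on the original inputs gives exactly the same value as a linear kernel (a plain dot product) on explicitly expanded feature vectors. A small sketch, assuming a homogeneous degree-2 kernel K(x, y) = (x . y)^2 on 2D inputs (my choice of degree and dimension for illustration):

```python
import numpy as np

def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel: K(x, y) = (x . y)^2."""
    return np.dot(x, y) ** 2

def poly2_features(x):
    """Explicit feature map for 2D input: phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2),
    so that phi(x) . phi(y) == (x . y)^2."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

k = poly2_kernel(x, y)                               # kernel trick on 2D inputs
lin = np.dot(poly2_features(x), poly2_features(y))   # linear kernel in 3D feature space
print(k, lin)  # both equal (1*3 + 2*(-1))^2 = 1.0
```

The RBF kernel plays the same trick, except its implicit feature map has infinitely many components, so it can never be written out explicitly like `poly2_features`.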