> would you happen to have the paper proving that somewhere?
blackbear_ | 5 years ago

Actually no, I don't, but here's the intuition. Consider what happens in the limit as the bandwidth goes to zero: the kernel collapses to a delta function, i.e. K(x_i, x_j) = 1 when i = j and 0 otherwise, so the kernel matrix approaches the identity. The optimal coefficients solving the quadratic program approach zero, and the SVM predicts zero almost everywhere except in an ever-shrinking neighborhood around each training point, where the prediction equals that point's label.
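The limit can be checked numerically: as the bandwidth shrinks, off-diagonal kernel entries vanish and the kernel matrix converges to the identity. A minimal numpy sketch, assuming the Gaussian form K(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)) (the comment doesn't pin down the exact parameterization):

```python
import numpy as np

def rbf_kernel(X, Y, bandwidth):
    """Gaussian RBF kernel: K(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dists = np.sum((X[:, None, :] - Y[None, :, :]) ** 2, axis=2)
    return np.exp(-sq_dists / (2 * bandwidth ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))  # 5 toy training points in 2D

# As the bandwidth goes to zero, K(X, X) approaches the identity matrix.
for bw in [1.0, 0.1, 0.001]:
    K = rbf_kernel(X, X, bw)
    print(f"bandwidth={bw}: max |K - I| = {np.max(np.abs(K - np.eye(5))):.2e}")
```

With the kernel matrix near the identity, each training point only "sees" itself in the decision function, which is why the fit degenerates to memorization.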
labelbias | 5 years ago

The RBF kernel implicitly maps vectors into an infinite-dimensional space, and that is what makes the dataset linearly separable: adding dimensions improves separability, and in the limit of infinitely many dimensions it always holds. The representation is never computed explicitly, but it is implicitly infinite-dimensional.

For example, with a polynomial kernel you do not explicitly add extra dimensions to the input vectors, but you could instead use a linear kernel on input vectors augmented with the quadratic terms and get the same separability.
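The polynomial-kernel example can be made concrete: evaluating the kernel on the original inputs gives exactly the same value as a linear kernel (a plain dot product) on explicitly expanded feature vectors. A small sketch, assuming a homogeneous degree-2 kernel K(x, y) = (x . y)^2 on 2D inputs (my choice of degree and dimension for illustration):

```python
import numpy as np

def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel: K(x, y) = (x . y)^2."""
    return np.dot(x, y) ** 2

def poly2_features(x):
    """Explicit feature map for 2D input: phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2),
    so that phi(x) . phi(y) == (x . y)^2."""
    x1, x2 = x
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

k = poly2_kernel(x, y)                               # kernel trick on 2D inputs
lin = np.dot(poly2_features(x), poly2_features(y))   # linear kernel in 3D feature space
print(k, lin)  # both equal (1*3 + 2*(-1))^2 = 1.0
```

The RBF kernel plays the same trick, except its implicit feature map has infinitely many components, so it can never be written out explicitly like `poly2_features`.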