it_does_follow | 3 years ago
> When used for feature selection, data scientists typically regard z^p:=(z_1,…,z_p) as a feature vector that contains fewer and richer representations than the original input x for predicting a target y.
I don't even think this is necessarily incorrect terminology, especially given the author's background of working primarily at Google and similar companies. It's the difference between viewing feature selection as "choosing from a list of the provided features" versus "choosing from the set of all possible features". The author's usage makes perfect sense under the latter reading.
PCA is used for this all the time in practice. I've seen an astounding number of presentations where people start with PCA/SVD as the first round of feature transformation. Whenever I ask "why are you doing that?", the answer is mumbling and shoulder shrugging.
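To make the pattern concrete, here's a minimal sketch of PCA-as-feature-transformation done directly with SVD (the data and variable names are illustrative, not from any particular presentation):

```python
import numpy as np

# Toy data: 100 samples, 5 original features.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# Center the data, then take the top-k right singular vectors as components.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2
Z = Xc @ Vt[:k].T  # projected features, shape (100, 2)
```

`Z` here plays the role of the z^p in the quoted passage: fewer columns than `X`, ordered by explained variance. Whether the projection is actually "richer" for predicting y is exactly the question the presenters usually can't answer.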
This is a solid post, and I find it odd that you try to dismiss it as either ignorant or clickbait, when even a quick skim rules out both.