ctur | 1 year ago
Another thing about the architecture is that we inherently bias it with the way we structure the data. For instance, take a dataset of (car) traffic patterns. If you only track the date as a feature, you miss that some events follow not just the day-of-year pattern but also holiday patterns. You could learn this with deep learning given enough data, but if we bake it into the dataset, you can build a _much_ simpler model on it, much faster.
So, architecture matters. Data/feature representation matters.
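A minimal sketch of the point above: the dataset, holiday set, and `featurize` helper here are all hypothetical, but they show how baking a holiday flag into the features hands the model something it would otherwise have to infer from raw dates.

```python
from datetime import date

# Hypothetical mini traffic dataset: (date, vehicle_count)
rows = [
    (date(2023, 7, 3), 1200),
    (date(2023, 7, 4), 400),   # Independence Day (US): traffic drops
    (date(2023, 7, 5), 1150),
]

# Domain knowledge baked into the data: an explicit holiday lookup
# (illustrative, not exhaustive).
HOLIDAYS = {date(2023, 7, 4)}

def featurize(d):
    """Turn a raw date into features a simple model can use directly."""
    return {
        "day_of_year": d.timetuple().tm_yday,
        "day_of_week": d.weekday(),
        "is_holiday": d in HOLIDAYS,
    }

features = [featurize(d) for d, _ in rows]
print(features)
```

With `is_holiday` present, even a linear model or small tree can separate the July 4th dip; without it, the model has to discover the holiday calendar from day-of-year alone.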
mr_toad | 1 year ago
I think you need a hidden layer. I’ve never seen a universal approximation theorem for a single-layer network.
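To make the distinction concrete, here is a hedged sketch (widths, scales, and the random-feature fitting scheme are all arbitrary choices for the demo, not a canonical construction): a network with one `tanh` hidden layer, with only the output weights fit by least squares, already approximates a smooth 1-D function that a zero-hidden-layer (purely linear) model cannot.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-np.pi, np.pi, 200)[:, None]
y = np.sin(x)

# One hidden layer of tanh units with random, frozen input weights;
# only the output weights are fit, via least squares.
H = 50                                   # hidden width, arbitrary
W = rng.normal(size=(1, H)) * 3.0        # random input weights
b = rng.normal(size=H)                   # random biases
hidden = np.tanh(x @ W + b)              # hidden activations
beta, *_ = np.linalg.lstsq(hidden, y, rcond=None)
pred = hidden @ beta

mse = float(np.mean((pred - y) ** 2))
print(mse)  # small: one hidden layer suffices for this target
```

Dropping the hidden layer leaves `pred = x @ beta`, a straight line, which cannot track `sin` no matter how the weights are fit; that gap is what the universal approximation theorems for one-hidden-layer networks formalize.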
dongecko | 1 year ago
ted_dunning | 1 year ago
Multi-layer substantially changes the scaling.
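A rough parameter-count sketch of that scaling claim. The `2**n` versus `2*n`-wide figures below are illustrative orders of growth in the spirit of depth-separation results (where a single hidden layer can need exponentially many units while a deep network stays polynomial), not a theorem applied to a specific function:

```python
def mlp_params(widths):
    """Parameter count (weights + biases) of a fully connected net
    with layer widths [input, hidden..., output]."""
    return sum(a * b + b for a, b in zip(widths, widths[1:]))

n = 16  # input dimension, illustrative

# Shallow: one hidden layer of width 2**n (exponential in n).
shallow = mlp_params([n, 2 ** n, 1])

# Deep: n hidden layers of modest width 2*n (polynomial in n).
deep = mlp_params([n] + [2 * n] * n + [1])

print(shallow, deep)  # the shallow net is orders of magnitude larger
```

Under these (assumed) widths, the single-hidden-layer net needs over a million parameters where the deep one needs tens of thousands, which is the kind of scaling change depth buys.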