top | item 37610468

sandpaper26 | 2 years ago

It's not a likely solution given how loss functions work, but in theory a single model could learn to perform exactly the function you describe. When you say "just do X" where X is any function (in this case, a piecewise function), a large enough model could do it.
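To make "a large enough model could do it" concrete, here is a minimal hand-wired sketch (all weights are made-up illustrative values, not learned) of a single one-hidden-layer ReLU network that represents a piecewise-linear "hat" function exactly:

```python
def relu(z):
    return max(0.0, z)

# One hidden layer of three ReLU units with hand-picked weights
# (hypothetical example values): the network computes exactly
#   f(x) = ReLU(x) - 2*ReLU(x - 1) + ReLU(x - 2),
# the triangular "hat": 0 outside [0, 2], rising to 1 at x = 1.
HIDDEN = [(1.0, 0.0), (1.0, -1.0), (1.0, -2.0)]  # (weight, bias) per unit
OUTPUT = [1.0, -2.0, 1.0]                         # output-layer weights

def net(x):
    return sum(wo * relu(w * x + b) for (w, b), wo in zip(HIDDEN, OUTPUT))

def hat(x):
    # The target piecewise function, written out directly.
    return max(0.0, 1.0 - abs(x - 1.0))

# The single model reproduces the piecewise target exactly, not approximately.
for i in range(-100, 300):
    x = i / 100.0
    assert abs(net(x) - hat(x)) < 1e-12
```

The point is representability: a single model's weights can encode the piecewise structure directly. Whether gradient descent on a loss function actually finds such weights is the separate, harder question.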

sandpaper26 | 2 years ago

After some reflection, it's maybe more accurate to visualize this in reverse: all expert models see the problem and attempt a solution, and then some "manager" model decides which expert model has the best solution and outputs it.
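That "all experts try, manager picks" view can be sketched in a few lines. This is a hedged toy, not any real mixture-of-experts implementation: the two experts and the manager's scoring rule are made-up stand-ins for learned networks.

```python
def expert_neg(x):
    # Toy specialist that happens to be right on negative inputs.
    return -x

def expert_pos(x):
    # Toy specialist that happens to be right on non-negative inputs.
    return x

EXPERTS = [expert_neg, expert_pos]

def manager_score(y):
    # Stand-in for a learned "manager": it encodes the prior that valid
    # answers are non-negative, and rejects candidates that violate it.
    return y if y >= 0 else float("-inf")

def moe(x):
    # Every expert attempts the problem; the manager scores the candidate
    # solutions and outputs the one it judges best.
    candidates = [e(x) for e in EXPERTS]
    return max(candidates, key=manager_score)

assert moe(-3.0) == 3.0 and moe(2.0) == 2.0  # together they compute |x|
```

Neither expert computes |x| on its own; the selection step is what stitches the piecewise behavior together.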

liquidpele | 2 years ago

Until the manager model decides to outsource the expert models.

treprinum | 2 years ago

In theory you only need two layers to model any function. In practice, things play out wildly differently.

sandpaper26 | 2 years ago

Any memoryless continuous function between two Euclidean spaces, I think you mean. The experts-and-manager model would need to be able to do more than that (as do most neural networks).

And part of the reason single-hidden-layer networks aren't enough, even in the continuous, memoryless, Euclidean case, is again how loss functions work: you're unlikely to converge on a good approximation with very few hidden layers.
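The "in theory" half of this can be made constructive. Here is a hedged sketch (the target function and knot placement are arbitrary example choices) that wires a single hidden ReLU layer into the piecewise-linear interpolant of a continuous function on [0, 1], so the worst-case error shrinks as the layer widens, with no training and no depth:

```python
def relu(z):
    return max(0.0, z)

def one_hidden_layer_approx(f, width, x):
    # Evaluate a width-unit single-hidden-layer ReLU net whose output is the
    # piecewise-linear interpolant of f at knots k/width on [0, 1]: a
    # constructive instance of one-hidden-layer universal approximation.
    n = width
    knots = [k / n for k in range(n + 1)]
    slopes = [(f(knots[k + 1]) - f(knots[k])) * n for k in range(n)]
    out = f(0.0)
    prev = 0.0
    for k in range(n):
        # Each ReLU unit bends the output at one knot; the output weight
        # is the change in slope across that knot.
        out += (slopes[k] - prev) * relu(x - knots[k])
        prev = slopes[k]
    return out

def target(x):
    return x * x  # arbitrary smooth example target

def max_err(n):
    return max(abs(one_hidden_layer_approx(target, n, i / 1000.0)
                   - target(i / 1000.0)) for i in range(1001))

# Widening the single hidden layer drives the error down; depth is not
# needed in principle, only (often) for trainability in practice.
assert max_err(4) > max_err(16) > max_err(64)
```

Of course, this network is hand-constructed; the thread's point stands that gradient descent on a loss is unlikely to land on weights like these with so little depth.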