(no title)
stuxnet79 | 22 days ago
Conceptually, neural networks are quite simple. You can think of each neural net as a daisy chain of functions whose parameters can be efficiently tuned toward some objective via backpropagation.
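To make the "daisy chain" picture concrete, here is a minimal sketch: two chained functions plus a loss, each link tuned by backpropagation (the chain rule applied from the loss back to every parameter). The data, parameter names, and hyperparameters are all illustrative assumptions, not from the comment.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x                  # toy target: y = 3x

w1, w2 = 1.0, 1.0            # one parameter per link in the chain
lr = 0.1                     # learning rate (assumed)

for _ in range(200):
    # forward pass: daisy-chain the functions
    h = w1 * x               # link 1
    pred = w2 * h            # link 2
    loss = np.mean((pred - y) ** 2)   # the objective

    # backward pass: chain rule, one link at a time
    g_pred = 2 * (pred - y) / len(x)  # d loss / d pred
    g_w2 = np.sum(g_pred * h)         # d loss / d w2
    g_h = g_pred * w2                 # push gradient through link 2
    g_w1 = np.sum(g_h * x)            # d loss / d w1

    # gradient step tunes each link toward the objective
    w1 -= lr * g_w1
    w2 -= lr * g_w2
```

After training, the product `w1 * w2` converges to the true slope 3 even though neither link sees the target directly; that is the whole trick, just repeated across many more links in a real network.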
Their effectiveness (in the dimensions we care about) is more a consequence of the explosion of compute and data that occurred in the 2010s.
In my view, every hyped architecture was simply the one that yielded the best accuracy given the compute resources available at the time. It's not a given that these architectures are optimal, and we certainly don't always fully understand why they work. Most of the innovations in this space over the past 15 years have come from private companies that lack a strong research focus but are resource-rich (effectively unlimited compute and data capacity).