What? Fundamentally, information can only be so dense. Current models may be inefficient w.r.t. information density, however, there is a lower bound of compute required. As a pathological example, we shouldn't expect a megabyte worth of parameters to be able to encode the entirety of Wikipedia.
No comments yet.