While vision-vision models are certainly cool, I don’t think they are as economically valuable as vision-speech or text-text models. Humans don’t have vision output.
Computation may be increasing, but that is a statement about the short term, not the long term.
If we want to predict the future, then what we care about is: how many capabilities can you fit on a phone-sized computer? And I believe the answer is: a lot.
zarzavat|1 year ago