robbrown451 | 2 years ago
This is true. But human brains don't directly model the world either; they form an internal model based on what comes in through their senses. Humans have the advantage of being more "multi-modal," but that doesn't mean they get more information or better information.
Much of my "modeling of the world" comes from the fact that I've read a lot of text. But of course I haven't read even a tiny fraction of what GPT-4 has.
That said, LLMs can already train on images, as GPT-4V does, and image generators do this as well; it's just a matter of time before the two are fully integrated. Later we'll see a lot more training on video and sound, all of it integrated into a single model.