
eab- | 5 months ago

What do you mean about CLIP?


rawgabbit | 5 months ago

I believe he is referring to OpenAI's proposal to move beyond training on pure text and instead train on multimodal data: not just the dictionary definition of an apple, but a picture of an apple, a video of someone eating an apple, and so on.
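For context, the core idea behind CLIP is contrastive learning over paired images and captions: pull matching image/text pairs together and push mismatched ones apart. Here's a toy sketch of the symmetric loss (invented function names and toy numbers, not OpenAI's actual code):

```python
# Toy sketch of a CLIP-style contrastive loss (illustrative only).
# sim[i][j] = similarity between image i and caption j; the correct
# pairings lie on the diagonal, and the loss rewards a dominant diagonal.
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def clip_loss(sim):
    """Symmetric cross-entropy over an NxN image/text similarity matrix."""
    n = len(sim)
    loss_img = 0.0  # image -> text direction (rows)
    for i in range(n):
        probs = softmax(sim[i])
        loss_img += -math.log(probs[i])
    loss_txt = 0.0  # text -> image direction (columns)
    cols = [[sim[i][j] for i in range(n)] for j in range(n)]
    for j in range(n):
        probs = softmax(cols[j])
        loss_txt += -math.log(probs[j])
    return (loss_img + loss_txt) / (2 * n)

good = [[5.0, 0.0], [0.0, 5.0]]   # correct pairs score highest -> low loss
bad  = [[0.0, 5.0], [5.0, 0.0]]   # captions swapped -> high loss
assert clip_loss(good) < clip_loss(bad)
```

With embeddings trained this way, the "picture of an apple" and the text "an apple" end up near each other in one shared space, which is what lets the model connect modalities.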

godshatter|5 months ago

Before this AI wave got going, I'd always assumed that AGI would be more about converting words, pictures, video, and lots of sensory data (and who knows what else) into a model of concepts that it would be putting together, hypothesizing about, and testing as it grows: a database of what concepts have been learned, what data they were built from, and what holes it needed to fill in. It would continually be working on this, reaching out to test reality or discuss its findings with people or other AIs, instead of waiting for input like a chatbot. I haven't seen anything like this yet, just ways of faking it by getting better at stringing words together or mashing pixels together based on text tokens.
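The "model of concepts" described above could be sketched as a data structure, purely hypothetically (every name here is invented for illustration, not any real system):

```python
# Hypothetical sketch of a concept store: each concept records what it was
# learned from and which questions ("holes") remain open, so an agent could
# loop over open questions instead of waiting for a prompt.
from dataclasses import dataclass, field

@dataclass
class Concept:
    name: str
    sources: list = field(default_factory=list)         # text, images, video, ...
    open_questions: list = field(default_factory=list)  # gaps to fill in

class ConceptStore:
    def __init__(self):
        self.concepts = {}

    def learn(self, name, source):
        # Record another piece of evidence behind a concept.
        c = self.concepts.setdefault(name, Concept(name))
        c.sources.append(source)

    def note_gap(self, name, question):
        # Record a hole the agent should try to fill later.
        self.concepts.setdefault(name, Concept(name)).open_questions.append(question)

    def next_experiments(self):
        # What the agent would proactively test or ask about next.
        return [(c.name, q) for c in self.concepts.values() for q in c.open_questions]

store = ConceptStore()
store.learn("apple", "dictionary definition")
store.learn("apple", "photo of an apple")
store.note_gap("apple", "what does biting one sound like?")
```

The interesting part would be the `next_experiments` loop: the system drives its own learning rather than sitting idle between prompts.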

No one seems to be working on building an AI model that understands, to any real degree, what it's saying or what it's creating. Without that, I don't see how they can even get to AGI.