top | item 43376947

(no title)

numba888 | 11 months ago

Great, but how do you imagine multimodal with text, video. Just 2 for simplicity, what will be in the training set. With text model tries to predict next, then more steps were added. But what to do with multimodal?

discuss

No comments yet.