(no title)
Jack000 | 1 year ago
I'm not certain about the specific models tested, but some VLMs just embed the image modality into a single vector, making these tasks literally impossible to solve.
Jack000 | 1 year ago
I'm not certain about the specific models tested, but some VLMs just embed the image modality into a single vector, making these tasks literally impossible to solve.
No comments yet.