MultiModal-GPT: A Vision and Language Model for Dialogue with Humans

4 points | vov_or | 2 years ago | github.com

1 comment

vov_or | 2 years ago

The authors trained a multi-modal chatbot on visual and language instructions, building on the open-source multi-modal model OpenFlamingo!

Paper link: https://arxiv.org/abs/2305.04790