yurimo | 1 year ago

Multimodal agents notoriously fail at long-horizon tasks. How does Magma perform on them?

jwyang | 1 year ago

Very good question. For now we are mainly focusing on building the foundation for multimodal perception and atomic action-taking. Of course, integrating trace-of-mark prediction for robotics and human video data enhances the model's medium-length reasoning, but this is certainly not sufficient. The current Magma model will serve as the basis for our next step, i.e., longer-horizon reasoning and planning. This is exactly what we are looking at for the next version of Magma!