fzysingularity | 3 months ago
The surprising part: models that ace benchmarks often fail on seemingly trivial visual tasks, while others succeed in unexpected places. We show concrete examples, side-by-side outputs, and how each model breaks when chaining multiple visual steps.
We go into more detail in our technical whitepaper [3]. Play around with Orion for free here [4].
[1] Showdown: https://chat.vlm.run/showdown
[2] Learn about Orion: https://vlm.run/orion
[3] Technical whitepaper: https://vlm.run/orion/whitepaper
[4] Chat with Orion: https://chat.vlm.run/
Happy to answer questions or dig into specific cases in the comments.
unknown | 3 months ago
[deleted]