In our previous tests, when it was 1.5 Pro against GPT-4o and Claude Sonnet 3.7, Gemini wasn't winning the multilingual race, but it was definitely competitive. 2.5 and 3.0 seem to be big leaps from the 1.5 days.
That said, it also depends on the testing methodology; we tested a bunch of use cases, mostly aimed at core linguistic proficiency, and not so much at complex language tasks or cultural knowledge.
deaux|2 months ago