> I'm not convinced that it possibly could have. It takes time for papers to get published and the LLM world is moving rather quickly.
The paper was submitted to arXiv on 13th February, and we're here reading it, less than a week later.
But we don't have to assume. The list of models is right there in the paper, on page 5:
> We select seven frontier models: GPT-5.2 (OpenAI), Claude Opus 4.5, Claude Opus 4.6, Claude Sonnet 4.5, Claude Haiku 4.5 (Anthropic), Gemini 3 Pro, and Gemini 3 Flash (Google). All models use temperature 0 for deterministic sampling.
zahlman|12 days ago
rahimnathwani|12 days ago
Yes it did.