Yes, but this paper studied recent models.

zahlman|12 days ago

Did it? I'm not convinced that it possibly could have. It takes time for papers to get published and the LLM world is moving rather quickly.

rahimnathwani|12 days ago

> Did it?

Yes it did.

> I'm not convinced that it possibly could have. It takes time for papers to get published and the LLM world is moving rather quickly.

The paper was submitted to arXiv on 13th February, and we're here reading it, less than a week later.

But we don't have to assume. The list of models is right there in the paper, on page 5:

  We select seven frontier models: GPT-5.2 (OpenAI), Claude Opus 4.5, Claude Opus 4.6, Claude Sonnet 4.5, Claude Haiku 4.5 (Anthropic), Gemini 3 Pro, and Gemini 3 Flash (Google). All models use temperature 0 for deterministic sampling.