nahnahno | 3 months ago

The fact that GPT-4.1 was the judge does not convince me of the benchmark's validity.

It’s probably just that they started before GPT-5 was released. It’s a good judge.

tacoooooooo|3 months ago

it's an odd choice. I'd be curious why they picked that. it's not the cheapest, most expensive, best, or worst.

It does have a relatively large context window, and ime is very good at format adherence

lukaslevert|3 months ago

You may be looking at our first benchmarks on the homepage. The latest ones for the Search API were conducted against GPT-5: https://parallel.ai/blog/introducing-parallel-search