nahnahno | 3 months ago The fact that GPT-4.1 was the judge does not convince me of the validity of the bench.
ripped_britches|3 months ago It’s probably just that they started before GPT-5 was released. It’s a good judge.
tacoooooooo|3 months ago It's an odd choice. I'd be curious why they picked that. It's not the cheapest, most expensive, best, or worst. It does have a relatively large context window, and in my experience it is very good at format adherence.
lukaslevert|3 months ago You may be looking at our first benchmarks on the homepage; the latest ones for the Search API were conducted against GPT-5: https://parallel.ai/blog/introducing-parallel-search