top | item 45838065

(no title)

nahnahno | 3 months ago

The fact that GPT-4.1 was the judge does not convince me of the validity of the benchmark.

ripped_britches|3 months ago

It’s probably just that they started before GPT-5 was released. It’s a good judge.

tacoooooooo|3 months ago

It's an odd choice. I'd be curious why they picked it. It's not the cheapest, most expensive, best, or worst.

It does have a relatively large context window, and in my experience it is very good at format adherence.