
AdamConwayIE | 13 days ago

None of the typical benchmark suites really cover Codex 5.3 yet, because it's still not available in the API.

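SWE-bench, for example, needs a predictions file generated from the model's output, which the harness then evaluates. Without Codex 5.3 in the API, there's no programmatic way to generate those predictions, so the benchmark can't be run against it.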
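To make the dependency concrete, here is a rough sketch of the predictions step. It assumes the JSONL layout described in the SWE-bench docs (one record per task with `instance_id`, `model_name_or_path`, and `model_patch`). The `generate_patch` callable is hypothetical: it stands in for whatever API call would return the model's diff, which is exactly the piece that is missing for a model without API access.

```python
import json
from typing import Callable, Iterable


def write_predictions(
    instance_ids: Iterable[str],
    generate_patch: Callable[[str], str],  # hypothetical: wraps the model API call
    model_name: str,
    path: str,
) -> int:
    """Write one JSON record per task instance and return the record count."""
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for instance_id in instance_ids:
            record = {
                "instance_id": instance_id,
                "model_name_or_path": model_name,
                # The patch has to come from the model itself; with no API
                # there is nothing to call here.
                "model_patch": generate_patch(instance_id),
            }
            f.write(json.dumps(record) + "\n")
            count += 1
    return count
```

The resulting file is what gets handed to the evaluation harness, which applies each patch and runs the repository's tests. Everything upstream of that file depends on being able to call the model in a loop.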
