impossiblefork | 5 days ago

I don't think it's a common thing in any public LLM benchmarks or in any standard QA datasets. Maybe in internal stuff at AI firms.
