
Measuring What Matters: Construct Validity in Large Language Model Benchmarks

1 point | Cynddl | 3 months ago | arxiv.org


No comments yet.
