Claude Opus 4.5, and why evaluating new LLMs is increasingly difficult (simonwillison.net)
1 point | gingersnap | 3 months ago | 1 comment | item 46042281

ChrisArchitect | 3 months ago
More discussion: https://news.ycombinator.com/item?id=46037637