(no title)

> Not much explanation yet why GPT-5 warrants a major version bump

Exactly. Too many videos, too little real data / benchmarks on the page. Will wait for the vibe check from simonw and others.

collinmanderson|6 months ago

> Will wait for vibe check from simonw

https://openai.com/gpt-5/?video=1108156668

2:40 "I do like how the pelican's feet are on the pedals." "That's a rare detail that most of the other models I've tried this on have missed."

4:12 "The bicycle was flawless."

5:30 Re generating documentation: "It nailed it. It gave me the exact information I needed. It gave me full architectural overview. It was clearly very good at consuming a quarter million tokens of Rust." "My trust issues are beginning to fall away."

Edit: ohh, he has a blog post now: https://news.ycombinator.com/item?id=44828264

bardak|6 months ago

I feel like we need to move on from using the same test on every model. As time goes on, information about these specific tests ends up in the training data. I am not saying that has happened in this case, but there is nothing stopping model developers from adding extra data for these tests directly to the training data to make their models seem better than they are.

dimitri-vs|6 months ago

This effectively kills this benchmark.

laurent_du|6 months ago

Damn Theo is really a handsome dude.