freshtake|6 months ago
A few things to consider:
1. This is one example. How many other attempts by this person failed to be useful, accurate, or coherent? The author is an OpenAI employee IIUC, which makes the question worth asking. Sora's demos were amazing until you tried it and realized it took 50 attempts to get a usable clip.
2. The author noted that humans had updated their own research in April 2025 with an improved solution. For cases where we detect signs of superior behavior, we need to start publishing the thought process (reasoning steps, inference cycles, tools used, etc.). Otherwise it's impossible to know whether this used a specialty model, had access to the more recent paper, or got lucky in some other way. Without detailed proof it's becoming harder to separate legitimate findings from marketing posts (not suggesting this specific case was a pure marketing post).
3. Points 1 and 2 would help with reproducibility, which is important for scientific rigor. If we give Claude the same tools and inputs, will it perform just as well? This would help the community understand whether GPT-5 itself is novel, or whether the novelty is in how the user prompts it.
hodgehog11|6 months ago
I should know; I've been using LLM thinking models to help brainstorm ideas for stickier proofs. They've been more successful at discovering esoteric entry points than I would like to admit.
bawolff|6 months ago
If you could combine this with automated theorem proving, it wouldn't matter if it was right only 1 time in 1000.
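The idea above is essentially rejection sampling against a sound verifier: keep generating candidate proofs and accept only what the checker certifies, so a low hit rate costs compute but not correctness. A minimal sketch, where `generate_candidate` and `verify` are hypothetical stand-ins (a real setup would call an LLM and a formal checker such as a Lean or Coq kernel):

```python
import random

def generate_candidate(rng):
    """Stand-in for an LLM proposing a proof (hypothetical).
    Produces a 'valid' candidate only ~0.1% of the time."""
    return {"claim": "2 + 2 == 4", "valid": rng.random() < 0.001}

def verify(candidate):
    """Stand-in for an automated theorem prover: sound, i.e. it
    accepts only genuinely valid candidates (no false positives)."""
    return candidate["valid"]

def search(max_attempts=100_000, seed=0):
    """Generate-and-verify loop: sample until the checker accepts,
    or give up after max_attempts."""
    rng = random.Random(seed)
    for attempt in range(1, max_attempts + 1):
        candidate = generate_candidate(rng)
        if verify(candidate):
            return attempt, candidate
    return None, None
```

With a 0.1% hit rate and 100,000 attempts, the loop accepts almost surely, and because the verifier is sound, whatever it returns can be trusted regardless of how unreliable the generator is.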
hto2i34334324|6 months ago
(Theory building is quite hard in math; the computation side is only hard after a point).
foobarqux|6 months ago
High chance, given that this is the same guy who came up with the SVG unicorn ("Sparks of AGI"), which raises the same question even more obviously.