freshtake|6 months ago
A few things to consider:
1. This is one example. How many other attempts by this person failed to be useful, accurate, or coherent? The author is an OpenAI employee IIUC, which makes the question worth asking. Sora's demos were amazing until you tried it and realized it took 50 attempts to get a usable clip.
2. The author noted that humans had updated their own research in April 2025 with an improved solution. For cases where we detect signs of superior behavior, we need to start publishing the thought process (reasoning steps, inference cycles, tools used, etc.). Otherwise it's impossible to know whether this used a specialty model, had access to the more recent paper, or got lucky in some other way. Without detailed proof it's becoming harder to separate legitimate findings from marketing posts (not suggesting this specific case was a pure marketing post).
3. Points 1 and 2 would help with reproducibility, which is important for scientific rigor. If we give Claude the same tools and inputs, will it perform just as well? This would help the community understand whether GPT-5 itself is novel, or whether the novelty is in how the user prompts it.
hodgehog11|6 months ago
I should know; I've been using LLM thinking models to help brainstorm ideas for stickier proofs. They've been more successful at discovering esoteric entry points than I would like to admit.
bawolff|6 months ago
If you could combine this with automated theorem proving, it wouldn't matter if it was right only 1 time in 1000.
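The idea above is essentially rejection sampling against a sound verifier: keep generating candidate proofs and accept only what the checker certifies, so a low hit rate costs compute but not correctness. A minimal sketch, where `generate_candidate` and `verify` are hypothetical stand-ins (a real setup would call an LLM and a formal checker such as a Lean or Coq kernel):

```python
import random

def generate_candidate(rng):
    """Stand-in for an LLM proposing a proof (hypothetical).
    Produces a 'valid' candidate only ~0.1% of the time."""
    return {"claim": "2 + 2 == 4", "valid": rng.random() < 0.001}

def verify(candidate):
    """Stand-in for an automated theorem prover: sound, i.e. it
    accepts only genuinely valid candidates (no false positives)."""
    return candidate["valid"]

def search(max_attempts=100_000, seed=0):
    """Generate-and-verify loop: sample until the checker accepts,
    or give up after max_attempts."""
    rng = random.Random(seed)
    for attempt in range(1, max_attempts + 1):
        candidate = generate_candidate(rng)
        if verify(candidate):
            return attempt, candidate
    return None, None
```

With a 0.1% hit rate and 100,000 attempts, the loop accepts almost surely, and because the verifier is sound, whatever it returns can be trusted regardless of how unreliable the generator is.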
hto2i34334324|6 months ago
(Theory building is quite hard in math; the computation side is only hard after a point).
foobarqux|6 months ago
High chance, given that this is the same guy who came up with the SVG unicorn ("Sparks of AGI"), which raises the same question even more obviously.