Based on Tao’s description of how the proof came about - a human is taking results backwards and forwards between two separate AI tools and using an AI tool to fill in gaps the human found?

I don’t think it can really be said to have occurred autonomously then?

Looks more like a 50/50 partnership with a super expert human on the one side, which makes this way more vague in my opinion - and in line with my own AI tests, i.e. they are pretty stupid, even OPUS 4.5 or whatever, unless you're already an expert and are doing boilerplate.

EDIT: I can see the title has been fixed now from solved to "more or less solved", which I still think is a big stretch.

D-Machine|1 month ago

You're understanding correctly, this is back and forth between Aristotle and ChatGPT and a (very smart) user.

MyFirstSass|1 month ago

I'm not sure I understand the wild hype here in this thread then.

Seems exactly like the tests at my company, where even frontier models are revealed to be very expensive rubber ducks that completely fail with non-experts or anything novel or math heavy.

I.e. they mirror the intellect of the user but give you big dopamine hits that'll lead you astray.

adityaathalye|1 month ago

Exactly "The Geordi LaForge Paradox" of "AI" systems. The most sophisticated work requires the most sophisticated user, who can only become sophisticated the usual way --- long hard work, trial and error, full-contact kumite with reality, and a degree of devotion to the field.

NooneAtAll3|1 month ago

https://www.erdosproblems.com/forum/thread/728#post-2808

> There seems to be some confusion on this so let me clear this up. No, after the model gave its original response, I then proceeded to ask it if it could solve the problem with C=k/logN arbitrarily large. It then identified for itself what both I and Tao noticed about it throwing away k!, and subsequently repaired its proof. I did not need to provide that observation.

so it was literally "yo, your proof is weak!" - "naah, watch this! [proceeds to give full proof all on its own]"

I'd say that counts

jasonfarnon|1 month ago

I had the impression Tao/community weren't even finding the gaps, since they mentioned using an automatic proof verifier. And that the main back and forth involved re-reading Erdos' paper to find out the right problem Erdos intended. So more like 90/10 LLM/human. Maybe I misread it.

NewsaHackO|1 month ago

This is what I got from Tao's post as well.

dpacmittal|1 month ago

There's a lot more detail in this reddit post from the author - https://www.reddit.com/r/OpenAI/comments/1q6yw5g/how_we_used...

Tenobrus|1 month ago

strongly think you should go read the thread to get a sense of the level of expertise and amount of effort put in by the humans involved: https://www.erdosproblems.com/forum/thread/728#post-2852

naasking|1 month ago

> EDIT: I can see the title has been fixed now from solved to "more or less solved", which I still think is a big stretch.

"solved more or less autonomously by AI" were Tao's exact words, so I think we can trust his judgment about how much work he or the AI did, and how this indicates a meaningful increase in capabilities.

mmphosis|1 month ago

"This website was made by Thomas Bloom, a mathematician who likes to think about the problems Erdős posed. Technical assistance with setting up the code for the website was provided by ChatGPT." (from the FAQ)

Davidzheng|1 month ago

Do you need to be a super expert to find gaps in proofs? Debatable

Yeask|1 month ago

It's a good economic decision to hype the importance of the LLM$ a bit.