sinuhe69 | 17 days ago

I'm pretty certain that DeepMind (and all other labs) will try their frontier (and even private) models on First Proof [1].

And I wonder how Gemini Deep Think will fare. My guess is that it will get halfway on some problems. But we will have to treat the absence of a published result as a failure, because nobody wants to publish a negative result, even though negative results are so important for scientific research.

[1] https://1stproof.org/

octoberfranklin|17 days ago

Really surprised that 1stproof.org was submitted three times and never made the front page of HN.

https://hn.algolia.com/?q=1stproof

This is exactly the kind of challenge I would want to judge AI systems on. It required ten bleeding-edge research mathematicians to publish a problem they've solved while holding back the answer. I appreciate the huge amount of social capital and coordination that must have taken.

I'm really glad they did it.

lofaszvanitt|16 days ago

Of course it hasn't made the front page. If something is promising they hunt it down, and once it's conquered they post about it. A lot of the time the "new" category has much better results than the default HN view.

blinding-streak|16 days ago

As a non-mathematician, reading these problems feels like reading a completely foreign language.

https://arxiv.org/html/2602.05192v1

ky3|16 days ago

LLM to the rescue. Feed in a problem and ask it to explain it to a layperson. Also feed in any sentences that remain obscure and ask it to unpack them.

zozbot234|17 days ago

The original First Proof solutions are due to be published in about 24h, AIUI.

energy123|17 days ago

Feels like an unforced blunder to make the time window so short after going to so much effort and coming up with something so useful.