top | item 46496026

LLMs Are Currently Not Helpful at All for Math Research: Hamkins

27 points| ablaba | 2 months ago |officechai.com | reply

26 comments

[+] demirbey05|2 months ago|reply
It's hard to cut through the AI hype when there are billions of dollars at stake. I usually trust negative comments more, as long as the person isn't trying to sell a course. Even though Terence Tao is a respected scientist, I wonder if his recent comments are driven by a need for funding due to federal cuts. I’ve had similar experiences with LLMs—whenever I ask them about hard math or RL theory, they almost always give me the wrong answers.
[+] ben_w|2 months ago|reply
I also care more about the failure modes than the successes, although in my case, it's because I keep finding them exceptionally useful at software development, and I:

1. Don't want to use them where they suck.

Think normalisation of deviance: "the problems haven't affected me therefore they don't exist" is a way to get really badly burned.

2. Want to train up in things they will still suck at by the time I've learned whatever it is.

I find LLMs seem kinda bad at writing sheet music, and Suno is kinda bad at weird instructions (like Stable Diffusion for images), but I expect them to get good before I can.

I also find them inconsistent at non-Euclidean problems: sometimes they can, sometimes they can't. I have absolutely no idea how to monetise that, but even if I could, "inconsistent" is itself an improvement on "cannot ever", which is what SOTA was a few years ago.

[+] j16sdiz|2 months ago|reply
> The frustration, for Hamkins, goes beyond mere incorrectness—it’s the nature of the interaction itself that proves problematic.

I would assume pairing LLMs with a formal proof system would help a lot. At the very least, the system would know what is incorrect, without the lengthy debates that frustrate him most.

This won't help the system discover or solve new math, but it makes the experience far more bearable.

[+] 112233|2 months ago|reply
That itself may be tricky. Suppose your proof system tells you the proof is correct — how do you verify it is a proof of the assertion you want, unless you have written the "test harness" by hand beforehand? At least in programming, formally checking that the code does exactly what is required is orders of magnitude harder than simply writing code that compiles.
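A minimal Lean sketch of this failure mode (the theorem name is invented): the kernel certifies that the proof matches the statement, not that the statement matches your intent.

```lean
-- The checker will happily certify this "theorem": the hypothesis
-- n < 0 is unsatisfiable over Nat, so the conclusion follows vacuously.
-- A green checkmark here proves nothing about what the author intended.
theorem every_nat_is_42 (n : Nat) (h : n < 0) : n = 42 :=
  absurd h (Nat.not_lt_zero n)
```

If the statement itself is mis-formalised (or vacuous, as above), a verified proof of it is worthless — which is exactly the "test harness" problem.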
[+] MaysonL|2 months ago|reply
Terry Tao has found them helpful in the past[0], and I’d trust him over Hamkins.

[0] https://terrytao.wordpress.com/2025/12/08/the-story-of-erdos...

[+] measurablefunc|2 months ago|reply
They do different kinds of mathematics. Tao works mainly in analytic number theory & that's why he got a Fields medal. Hamkins works in foundations of mathematics, logic, & computability theory.
[+] watchthedemo|2 months ago|reply
For the sake of rigor: it appears he found it interesting that someone else had found them helpful in the past, rather than that "Terry Tao [himself] has found them helpful in the past."

> This was one further addition to a recent sequence of examples where an Erdős problem had been automatically solved in one fashion or another by an AI tool.[Aristotle] Like the previous cases, the proof turned out to not be particularly novel

He concludes:

> One striking feature of this story for me is how important it was to have a diverse set of people, literature, and tools to attack this problem. To be able to state and prove the precise formula for {c(n)} required multiple observations

Which reads as though he wants to focus the praise for the solution on the "diverse set of people", relegating any "praise" for LLMs to the obfuscated and ambiguous category of "tools", behind the literature.

[+] sph|2 months ago|reply
Isaac Newton was deep into alchemy and the occult. Even very smart people suffer from confirmation bias, the Dunning-Kruger effect & co.
[+] NitpickLawyer|2 months ago|reply
> “I’ve played around with it and I’ve tried experimenting, but I haven’t found it helpful at all. Basically zero. It’s not helpful to me. And I’ve used various systems and so on, the paid models and so on.”

Eh... I don't know. It's hard for me to believe such absolutist takes, especially since other prominent mathematicians (e.g. Tao) have spoken highly of the help and value they get from these systems.

I also see this kind of take in my field, especially from "senior" people with 20+ years of experience. The problem is that, when pressed, their "trying it out" is often the most basic, naive, technically atrocious kind of testing. "I tried coding with it but it's useless" -> they tried once, 3 years ago, in a chatgpt web interface, and never touched it since.

I think there's a lot of disconnect between the overhypers and the deniers. Both are wrong, IMO, and unfortunately both are extremely vocal while also being stubborn and unwilling to try different things. But my gut feeling is that if in 2025/26 someone says "basically zero", they are wrong. Or naive. Or making strawman arguments. And ignorable.

[+] mcv|2 months ago|reply
I've been using it very recently. Not for mathematics, but for programming. And while Claude Opus is much more likely to admit a mistake ("You're absolutely correct!" instead of "That's fine") when I correct it, it does require correcting, and has been incapable of grasping complex problems. I can't trust it to produce correct code for complex problems, because when I did, the solution turned out to be wrong. Plausible-looking, and certainly producing some results, but not the correct ones.

It has still been useful, because parts of the problem have been solved before, and in some cases it can accurately reproduce those parts, which has been massively helpful. And it can help me understand some aspects of the problem. But it can't solve it for me. It has shown that that's simply beyond its capabilities.

[+] 20kHz|2 months ago|reply
Well folks, Tao says he finds AI useful so I guess we can flag this and other stories that say anything to the contrary, pack it up and go home.
[+] pfdietz|2 months ago|reply
Not only said, but demonstrated. That counts for quite a lot.
[+] erelong|2 months ago|reply
As the kids would say, sounds like a "skill issue". More kindly: this user hasn't developed a use for the tools in his specific use cases, which is fine. However, the implication that the tools aren't useful for other people seems to contradict others' experience reports of how they have found uses for them.
[+] jupitr|2 months ago|reply
This is quite seriously missing the point: computational theorem-provers and contributions to mathlib (for Lean, anyway) allow for checking "correctness". The LLM is one of the many ways to search for tactics that help complete a proof for a theorem not yet in mathlib.

The LLM is not done until the theorem-prover says it is done - at which point we can be sure of the correctness of the assertion in the context. AFAIK there is no debate about inaccuracy with theorem-provers. Their real limitation continues to be converging in realistic timescales and efficient search over all combinations of proof tactics.
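The loop described above can be sketched roughly like this (`propose_tactic` and `check_with_lean` are hypothetical stand-ins, not real APIs — the point is only who gets the final say):

```python
# Sketch of an LLM-in-the-loop proof search: the model only proposes,
# the theorem prover disposes. Nothing counts as done until the checker
# reports that no goals remain.
def search_proof(goal, propose_tactic, check_with_lean, max_steps=100):
    """Ask the model for tactics until the checker certifies the proof."""
    attempts = []
    for _ in range(max_steps):
        tactic = propose_tactic(goal, attempts)       # LLM suggests a step
        ok, new_goal = check_with_lean(goal, tactic)  # kernel is the judge
        attempts.append((tactic, ok))
        if not ok:
            continue           # rejected: the model must try something else
        if new_goal is None:   # no goals left: proof complete and certified
            return attempts
        goal = new_goal        # accepted step: keep working on what remains
    return None  # search did not converge in realistic time
```

This also makes the stated limitation visible: correctness is guaranteed by the checker, but convergence depends entirely on how well the proposer searches the tactic space.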

[+] watchthedemo|2 months ago|reply
> The LLM is not done until the theorem-prover says it is done - at which point we can be sure of the correctness of the assertion in the context. AFAIK there is no debate about inaccuracy with theorem-provers.

Your "in the context" is doing a lot of heavy lifting here, see:

- The Math Is Haunted : https://news.ycombinator.com/item?id=44739315 : https://overreacted.io/the-math-is-haunted/

- Three ways formally verified code can go wrong in practice : https://news.ycombinator.com/item?id=45555727 : https://buttondown.com/hillelwayne/archive/three-ways-formal...

- Breaking “provably correct” Leftpad : https://news.ycombinator.com/item?id=45492274 : https://lukeplant.me.uk/blog/posts/breaking-provably-correct...

- what does a lean proof prove? : https://news.ycombinator.com/item?id=46286605 : https://supaiku.com/what-does-a-lean-proof-prove

> Their real limitation continues to be converging in realistic timescales and efficient search over all combinations of proof tactics.

I agree "efficient search over all combinations of proof tactics" will be a defining moment in the field. I disagree these LLMs are it.

[+] imtringued|2 months ago|reply
For me personally LLMs have made it way easier to discover obscure or unfamiliar math concepts. It's way harder to get into a topic if you have to read through hundreds of pages and then wonder if it was time well spent, compared to just finding the bits that are interesting to you and then getting deeper into the topic as you need.
[+] panny|2 months ago|reply
"It is difficult to get a man to understand something when his salary depends on his not understanding it."

Well, this definitely won't make the front page despite upvotes. Too many people here are heavily invested and cannot stand for it.