To clear up a few misconceptions:
- Aristotle uses modern AI techniques heavily, including language modeling.
- Aristotle can be guided by an informal (English) proof. If the proof is correct, Aristotle has a good chance at translating it into Lean (which is a strong vote of confidence that your English proof is solid). I believe that's what happened here.
- Once a proof is formalized into Lean (assuming you have formalized the statement correctly), there is no doubt that the proof is correct. This is the core of our approach: you can do a lot of (AI-driven) search, and once you find the answer you are certain it's correct no matter how complex the solution is.
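To make the "no doubt" point concrete, here is a minimal Lean 4 illustration (mine, not taken from Aristotle): once the kernel accepts a proof with no `sorry`, the statement is certain, regardless of how the proof was found.

```lean
-- A toy statement with a machine-checked proof. The kernel either accepts
-- this or it doesn't; there is no "plausible but wrong" middle ground.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Any gap must be declared explicitly, and Lean flags it:
-- theorem unfinished (a b : Nat) : a * b = b * a := sorry
--   -- warning: declaration uses 'sorry'
```

This is why search-heavy approaches work: the checker filters out every incorrect candidate, however convincing it looks.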
How do you verify that the AI translation to Lean is a correct formalization of the problem? In other fields, generative AI is very good at making up plausible sounding lies, so I'm wondering how likely that is for this usage.
Sometimes when I'm using new LLMs I'm not sure if it’s a step forward or just benchmark hacking, but formalized math results always show that the progress is real and huge.
When do you think Harmonic will reach the point of formalizing most (even hard) human-written math?
I saw an interview with Christian Szegedy (your competitor, I guess) in which he said he believes it will happen this year.
Is anyone working on applying these techniques to formal verification of software?
My limited understanding of Rust is that it applies a fixed set of rules to guarantee memory safety. The rules are somewhat simple and limiting, for ease of understanding and implementation, but also because of undecidability.
Programmers run into situations where they know that their code won't cause memory errors, but it doesn't follow the rules. Wouldn't it be cool if something like Aristotle was integrated into the compiler? Any code for which a proof of correctness could be written would pass/compile, without having to add more and more rules
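A standard illustration of that point (function names mine): the naive version below is memory-safe but rejected by the borrow checker's fixed rules, while rewriting it against an API that proves disjointness is accepted.

```rust
// The naive version is memory-safe (the two indices are disjoint) but the
// borrow checker's fixed rules reject it:
//
// fn swap_halves_naive(v: &mut [i32; 4]) {
//     let a = &mut v[0];    // first mutable borrow of `v`
//     let b = &mut v[2];    // error[E0499]: cannot borrow `v` as mutable
//     std::mem::swap(a, b); //               more than once at a time
// }
//
// The accepted idiom proves disjointness through the API instead:
fn swap_halves(v: &mut [i32]) {
    let mid = v.len() / 2;
    let (left, right) = v.split_at_mut(mid); // two disjoint &mut slices
    for (a, b) in left.iter_mut().zip(right.iter_mut()) {
        std::mem::swap(a, b);
    }
}

fn main() {
    let mut v = [1, 2, 3, 4];
    swap_halves(&mut v);
    println!("{:?}", v); // [3, 4, 1, 2]
}
```

A prover integrated into the compiler could, in principle, accept the naive version too, by discharging the disjointness obligation directly instead of requiring the `split_at_mut` workaround.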
This is the first time I've heard about Aristotle, and I find it very interesting. First question first: is it available to the general public? Is this the page to try it? [1]
Second, when you say language modeling support, does that mean it can better understand code representations (ASTs), or something else? I am just an AI user, not very knowledgeable in the field. My main interest is whether it would be good for static analysis oriented to computer security (SAST).
> If the proof is correct, Aristotle has a good chance at translating it into Lean
How does this depend on the area of mathematics of the proof? I was under the impression that it was still difficult to formalize most research areas, even for a human. How close is Aristotle to this frontier?
> assuming you have formalized the statement correctly
That's a pretty big assumption, though, isn't it? As we saw with the Navier-Stokes psychosis episode over the New Year holiday, formalizing the statement correctly really isn't guaranteed.
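A toy example of the failure mode (mine, unrelated to the Navier-Stokes incident): a formalization can be "proved" while saying nothing, e.g. when its hypotheses are unsatisfiable and the statement is vacuously true.

```lean
-- The hypothesis `n < 0` can never hold for a natural number, so this
-- "theorem" is provable for any conclusion at all; the formal statement
-- does not mean what its surface reading suggests.
theorem vacuously_true (n : Nat) (h : n < 0) : n = 37 :=
  absurd h (Nat.not_lt_zero n)
```

The kernel guarantees the proof matches the statement; only a human (or another review process) can check that the statement matches the intended mathematics.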
What happens when this process is reversed, translating from Lean back to informal English? And does iterating on that help research better approaches to writing proofs in human language?
You seem to be openly contradicting your company's PR and language. Your description very clearly describes the "AI" as a tool to translate relatively informal specifications into formal proof logic, but does not itself do the proving.
Based on Tao's description of how the proof came about, a human took results back and forth between two separate AI tools and used an AI tool to fill in gaps the human found?
I don’t think it can really be said to have occurred autonomously then?
Looks more like a 50/50 partnership with a super-expert human on the one side, which makes this way more vague in my opinion, and in line with my own AI tests, i.e. they are pretty stupid, even Opus 4.5 or whatever, unless you're already an expert and doing boilerplate.
EDIT: I can see the title has been fixed now from "solved" to "more or less solved", which I still think is a big stretch.
I had the impression Tao/community weren't even finding the gaps, since they mentioned using an automatic proof verifier. And that the main back and forth involved re-reading Erdos' paper to find out the right problem Erdos intended. So more like 90/10 LLM/human. Maybe I misread it.
> EDIT: I can see the title has been fixed now from "solved" to "more or less solved", which I still think is a big stretch.
"solved more or less autonomously by AI" were Tao's exact words, so I think we can trust his judgment about how much work he or the AI did, and how this indicates a meaningful increase in capabilities.
> This website was made by Thomas Bloom, a mathematician who likes to think about the problems Erdős posed. Technical assistance with setting up the code for the website was provided by ChatGPT.
(from the FAQ)
Reconfiguring existing proofs in ways that have been tedious or obscured from humans, or using well-framed methods in novel ways, will be done at superhuman speeds, and it'll unlock all sorts of capabilities well before we have to be concerned about AGI. It's going to be awesome to see what mathematicians start to do with AI tools as the tools become capable of truly keeping up with what the mathematicians want from them. It won't necessarily be a huge direct benefit for non-mathematicians at first, because the abstract and complex results won't have direct applications, but we might start to see millennium problems get taken down as legitimate frontier model benchmarks.
Or someone like Terence Tao might figure out how to wield AI better than anyone else, even the labs, and use the tools to take a bunch down at once. I'm excited to see what's coming this year.
I agree only with the part about reconfiguring existing proofs. That's the value here. It is still likely very tedious to confirm what the LLMs say, but at least it's better than waiting for humans to do this half of the work.
For all topics that can be expressed with language, the value of LLMs is shuffling things around to tease out a different perspective from the humans reading the output. This is the only realistic way to understand AI enough to make it practical and see it gain traction.
As much as I respect Tao, I feel like his comments about AI usage can be misleading without carefully reading what he is saying in the linked posts.
> Reconfiguring existing proofs in ways that have been tedious or obscured from humans,
To a layman, that doesn't sound very AI-like. Surely there must be a dozen algorithms to effectively search this space already, given that mathematics is pretty logical?
This is what has excited me for many years - the idea I call "scientific refactoring"
What happens if we reason upwards but change some universal constants? What happens if we use Tau instead of Pi everywhere? These kinds of fun questions would otherwise require an enormous intellectual effort, whereas with the mechanisation and automation of thought, we might be able to run them and see!
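As a trivial warm-up of that idea, the tau-versus-pi question can already be "run" mechanically today (a toy sketch; Rust ships both constants in its standard library):

```rust
use std::f64::consts::{PI, TAU};

// With tau = 2*pi, the familiar circle formulas trade a factor of 2
// between them: C = tau * r, A = tau * r^2 / 2.
fn circumference_tau(r: f64) -> f64 {
    TAU * r
}

fn area_tau(r: f64) -> f64 {
    0.5 * TAU * r * r
}

fn main() {
    let r = 3.0;
    // Same geometry, different bookkeeping:
    assert!((circumference_tau(r) - 2.0 * PI * r).abs() < 1e-12);
    assert!((area_tau(r) - PI * r * r).abs() < 1e-12);
    println!("tau bookkeeping matches pi bookkeeping");
}
```

The interesting version of the question, of course, is doing this across whole bodies of formalized mathematics rather than two formulas.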
If this isn't AGI, what is? It seems unavoidable that an AI which can prove complex mathematical theorems would lead to something like AGI very quickly.
Can anyone with specific knowledge of a sophisticated/complex field such as physics or math tell me: do you regularly talk to AI models? Do you feel like there's anything to learn? As a programmer, I can come to the AI with a problem and it can come up with a few different solutions, some I may have thought of, some not.
Are you getting the same value in your work, in your field?
Context: I finished a PhD in pure math in 2025 and have transitioned to being a data scientist and I do ML/stats research on the side now.
For me, deep research tools have been essential for getting caught up with a quick lit review about research ideas I have now that I'm transitioning fields. They have also been quite helpful with some routine math that I'm not as familiar with but is relatively established (like standard random matrix theory results from ~5 years ago).
It does feel like the spectrum of utility is pretty aligned with what you might expect: routine programming > applied ML research > stats/applied math research > pure math research.
I will say ~1 year ago they were still useless for my math research area, but things have been changing quickly.
I don't have a degree in either physics or math, but what AI helps me do is stay focused on the job before me, rather than having to dig through a mountain of textbooks or many Wikipedia pages or scientific papers trying to find an equation that I know I've seen somewhere but did not register the location of and did not copy down. This saves an enormous amount of time, every day. Even then, I still check the references once I've found it, because errors can and do slip into anything these pieces of software produce, and sometimes quite large ones (those are easy to spot, though).
So yes, there is value here, and quite a bit, but it requires a lot of forethought in how you structure your prompts, and you need to be super skeptical about the output as well as able to check it minutely.
If you would just plug in a bunch of data and formulate a query and would then use the answer in an uncritical way you're setting yourself up for a world of hurt and lost time by the time you realize you've been building your castle on quicksand.
I do / have done research in building deep learning models and custom / novel attention layers, architectures, etc., and AI (ChatGPT) is tremendously helpful in facilitating (semantic) search for papers in areas where you may not quite know the magic key words / terminology for what you are looking for. It is also very good at linking you to ideas / papers that you might not have realized were related.
I also found it can be helpful when exploring your mathematical intuitions on something, e.g. how a dropout layer might affect learned weights and matrix properties, etc. Sometimes it will find some obscure rigorous math that can be very enlightening or relevant to correcting clumsy intuitions.
I'm an active researcher in TCS. For me, AI has not been very helpful on technical things (or even technical writing), but has been super helpful for (1) literature reviews; (2) editing papers (e.g., changing a convention everywhere in the paper); and (3) generating Tikz figures/animations.
I talk to them (I do math research in algebraic geometry), but they're not really helpful outside of literature search, unfortunately. Others around me get a lot more utility, so it varies. (The most powerful models I tried were Gemini 2.5 Deep Think and Gemini 3.0 Pro.) Not sure if the new GPTs are much better.
I did a theoretical computer science PhD a few years ago and write one or two papers a year in industry. I have not had much success getting models to come up with novel ideas or even prove theorems, but I have had some success asking them to prove smaller and narrower results and using them as an assistant to read papers (why are they proving this result, what is this notation they're using, expand this step of their proof, etc). Asking it to find any bugs in a draft before Arxiving also usually turns up some minor things to clarify.
Overall: useful, but not yet particularly "accelerating" for me.
I work in quantum computing. There is quite a lot of material about quantum computing out there that these LLMs must have been trained on. I have tried a few different ones, but they all start spouting nonsense about anything that is not super basic.
But maybe that is just me. I have read some of Terence Tao's transcripts, and the questions he asks LLMs are higher complexity than what I ask. Yet, he often gets reasonable answers. I don't yet know how I can get these tools to do better.
I’m a hobbyist math guy (with a math degree) and LLMs can at least talk a little talk or entertain random attempts at proofs I make. In general they rebuke my more wild attempts, and will lead me to well-trodden answers for solved problems. I generally enjoy (as a hobby) finding fun or surprising solutions to basic problems more than solving novel maths, so LLMs are fun for me.
As the other person said, Deep Research is invaluable; but it is not as good at generating hypotheses at the true bleeding edge of research. The OG ChatGPT 4.0, with no guardrails, briefly generated outrageously amazing hypotheses that actually made sense. After that, they have all been neutered beyond use in this direction.
My experience has been mixed. Honestly though, talking to AI and discussing a problem with it is better than doing nothing and just procrastinating. It's mostly wrong, but the conversation helps me think. In the end, once my patience runs out and my own mind has been "refreshed" through the conversation (even if it was frustrating), I can work on it myself. Some bits of the conversation will help but the "one-shot" doesn't exist. tldr: ai chatbots can get you going, and may be better than just postponing and procrastinating over the problem you're trying to solve.
- Minor nit: The documentation mentions "uvx aristotlelib@latest aristotle" but that doesn't work; it should be "uvx --from aristotlelib@latest aristotle"
- It took me a minute or two of clicking around to figure out that the (only?) way to use it is to create an API key, then start aristotle in the terminal and interact with it there. It could be more obvious I think.
Edit: I just realized from https://news.ycombinator.com/item?id=46296801 that you're the CEO! - in that case maybe you, or whoever you think most appropriate from your organization, could submit it along with a text description of what it is, and what is the easiest and/or most fun way to try it out?
This is great, there is still so much potential in AI once we move beyond LLMs to specialized approaches like this.
EDIT: Look at all the people below just reacting to the headline and clearly not reading the posts. Aristotle (https://arxiv.org/abs/2510.01346) is key here folks.
EDIT2: It is clear many of the people below don't even understand basic terminology. Something being a transformer doesn't make it an LLM (vision transformers, anyone?), and if you aren't training on language (e.g. AlphaFold, or Aristotle on Lean stuff), it isn't a "language" model.
> It is clear many of the people below don't even understand basic terminology. Something being a transformer doesn't make it an LLM (vision transformers, anyone?), and if you aren't training on language (e.g. AlphaFold, or Aristotle on Lean stuff), it isn't a "language" model.
I think it's because it comes across as if you are saying we should move off of GenAI, and a lot of people say LLM when they mean GenAI.
1. "The search algorithm is a highly parallel Monte Carlo Graph Search (MCGS) using a large transformer as its policy and value function." ... "We use a generative policy to take progressively widened [7] samples from the large action space of Lean tactics, conditioning on the Lean proof state, proof history, and, if available, an informal proof. We use the same model and prompt (up to a task token) to compute the value function which guides the search."
See that 'large transformer' phrase? That's where the LLM is involved.
2. "A lemma-based informal reasoning system which generates informal proofs of mathematical state-ments, breaks these proofs down into lemmas, formalizes each lemma into Lean, and iterates this process based on formal feedback" ... "First, the actions it generates consist of informal comments in addition to Lean tactics. Second, it uses a hidden chain of thought with a dynamically set thinking budget before predicting an action."
Unless you're proposing that this team solved AGI, "chain of thought" is a specific term of art in LLMs.
3. "A geometry solver which solves plane geometry problems outside of Lean using an approach based on AlphaGeometry [45]." ... following the reference: "AlphaGeometry is a neuro-symbolic system that uses a neural language model, trained from scratch on our large-scale synthetic data, to guide a symbolic deduction engine through infinite branching points in challenging problems. "
AlphaGeometry, like the rest of DeepMind's Alpha tools, is built around a language model.
Instead of accusing people of not reading the paper, perhaps you should put some thought into what the things in the paper actually represent.
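Stripped of the learned components, the search loop quoted in point 1 is a best-first search in which a value model ranks candidate states. A toy sketch of that shape (all names and the heuristic are mine; the real system uses a transformer over Lean proof states and tactic actions, not arithmetic moves):

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashSet};

// Toy stand-in for the paper's setup: best-first search over states,
// where `value` plays the role of the transformer's value head and
// `actions` the policy's tactic proposals. Reaching `goal` = proof found.
fn search(start: u64, goal: u64) -> Option<usize> {
    let value = |s: u64| goal.abs_diff(s); // lower score = more promising
    let actions = |s: u64| [s + 1, s.saturating_mul(2)];
    let mut frontier = BinaryHeap::new(); // max-heap; Reverse => min-by-value
    let mut seen = HashSet::new();
    frontier.push(Reverse((value(start), start, 0usize)));
    while let Some(Reverse((_, s, depth))) = frontier.pop() {
        if s == goal {
            return Some(depth); // number of actions along the found path
        }
        if !seen.insert(s) || depth > 32 {
            continue; // already expanded, or past the search budget
        }
        for next in actions(s) {
            frontier.push(Reverse((value(next), next, depth + 1)));
        }
    }
    None
}

fn main() {
    // Greedy path here: 1 -> 2 -> 4 -> 8 -> 9 -> 10, i.e. Some(5).
    println!("{:?}", search(1, 10));
}
```

The point of the sketch: the "large transformer" only supplies the scores and candidate moves; the surrounding machinery is ordinary search, and the final arbiter of success (here, `s == goal`; in Aristotle, the Lean kernel) is symbolic.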
Very cool to see how far things have come with this technology!
Please remember that this is a theorem about integers that is subject to a fairly elementary proof that is well-supported by the existing Mathlib infrastructure. It seems that the AI relies on the symbolic proof checker, and the proofs that it is checking don't use very complex definitions in this result. In my experience, proofs like this which are one step removed from existing infra are much much more likely to work.
2026 should be interesting. This stuff is not magic, and progress is always going to be gradual with solutions to less interesting or "easier" problems first, but I think we're going to see more milestones like this with AI able to chip away around the edges of unsolved mathematics. Of course, that will require a lot of human expertise too: even this one was only "solved more or less autonomously by AI (after some feedback from an initial attempt)".
People are still going to be moving the goalposts on this and claiming it's not all that impressive or that the solution must have been in the training data or something, but at this point that's kind of dubiously close to arguing that Terence Tao doesn't know what he's talking about, which to say the least is a rather perilous position.
At this point, I think I'm making a belated New Year's resolution to stop arguing with people who are still saying that LLMs are stochastic parrots that just remix their training data and can never come up with anything novel. I think that discussion is now dead. There are lots of fascinating issues to work out with how we can best apply LLMs to interesting problems (or get them to write good code), but to even start solving those issues you have to accept that they are at least somewhat capable of doing novel things.
In 2023 I would have bet hard against us getting to this point ("there's no way chatbots can actually reason their way through novel math!"), but here we are, three years later. I wonder what comes next?
Uh, this was exactly a "remix" of similar proofs that most likely were in the training data. It's just that some people underestimate how compelling that "remix" ability can be, especially when paired with a direct awareness of formal logical errors in one's attempted proof and how they might be addressed in the typical case.
The goalposts are still the same. We want to be able to independently verify that an AI can do something instead of just hearing such a claim from a corporation that is absolutely willing to lie through their teeth if it gets them money.
Digging through the PDFs on Google Drive, this seems to be (one of) the generated proofs. I may be misunderstanding something, but 1400 lines of AI-generated code seems like a very good place for some mistake in the translation to sneak in: https://github.com/plby/lean-proofs/blob/main/src/v4.24.0/Er...
Though I suppose if the problem statement in Lean is human-generated and there are no ways to "cheat" in a Lean proof, the proof could be trusted without understanding it
Much of the discussion here seems focused on the Lean part and correctness, but it sure looks like for Tao it's the rapid iteration on the _paper_ that's the important part:
> ... to me, the more interesting capability revealed by these events is the ability to rapidly write and rewrite new versions of a text as needed, even if one was not the original author of the argument.
> This is in sharp contrast to existing practice where the effort required to produce even one readable manuscript is quite time-consuming, and subsequent revisions (in response to referee reports, for instance) are largely confined to local changes (e.g., modifying the proof of a single lemma), with large-scale reworking of the paper often avoided due both to the work required and the large possibility of introducing new errors. However, the combination of reasonably competent AI text generation and modification capabilities, paired with the ability of formal proof assistants to verify the informal arguments thus generated, allows for a much more dynamic and high-multiplicity conception of what a writeup of an argument is, with the ability for individual participants to rapidly create tailored expositions of the argument at whatever level of rigor and precision is desired.
Of course this implies that the math works which is the Aristotle part, and that's great ... but this rebuts the "but this isn't AI by itself, this is AI and a bunch of experts working hard, nothing to see here": right, well even "experts working hard" fail to iterate on the paper which significantly hinders research progress.
I remember seeing a documentary with a bit about some guy whose life's work was computing pi to 30 digits. Imagine all that time to do what my computer can do in less than a second, plus a day or two to write the code using the algorithm he used. Ten minutes if you use Newton's method.
When Deep Blue beat Kasparov, it did not end the careers of human players. But since mathematics is not a sport with human players, what are the career prospects for mathematicians or mathematics-like fields?
1. This result is very far from showing something like "human mathematicians are no longer needed to advance mathematics".
2. Even if it did show that, as long as we need humans trained in understanding maths, since "professional mathematicians" are mostly educators, they probably aren't going anywhere.
Tao's broad project, which he has spoken about a few times, is for mathematics to move beyond the current game of solving individual theorems to being able to make statements about broad categories of problems. So not 'X property is true for this specific magma' but 'X property is true for all possible magmas', as an example I just came up with. He has experimented with this via crowdsourcing problems in a given domain on GitHub before, and I think the implications of how to use AI here are obvious.
The Erdős problems website says the theorem is formalized in Lean, but in the Mathlib project there is just the theorem statement with a sorry. Does anyone know where I can find the Lean proof? Maybe it's in some random pull request I didn't find.
There is an ongoing effort to formalize a modern, streamlined proof of FLT in Lean, with all the needed prereqs. It's estimated that it will take approx. 5 years, but perhaps AI will lead to some meaningful speedup.
If AI can rewrite and formalize proofs this way, do we risk losing the human intuition behind the arguments? Or is it just a tool to explore math faster?
It classifies the advancements based on the level of AI input. In particular, the entry in Table 1 related to the original post has both a green and yellow light, reflecting the skepticism from others.
We need you to stop posting shallow dismissals and cynical, curmudgeonly, and snarky comments.
We asked you about this just recently, but it's still most of what you're posting. You're making the site worse by doing this, right at the point where it's most vulnerable these days.
Your comment here is a shallow dismissal of exactly the type the HN guidelines ask users to avoid here:
Predictably, it led to by far the worst subthread on this article. That's not cool. I don't want to ban you because you're also occasionally posting good comments that don't fit these negative categories, but we need you to fix this and stop degrading the threads.
Whether powered by human or computer, it is usually easier (and requires far fewer resources) to verify a specific proof than to search for a proof to a problem.
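A miniature of that asymmetry (example mine): verifying a claimed factorization is one multiplication, while finding a factor is a search over candidates.

```rust
// Checking a claimed factorization of n: a single multiplication.
fn verify(n: u64, p: u64, q: u64) -> bool {
    p > 1 && q > 1 && p * q == n
}

// Finding a factor of n: trial division, a search over all candidates
// up to sqrt(n). Returns None when n is prime (or < 4).
fn find_factor(n: u64) -> Option<u64> {
    (2..).take_while(|d| d * d <= n).find(|d| n % d == 0)
}

fn main() {
    let n: u64 = 7919 * 7927; // product of two primes
    let p = find_factor(n).expect("n is composite");
    let q = n / p;
    assert!(verify(n, p, q)); // the cheap direction
    println!("{} = {} * {}", n, p, q);
}
```

The same shape holds for proofs: Lean's kernel plays the role of `verify`, which is why an arbitrarily expensive AI-driven `find_factor`-style search is acceptable, as long as the final certificate checks.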
> ... Also, I would not put it past OpenAI to drag up a similar proof using ChatGPT, refine it and pretend that ChatGPT found it. ...
That's the best part! They don't even need to, because ChatGPT will happily do its own private "literature search" and then not tell you about it; even Terence Tao has freely admitted as much in his previous comments on the topic. So we can at least afford to be a bit less curmudgeonly and cynical about that specific dynamic: we've literally seen it happen.
This almost implies mathematicians aren’t some ungodly geniuses if something as absolutely dumb as an LLM can solve these problems via blind pattern matching.
Meanwhile I can’t get Claude code to fix its own shit to save my life.
There are "ungodly geniuses" within mathematics but no one is saying every mathematician is an "ungodly genius". The quality of results you get from an LLM can vary greatly depending on the environment you place it in and the context you provide it. This isn't to say it's your fault Claude Code can't fix whatever issue you're having.
As I understand, a lot of mathematics, at least the part about solving problems, is basically back and forth between exploration (which involves pattern matching) and formalising. We've basically solved formalising a while ago, and now LLMs are getting better and better at exploration.
If you think about it, it's also what a lot of other intellectual activity looks like, at least in STEM.
Some comments were deferred for faster rendering.
maxwells-daemon|1 month ago
To clear up a few misconceptions:
- Aristotle uses modern AI techniques heavily, including language modeling.
- Aristotle can be guided by an informal (English) proof. If the proof is correct, Aristotle has a good chance at translating it into Lean (which is a strong vote of confidence that your English proof is solid). I believe that's what happened here.
- Once a proof is formalized into Lean (assuming you have formalized the statement correctly), there is no doubt that the proof is correct. This is the core of our approach: you can do a lot of (AI-driven) search, and once you find the answer you are certain it's correct no matter how complex the solution is.
Happy to answer any questions!
aidenn0|1 month ago
xiphias2|1 month ago
Sometimes when I'm using new LLMs I'm not sure if it’s a step forward or just benchmark hacking, but formalized math results always show that the progress is real and huge.
When do you think Harmonic will reach formalizing most (even hard) human written math?
I saw an interview with Christian Szegedy (your competitor I guess) that he believes it will be this year.
pvillano|1 month ago
My limited understanding of Rust is that it applies a fixed set of rules to guarantee memory safety. The rules are somewhat simple and limiting, for ease of understanding and implementation, but also because of undecidability.
Programmers run into situations where they know that their code won't cause memory errors, but it doesn't follow the rules. Wouldn't it be cool if something like Aristotle was integrated into the compiler? Any code for which a proof of correctness could be written would pass/compile, without having to add more and more rules
wslh|1 month ago
Second, when you say language modeling support, it means that can better understand code representation (ASTs) or something else? I am just an AI user, not very knowledgeable in the field. My main interest is if it would be great for static analysis oriented to computer security (SAST).
[1] https://aristotle.ai/
qnleigh|1 month ago
How does this depend on the area of mathematics of the proof? I was under the impression that it was still difficult to formalize most research areas, even for a human. How close is Aristotle to this frontier?
nateberkopec|1 month ago
That's a pretty big assumption, though, isn't it? As we saw the Navier-Stokes psychosis episode over the New Year holiday, formalizing correctly really isn't guaranteed.
bjt12345|1 month ago
uoaei|1 month ago
jjmarr|1 month ago
pfdietz|1 month ago
EE84M3i|1 month ago
Do you have any links to reading about how often lean core has soundness bugs or mathlib has correctness bugs?
nottorp|1 month ago
Translate an informal description of the proof into this Lean?
101008|1 month ago
MyFirstSass|1 month ago
I don’t think it can really be said to have occurred autonomously then?
Looks more like a 50/50 partnership with a super expert human one the one side which makes this way more vague in my opinion - and in line with my own AI tests, ie. they are pretty stupid even OPUS 4.5 or whatever unless you're already an expert and is doing boilerplate.
EDIT: I can see the title has been fixed now from solved to "more or less solved" which is still think is a big stretch.
D-Machine|1 month ago
jasonfarnon|1 month ago
dpacmittal|1 month ago
Tenobrus|1 month ago
naasking|1 month ago
"solved more or less autonomously by AI" were Tao's exact words, so I think we can trust his judgment about how much work he or the AI did, and how this indicates a meaningful increase in capabilities.
unknown|1 month ago
[deleted]
mmphosis|1 month ago
Davidzheng|1 month ago
Yeask|1 month ago
observationist|1 month ago
Or someone like Terence Tao might figure out how to wield AI better than anyone else, even the labs, and use the tools to take a bunch down at once. I'm excited to see what's coming this year.
Davidzheng|1 month ago
sublinear|1 month ago
For all topics that can be expressed with language, the value of LLMs is shuffling things around to tease out a different perspective from the humans reading the output. This is the only realistic way to understand AI enough to make it practical and see it gain traction.
As much as I respect Tao, I feel like his comments about AI usage can be misleading without carefully reading what he is saying in the linked posts.
xorcist|1 month ago
To a layman, that doesn't sound like very AI-like? Surely there must be a dozen algorithms to effectively search this space already, given that mathematics is pretty logical?
malux85|1 month ago
What happens if we reason upwards but change some universal constants? What happens if we use Tao instead of Pi everywhere, these kind of fun questions would otherwise require an enormous intellectual effort whereas with the mechanisation and automation of thought, we might be able to run them and see!
ComplexSystems|1 month ago
svat|1 month ago
pama|1 month ago
lwansbrough|1 month ago
Are you getting the same value in your work, in your field?
ceh123|1 month ago
For me, deep research tools have been essential for getting caught up with a quick lit review about research ideas I have now that I'm transitioning fields. They have also been quite helpful with some routine math that I'm not as familiar with but is relatively established (like standard random matrix theory results from ~5 years ago).
It does feel like the spectrum of utility is pretty aligned with what you might expect: routine programming > applied ML research > stats/applied math research > pure math research.
I will say ~1 year ago they were still useless for my math research area, but things have been changing quickly.
jacquesm|1 month ago
So yes, there is value here, and quite a bit but it requires a lot of forethought in how you structure your prompts and you need to be super skeptical about the output as well as able to check that output minutely.
If you would just plug in a bunch of data and formulate a query and would then use the answer in an uncritical way you're setting yourself up for a world of hurt and lost time by the time you realize you've been building your castle on quicksand.
D-Machine|1 month ago
I also found it can be helpful when exploring your mathematical intuitions on something, e.g. like how a dropout layer might effect learned weights and matrix properties, etc. Sometimes it will find some obscure rigorous math that can be very enlightening or relevant to correcting clumsy intuitions.
randomizedalgs|1 month ago
Davidzheng|1 month ago
ancillary|1 month ago
Overall: useful, but not yet particularly "accelerating" for me.
abdullahkhalids|1 month ago
But maybe that is just me. I have read some of Terence Tao's transcripts, and the questions he asks LLMs are higher complexity than what I ask. Yet, he often gets reasonable answers. I don't yet know how I can get these tools to do better.
hyperadvanced|1 month ago
ramraj07|1 month ago
kmaitreys|1 month ago
j2kun|1 month ago
tachim|1 month ago
svat|1 month ago
- It took me a minute or two of clicking around to figure out that the (only?) way to use it is to create an API key, then start aristotle in the terminal and interact with it there. It could be more obvious I think.
- Your profile links to http://www.cs.stanford.edu/~tachim/ which doesn't work; should be http://cs.stanford.edu/~tachim/ (without the www) (I think Stanford broke something recently for the former not to work.)
dang|1 month ago
Edit: I just realized from https://news.ycombinator.com/item?id=46296801 that you're the CEO! - in that case maybe you, or whoever you think most appropriate from your organization, could submit it along with a text description of what it is, and what is the easiest and/or most fun way to try it out?
7373737373|1 month ago
unknown|1 month ago
[deleted]
D-Machine|1 month ago
EDIT: Look at all the people below just reacting to the headline and clearly not reading the posts. Aristotle (https://arxiv.org/abs/2510.01346) is key here folks.
EDIT2: It is clear many of the people below don't even understand basic terminology. Something being a transformer doesn't make it an LLM (vision transformers, anyone?), and if you aren't training on language (e.g. AlphaFold, or Aristotle on Lean stuff), it isn't a "language" model.
XCSme|1 month ago
Do you mean that in this case, it was not an LLM?
NewsaHackO|1 month ago
I think it's because it comes off as you saying that we should move off of GenAI, and a lot of people use LLM when they mean GenAI.
stonogo|1 month ago
1. "The search algorithm is a highly parallel Monte Carlo Graph Search (MCGS) using a large transformer as its policy and value function." ... "We use a generative policy to take progressively widened [7] samples from the large action space of Lean tactics, conditioning on the Lean proof state, proof history, and, if available, an informal proof. We use the same model and prompt (up to a task token) to compute the value function which guides the search."
See that 'large transformer' phrase? That's where the LLM is involved.
2. "A lemma-based informal reasoning system which generates informal proofs of mathematical statements, breaks these proofs down into lemmas, formalizes each lemma into Lean, and iterates this process based on formal feedback" ... "First, the actions it generates consist of informal comments in addition to Lean tactics. Second, it uses a hidden chain of thought with a dynamically set thinking budget before predicting an action."
Unless you're proposing that this team solved AGI, "chain of thought" is a specific term of art in LLMs.
3. "A geometry solver which solves plane geometry problems outside of Lean using an approach based on AlphaGeometry [45]." ... following the reference: "AlphaGeometry is a neuro-symbolic system that uses a neural language model, trained from scratch on our large-scale synthetic data, to guide a symbolic deduction engine through infinite branching points in challenging problems. "
AlphaGeometry, like all of Deepmind's Alpha tools, is an LLM.
Instead of accusing people of not reading the paper, perhaps you should put some thought into what the things in the paper actually represent.
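For anyone unfamiliar with the terminology in point 1, here is a minimal sketch of the kind of policy/value-guided search the paper describes. This is generic PUCT-style selection as used in MCTS/MCGS generally, not Harmonic's actual implementation; the tactic names and priors are made up, and the real system uses a large transformer where this sketch uses stub numbers.

```python
import math

class Node:
    """One node in the search graph over proof states (illustrative)."""
    def __init__(self, prior):
        self.prior = prior        # policy probability assigned to this action
        self.visits = 0
        self.value_sum = 0.0
        self.children = {}        # action (Lean tactic string) -> Node

    def q(self):
        # Mean value estimate; 0 for unvisited nodes.
        return self.value_sum / self.visits if self.visits else 0.0

def select_child(node, c_puct=1.5):
    """Pick the child maximizing the PUCT score: exploitation (Q)
    plus an exploration bonus weighted by the policy prior."""
    total = sum(ch.visits for ch in node.children.values())
    def score(ch):
        return ch.q() + c_puct * ch.prior * math.sqrt(total + 1) / (1 + ch.visits)
    return max(node.children.items(), key=lambda kv: score(kv[1]))

# Tiny demo: a root proof state with two candidate "tactics",
# with priors that would come from the policy model.
root = Node(prior=1.0)
root.children = {"ring": Node(prior=0.7), "omega": Node(prior=0.3)}
action, child = select_child(root)
# With no visits yet, the score reduces to c_puct * prior,
# so the higher-prior tactic is explored first.
```

The point is only that "large transformer as policy and value function" means the model proposes which tactics to try and estimates how promising each proof state is; the tree/graph search itself is classical.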
nextaccountic|1 month ago
ctoth|1 month ago
[deleted]
markusde|1 month ago
Please remember that this is a theorem about integers that is subject to a fairly elementary proof that is well-supported by the existing Mathlib infrastructure. It seems that the AI relies on the symbolic proof checker, and the proofs that it is checking don't use very complex definitions in this result. In my experience, proofs like this which are one step removed from existing infra are much much more likely to work.
Again though, this is really insanely cool!!
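To illustrate what "one step removed from existing infra" means (these examples are illustrative, not taken from the actual proof): elementary integer facts that Mathlib's existing automation or lemmas close almost immediately.

```lean
import Mathlib

-- Linear integer arithmetic: the `omega` decision procedure
-- closes goals like this in one step.
example (n : ℤ) : 2 * n < 2 * n + 1 := by omega

-- One step removed: the product of consecutive integers is even,
-- available essentially off the shelf as a Mathlib lemma.
example (n : ℤ) : Even (n * (n + 1)) := Int.even_mul_succ_self n
```

Results whose definitions and supporting lemmas already live in Mathlib are far easier for a prover to reach than ones requiring new formal infrastructure.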
libraryofbabel|1 month ago
People are still going to be moving the goalposts on this and claiming it's not all that impressive or that the solution must have been in the training data or something, but at this point that's kind of dubiously close to arguing that Terence Tao doesn't know what he's talking about, which to say the least is a rather perilous position.
At this point, I think I'm making a belated New Year's resolution to stop arguing with people who are still saying that LLMs are stochastic parrots that just remix their training data and can never come up with anything novel. I think that discussion is now dead. There are lots of fascinating issues to work out with how we can best apply LLMs to interesting problems (or get them to write good code), but to even start solving those issues you have to accept that they are at least somewhat capable of doing novel things.
In 2023 I would have bet hard against us getting to this point ("there's no way chatbots can actually reason their way through novel math!"), but here we are, three years later. I wonder what comes next?
Davidzheng|1 month ago
zozbot234|1 month ago
xigoi|1 month ago
snowmobile|1 month ago
Though I suppose if the problem statement in Lean is human-generated and there are no ways to "cheat" in a Lean proof, the proof could be trusted without understanding it
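That caveat is the crux: Lean guarantees the proof matches the statement, not that the statement says what you meant. A toy illustration of a "theorem" that type-checks but is vacuous, because its hypothesis is unsatisfiable:

```lean
-- Over ℕ, `n < 0` is impossible, so this "theorem" is vacuously
-- true: Lean accepts it, yet it tells us nothing about n.
example (n : ℕ) (h : n < 0) : n = 42 := by omega
```

This is why the human-written formal statement has to be audited separately from the machine-found proof.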
spopejoy|1 month ago
> ... to me, the more interesting capability revealed by these events is the ability to rapidly write and rewrite new versions of a text as needed, even if one was not the original author of the argument.
> This is in sharp contrast to existing practice where the effort required to produce even one readable manuscript is quite time-consuming, and subsequent revisions (in response to referee reports, for instance) are largely confined to local changes (e.g., modifying the proof of a single lemma), with large-scale reworking of the paper often avoided due both to the work required and the large possibility of introducing new errors. However, the combination of reasonably competent AI text generation and modification capabilities, paired with the ability of formal proof assistants to verify the informal arguments thus generated, allows for a much more dynamic and high-multiplicity conception of what a writeup of an argument is, with the ability for individual participants to rapidly create tailored expositions of the argument at whatever level of rigor and precision is desired.
Of course this implies that the math works, which is the Aristotle part, and that's great ... but this rebuts the "this isn't AI by itself, this is AI and a bunch of experts working hard, nothing to see here" objection: right, but even "experts working hard" fail to iterate on the paper, which significantly hinders research progress.
cultofmetatron|1 month ago
tonygrue|1 month ago
zkmon|1 month ago
benrutter|1 month ago
1. This result is very far from showing something like "human mathematicians are no longer needed to advance mathematics".
2. Even if it did show that, as long as we need humans trained in understanding maths, since "professional mathematicians" are mostly educators, they probably aren't going anywhere.
becquerel|1 month ago
aziis98|1 month ago
Edit: Found it here https://github.com/plby/lean-proofs/blob/main/src/v4.24.0/Er...
thomasahle|1 month ago
The METR institute predicts that the length of tasks AI agents can complete doubles every 7 months.
We should expect it to take until 2033 before AI solves Clay Institute-level problems with 50% reliability.
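As a back-of-the-envelope check on that extrapolation (the baseline horizon, start year, and Clay-problem threshold below are assumptions for illustration, not METR's published figures):

```python
# Illustrative extrapolation: if the task-length horizon AI agents
# can complete doubles every 7 months, how much does it grow by 2033?
doubling_months = 7
months = (2033 - 2026) * 12           # assumed start: early 2026
doublings = months / doubling_months  # 84 / 7 = 12 doublings
growth = 2 ** doublings               # 2^12 = 4096x

# If today's 50%-reliability horizon were ~2 hours (an assumption),
# by 2033 it would be ~8192 hours, on the order of multiple
# person-years of sustained work.
horizon_hours = 2 * growth
```

Whether Clay-level problems actually sit at a "person-years of work" horizon is itself a guess, which is why such forecasts carry wide error bars.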
kelseyfrog|1 month ago
1. https://mppbench.com/
zozbot234|1 month ago
Davidzheng|1 month ago
mehdi1964|1 month ago
kittikitti|1 month ago
https://github.com/teorth/erdosproblems/wiki/AI-contribution...
It classifies the advancements based on the level of AI input. In particular, the entry in Table 1 related to the original post has both a green and yellow light, reflecting the skepticism from others.
dnw|1 month ago
unknown|1 month ago
[deleted]
dylanz|1 month ago
esafak|1 month ago
Davidzheng|1 month ago
emsign|1 month ago
demirbey05|1 month ago
shevy-java|1 month ago
remix2000|1 month ago
MORPHOICES|1 month ago
[deleted]
maximgeorge|1 month ago
[deleted]
bgwalter|1 month ago
[deleted]
dang|1 month ago
We asked you about this just recently, but it's still most of what you're posting. You're making the site worse by doing this, right at the point where it's most vulnerable these days.
Your comment here is a shallow dismissal of exactly the type the HN guidelines ask users to avoid here:
"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something." (https://news.ycombinator.com/newsguidelines.html)
Predictably, it led to by far the worst subthread on this article. That's not cool. I don't want to ban you because you're also occasionally posting good comments that don't fit these negative categories, but we need you to fix this and stop degrading the threads.
azan_|1 month ago
Arainach|1 month ago
zozbot234|1 month ago
That's the best part! They don't even need to, because ChatGPT will happily do its own private "literature search" and then not tell you about it - even Terence Tao has freely admitted as much in his previous comments on the topic. So we can at least afford to be a bit less curmudgeonly and cynical about that specific dynamic: we've literally seen it happen.
energy123|1 month ago
stevenhuang|1 month ago
perching_aix|1 month ago
muldvarp|1 month ago
leggothrow|1 month ago
Meanwhile I can’t get Claude code to fix its own shit to save my life.
sponnath|1 month ago
oytis|1 month ago
If you think about it, it's also what a lot of other intellectual activity looks like, at least in STEM.
embedding-shape|1 month ago
Maybe this should give you some hint to that you're trying to use it in a different way than others?
Davidzheng|1 month ago