top | item 46561569

(no title)

maxwells-daemon | 1 month ago

I work at Harmonic, the company behind Aristotle.

To clear up a few misconceptions:

- Aristotle uses modern AI techniques heavily, including language modeling.

- Aristotle can be guided by an informal (English) proof. If the proof is correct, Aristotle has a good chance at translating it into Lean (which is a strong vote of confidence that your English proof is solid). I believe that's what happened here.

- Once a proof is formalized into Lean (assuming you have formalized the statement correctly), there is no doubt that the proof is correct. This is the core of our approach: you can do a lot of (AI-driven) search, and once you find the answer you are certain it's correct no matter how complex the solution is.

Happy to answer any questions!

discuss

order

aidenn0|1 month ago

How do you verify that the AI translation to Lean is a correct formalization of the problem? In other fields, generative AI is very good at making up plausible sounding lies, so I'm wondering how likely that is for this usage.

sibrahim|1 month ago

That's what's covered by the "assuming you have formalized the statement correctly" parenthetical.

Given a formal statement of what you want, Lean can validate that the steps in a (tedious) machine-readable purported proof are valid and imply the result from accepted axioms. This is not AI, but a tiny, well reviewed kernel that only accepts correct formal logic arguments.

So, if you have a formal statement that you've verified to represent what you are interested in by some other means, Lean can tell you whether the proof created by genAI is correct. Basically, there is a nigh infallible checker that won't accept incorrect hallucinations.

svat|1 month ago

It may help to look at this example concretely:

The natural-language statement of the problem is (from https://www.erdosproblems.com/728):

> Let C>0 and ϵ>0 be sufficiently small. Are there infinitely many integers a,b,n with a≥ϵn and b≥ϵn such that a!b!∣n!(a+b−n)! and a+b>n+Clogn?

The Lean-language statement of the problem (which can be done either by hand or by AI) is (from https://github.com/plby/lean-proofs/blob/f44d8c0e433ab285541...):

    ∀ᶠ ε : ℝ in [>] 0, ∀ C > (0 : ℝ), ∀ C' > C,
      ∃ a b n : ℕ,
        0 < n ∧
        ε * n < a ∧
        ε * n < b ∧
        a ! * b ! ∣ n ! * (a + b - n)! ∧
        a + b > n + C * log n ∧
        a + b < n + C' * log n
Yes on the one hand, one needs to know enough about Lean to be sure that this formulation matches what we intend, and isn't stating something trivial. But on the other hand, this is not as hard as finding an error on some obscure line of a long proof.

(There's also an older formulation at https://github.com/google-deepmind/formal-conjectures/blob/f... but the new one is more in the spirit of what was intended: see the discussion starting at https://www.erdosproblems.com/forum/thread/728#post-2196 which gives a clear picture, as of course does Tao's thread in the OP that summarizes this discussion.)

maxwells-daemon|1 month ago

For this reason, when we announce results on e.g. the IMO, we formalize the statements by hand and inspect the proofs carefully to ensure they capture the full spirit of the problem.

However, there are some good heuristics. If you expect a problem to be hard and the proof is very short, you've probably missed something!

maccam912|1 month ago

To answer the question a different way, I think you are asking how we know the proof actually matches the description the human provided? And I'd say we can't know for sure, but the idea is that you can pretty concisely write and check yourself that the problem is accurate, i.e. "There are an infinite number of primes" or whatever, and then even if an LLM goes off and makes up a lean proof wildly different from your description, if lean says the proof is valid then you have proven the original statement. I guess in theory the actual proof could be way different than what you thought it would be, but ultimately all the logic will still check out.

rtpg|1 month ago

I feel like even outside of AI translation, formalization not capturing the spirit of what the informal description was provided is always a risk.

This is also a big risk when trying to prove code correctness: "prove this algo works" means you gotta define "works" along certain axes, and if you're very unlucky you might have a proof that exploits the uncertainty around a certain axis.

jlouis|1 month ago

The statement is something you provide. It's the search you can have the LLM do. If this works for math it will immediately make code way higher quality via the same tools.

JacobAsmuth|1 month ago

You read it yourself :)

If you and the AI agree on the translation of the problem, and lean agrees with the solution, then you're done.

roenxi|1 month ago

You're looking for the practical answer, but philosophically it isn't possible to translate an informal statement into a formal one 'correctly'. It is informal, ie, vaguely specified. The only certain questions are if the formal axioms and results are interesting which is independent of the informal formalisation and that can only be established by inspecting the the proof independently of the informal spec.

xiphias2|1 month ago

First congrats!

Sometimes when I'm using new LLMs I'm not sure if it’s a step forward or just benchmark hacking, but formalized math results always show that the progress is real and huge.

When do you think Harmonic will reach formalizing most (even hard) human written math?

I saw an interview with Christian Szegedy (your competitor I guess) that he believes it will be this year.

maxwells-daemon|1 month ago

Thank you! It depends on the topic. Some fields (algebra, number theory) are covered well by Lean's math library, and so I think we are already there; I recommend trying Aristotle for yourself to see how reliably it can formalize these theorems!

In other fields (topology, probability, linear algebra), many key definitions are not in Mathlib yet, so you will struggle to write down the theorem itself. (But in some cases, Aristotle can define the structure you are talking about on the fly!)

This is not an intrinsic limitations of Lean, it's just that nobody has taken the time to formalize much of those fields yet. We hope to dramatically accelerate this process by making it trivial to prove lemmas, which make up much of the work. For now, I still think humans should write the key definitions and statements of "central theorems" in a field, to ensure they are compatible with the rest of the library.

pvillano|1 month ago

Is anyone working on applying these techniques to formal verification of software?

My limited understanding of Rust is that it applies a fixed set of rules to guarantee memory safety. The rules are somewhat simple and limiting, for ease of understanding and implementation, but also because of undecidability.

Programmers run into situations where they know that their code won't cause memory errors, but it doesn't follow the rules. Wouldn't it be cool if something like Aristotle was integrated into the compiler? Any code for which a proof of correctness could be written would pass/compile, without having to add more and more rules

maxwells-daemon|1 month ago

We are! We very recently announced some results on formally proving the correctness of programs: https://harmonic.fun/news#blog-post-verina-bench-sota

Formal methods are cool because, by contrast to tools like the borrow checker, you can prove some very "nonlocal" properties: this system does not deadlock, or it makes progress at least every N steps, etc.

teiferer|1 month ago

An issue with this approach is that it may not be robust. That is, you could run into a casr where a minor modification of your program is suddenly not provable anymore, even though it is still correct. The heuristic (AI or otherwise) has necessarily limits, and if your are close to the "edge" of its capabilities then a minor change could push it across.

If the proof is rooted in the programmer's understanding who can give proof hints to the prover then any modification of the program can then be accompanied with a modification of the hints, still allowing automatic proofs. But if the human has no clue then the automatic system can get stuck without the human having a chance to help it along.

zozbot234|1 month ago

Formal verification of program correctness is also (for obvious reasons) key to unlocking AI-driven synthesis (i.e. 'vibe' coding) of "correct" programs that will verifiably meet the given spec.

wslh|1 month ago

This is the forst time I heard about Aristotle and find it very interesting. First question first: is it available for the general public? I don't know if this is the page to try it? [1]

Second, when you say language modeling support, it means that can better understand code representation (ASTs) or something else? I am just an AI user, not very knowledgeable in the field. My main interest is if it would be great for static analysis oriented to computer security (SAST).

[1] https://aristotle.ai/

qnleigh|1 month ago

> If the proof is correct, Aristotle has a good chance at translating it into Lean

How does this depend on the area of mathematics of the proof? I was under the impression that it was still difficult to formalize most research areas, even for a human. How close is Aristotle to this frontier?

nateberkopec|1 month ago

>assuming you have formalized the statement correctly

That's a pretty big assumption, though, isn't it? As we saw the Navier-Stokes psychosis episode over the New Year holiday, formalizing correctly really isn't guaranteed.

bjt12345|1 month ago

What occurs when this process is reversed - translate from lean to informal english, and does iterating this then help research better approaches toward writing proofs in human language?

hannasm|1 month ago

I had the same thought but unfortunately even if that translation is accurate it could still be bidirectional hallucinating and would not really be sufficient evidence...

It's another reformulation rather than a true proof. Now, instead of wanting a proof of a theorem, now we just need to prove that this proof is actually proving the theorem. The proof itself being so incomprehensible that it can't on its own be trusted, but if it can be shown that it can be trusted then the theorem must be true.

uoaei|1 month ago

You seem to be openly contradicting your company's PR and language. Your description very clearly describes the "AI" as a tool to translate relatively informal specifications into formal proof logic, but does not itself do the proving.

jjmarr|1 month ago

What are the benefits of Aristotle over a general-purpose coding assistant like Claude Code?

maxwells-daemon|1 month ago

Aristotle's output is formally verified in Lean, so you can run it for days on a hard problem and be assured that the answer, no matter how complex, is right without needing to manually check it.

Claude Code can write lean, but we do a heck of a lot of RL on theorem proving, so Aristotle winds up being much better at writing Lean than other coding agents are.

pfdietz|1 month ago

Do you have plans to apply this broadly to the historical math literature?

maxwells-daemon|1 month ago

Yes! I think that working with Mathlib is the best long term solution, because it's how people already collaborate on building out the formal "universe of mathematics." We want to speed that up, and hopefully we'll cover all of the common topics very soon!

EE84M3i|1 month ago

> there is no doubt that the proof is correct.

Do you have any links to reading about how often lean core has soundness bugs or mathlib has correctness bugs?

nottorp|1 month ago

So what did the "AI" actually do?

Translate an informal description of the proof into this Lean?

dpe82|1 month ago

TFA says ChatGPT wrote the informal description.

101008|1 month ago

Any chance Harmonic accept full remote employees? :)