top | item 41850067

(no title)

nrfulton | 1 year ago

> Formally proving with a system based on probabilities (temperature, etc.) is for me an oxymoron. But I am not a mathematician.

In the context of automated theorem proving, proof search and proof checking could not be more different.

Proof search is undecidable for most logics. For that reason, heuristics are commonly used to guide proof search. This has always been the case; heuristic proof search predates the use of LLMs for theorem proving by decades.

Proof checking, on the other hand, is not just decidable but usually quite tractable and cheap. Therefore, proof checkers are typically deterministic and fast. Ideally the checker is also implemented in small amount of relatively simple and heavily audited code.

For a high-level description of how provers are typically architected, see https://www.pls-lab.org/en/de_Bruijn_criterion (or https://lawrencecpaulson.github.io/2022/01/05/LCF.html which is linked to at the bottom of the above page and goes into a bit more historical detail and nuance).

> I am not sure if this is better than generating cooking recipes with a LLM and asking a cook if they make sense or not.

Conceptually it's not at all similar to asking a cook if a recipe is "correct". For lean proofs, "correct" has an entirely precise meaning. The "cook" (i.e., proof checker) is basically free (much less than $0.01/hr). The cook (proof checker) takes at most a few seconds to read and respond to any particular recipe. And the cook (i.e., checker) never makes mistakes.

The relationship between asking a cook if a recipe is good and asking Lean to check a proof is approximately the difference between asking a mediocre human computer to compute a derivative and asking a modern mechanical computer to do the same (using a classical algorithm for doing so, not an LLM; no one is using LLMs to check proofs -- there's really no point to doing so, because the checking problem is easy).

discuss

order

Loic|1 year ago

Thank you for this very detailed answer.