top | item 47099509

Rochus | 8 days ago

Interesting. It's essentially the same idea as in this article: https://substack.com/home/post/p-184486153. In both scenarios, the human is relieved of the burden of writing complex formal syntax (whether Event-B or Lean 4). The human specifies intent and constraints in natural language, while the LLM handles the work of formalization and satisfying the proof engine.

But Lean 4 is significantly more rigid, granular, and foundational than e.g. Event-B, and they handle concepts like undefined areas and contradictions very differently. While both are "formal methods," they were built by different communities for different purposes: Lean is a pure mathematician's tool, while Event-B is a systems engineer's tool. Event-B is much more flexible, allowing an engineer (or the LLM) to sketch the vague, undefined contours of a system and gradually tighten the logical constraints through refinement.

LLMs are inherently statistical interpolators. They operate beautifully in an Open World (where missing information is just "unknown" and can be guessed or left vague) and they use Non-Monotonic Reasoning (where new information can invalidate previous conclusions). Lean 4 operates strictly on the Closed World Assumption (CWA) and is brutally Monotonic. This is why using Lean to model things humans care about (business logic, user interfaces, physical environments, dynamic regulations) quickly hits a dead end. The physical world is full of exceptions, missing data, and contradictions. Lean 4 is essentially a return to the rigid, brittle approach of the 1980s expert systems. Event-B (or similar methods) provides the logical guardrails, but critically, it tolerates under-specification. It doesn't force the LLM to solve the Frame Problem or explicitly define the whole universe. It just checks the specific boundaries the human cares about.
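To make the contrast concrete, here is a minimal Lean 4 sketch (the names are illustrative, not from any real codebase) of the totality Lean demands: a function must be defined on every input, so a "business rule" with an unmodeled exception simply does not typecheck. There is no way to say "unknown, decide later":

```lean
inductive Customer
  | retail | wholesale | government

-- Every constructor must be covered; deleting any arm below is a
-- compile error. Lean forces the whole universe to be defined up
-- front, which is exactly the Closed World rigidity at issue.
def discount : Customer → Nat
  | .retail     => 5
  | .wholesale  => 10
  | .government => 0
```

An Event-B-style refinement, by contrast, could leave the government case unconstrained in an abstract machine and pin it down in a later refinement step.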


YeGoblynQueenne|8 days ago

>> LLMs are inherently statistical interpolators. They operate beautifully in an Open World (where missing information is just "unknown" and can be guessed or left vague) and they use Non-Monotonic Reasoning (where new information can invalidate previous conclusions).

I think LLM reasoning is not so much non-monotonic as unsound, in the sense that conclusions do not necessarily follow from the premises. New information may change conclusions, but how that happens is anyone's guess. There's some scholarship on that, e.g. a series of papers by Subbarao Kambhampati and his students showing that reasoning models' "thinking" tokens don't really correspond to sound reasoning chains, even if they seem to improve performance overall [1].

But it is difficult to tell what reasoning really means in LLMs. I believe the charitable interpretation of claims about LLM reasoning is that it is supposed to be informal. There is evidence both for and against that (e.g. much testing is in fact on formal reasoning problems, like math exam questions or Sokoban, but there are tests of informal reasoning also, e.g. on the bar exam). However, any such interpretation is hard to square with the claim that "we don't understand reasoning"; not a direct quote, but I'm aware of many claims like that by people whose job it is to develop LLMs, made at the height of activity around reasoning models (which now seems to have been superseded by activity around "world models") [1].

If LLMs are really capable of informal reasoning (I'm not necessarily opposed to that idea) then we really don't understand what that reasoning is, but it seems we're a bit stuck because to really understand it, we have to, well, formalise it.

That said, non-monotonic reasoning is supposed to be closer to the way humans do informal reasoning in the real world, compared to classical logic, even though classical logic started entirely as an effort to formalise human reasoning; I mean, with Aristotle's Syllogisms (literally "reasonings" in Greek).

________________

[1] Happy to get links if needed.

Rochus|8 days ago

My claim was not that an LLM was a formal, mathematically sound non-monotonic logic engine, but that the problem space is "non-monotonic" and "open world". The fact that an LLM is "unsound" and "informal" is the exact reason why my approach is necessary. Because LLMs are unsound, informal, and probabilistic, as you say, forcing them to interface with Lean 4 is a disaster. Lean 4 demands 100% mathematical soundness, totality, and closed-world perfection at every step, so an LLM will just hit a brick wall. Methods like Event-B (which I suggest in my article), however, are designed to tolerate under-specification: they allow the LLM to provide an "unsound" or incomplete sketch, and use the proof obligations to guide the LLM into a sound state via refinement.

whattheheckheck|8 days ago

Reasoning is a pattern embedded within the token stream, but LLMs only imitate it, by learning symbolic reasoning patterns.

The very fact that it memorized the Caesar-cipher rot13 pattern is because rot13 is a Linux command, so the training data contained plenty of examples of letters shifted by 13. Ask it to figure out a different shift and it struggles.
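What makes that failure telling is how mechanically trivial the general transformation is. A few lines of Python (illustrative, not from the comment) handle any shift; a system that had learned the rule rather than the rot13 instances would get rot3 for free:

```python
def caesar(text: str, shift: int) -> str:
    """Shift each ASCII letter by `shift` positions, wrapping around."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord('a') if ch.islower() else ord('A')
            out.append(chr(base + (ord(ch) - base + shift) % 26))
        else:
            out.append(ch)  # leave digits, punctuation, spaces alone
    return "".join(out)

print(caesar("Hello, World!", 13))  # rot13 -> "Uryyb, Jbeyq!"
print(caesar("Uryyb, Jbeyq!", 13))  # rot13 is its own inverse
```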

Now compound that across all intelligent reasoning problems in the entirety of human existence and you'll see why we will never have enough data to make AGI with this architecture and training paradigm.

But we will get higher and higher fidelity maps of symbolic reasoning patterns as they suck up all the agent usage data from knowledge work tasks. Hopefully your tasks fall outside the distribution of the median training data.

skybrian|8 days ago

I think it’s better to think of an LLM as a very good hint engine. It’s good at coming up with more possibilities to consider and less good at making sure they work, unless it has an external system to test ideas on and is trained to use it. In the case of applied math, it’s not enough to prove theorems. It also needs to be testing against the real world somehow.

sinkasapa|8 days ago

Lean 4 uses constructive logic. If a closed world assumption requires that any true statement is known to be true, and that any statement not known to be true is therefore false, then that does not hold of constructive systems. I only use Rocq, but I believe the type theories of Rocq and Lean 4 are basically similar variations on the Calculus of Constructions, though there are important differences. In a constructive theory something is true if a proof can be constructed, but the lack of a proof does not entail that it is false; one needs to prove that it is false. In constructive type theory, one cannot simply assert that every statement is either true or false.
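A small Lean 4 illustration of the point (standard Lean 4 syntax, theorem names are made up): a negative fact needs an explicit proof, and excluded middle for an arbitrary proposition is only available by invoking the classical axioms, not by default:

```lean
-- In constructive logic, ¬p is just p → False: falsity is a proved
-- negation, not merely the absence of a proof of p.
theorem one_ne_zero' : (1 : Nat) ≠ 0 := by decide

-- "p or not p" is not derivable constructively for an arbitrary p;
-- Lean provides it only as a classical axiom:
example (p : Prop) : p ∨ ¬p := Classical.em p
```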

mycall|8 days ago

So basically you are framing this as a Type Theory vs. Set Theory problem: foundationalism vs. engineering refinement. Since we read here of multiple use cases for LLMs on both sides of that CS divide, we can expect an eventual convergence of the two approaches; and if not that, some formal principles should emerge for when to use which.

Rochus|8 days ago

This discussion started back in the sixties (see e.g. the 1969 paper by McCarthy and Hayes, which describes the "frame problem" as a fundamental obstacle to modeling the dynamic world with First-Order Logic and monotonic reasoning). A popular attempt to "solve" this problem was the Cyc project. Monotonic logic is generally understood as a special, restricted case (a subset) of a broader non-monotonic theory.

jojomodding|8 days ago

Both type and set theory are formal logic, so I don't see how that's what's being argued. Rather, there are some things that are formal-logicy (e.g. set theory) and many others that are not (like biology: you'll always find some weird organism breaking your assumptions).