(no title)
Rochus | 8 days ago
But Lean 4 is significantly more rigid, granular, and foundational than e.g. Event-B, and they handle concepts like undefined areas and contradictions very differently. While both are "formal methods," they were built by different communities for different purposes: Lean is a pure mathematician's tool, while Event-B is a systems engineer's tool. Event-B is much more flexible, allowing an engineer (or the LLM) to sketch the vague, undefined contours of a system and gradually tighten the logical constraints through refinement.
LLMs are inherently statistical interpolators. They operate beautifully in an Open World (where missing information is just "unknown" and can be guessed or left vague) and they use Non-Monotonic Reasoning (where new information can invalidate previous conclusions). Lean 4 operates strictly on the Closed World Assumption (CWA) and is brutally Monotonic. This is why using Lean to model things humans care about (business logic, user interfaces, physical environments, dynamic regulations) quickly hits a dead end. The physical world is full of exceptions, missing data, and contradictions. Lean 4 is essentially a return to the rigid, brittle approach of the 1980s expert systems. Event-B (or similar methods) provides the logical guardrails, but critically, it tolerates under-specification. It doesn't force the LLM to solve the Frame Problem or explicitly define the whole universe. It just checks the specific boundaries the human cares about.
YeGoblynQueenne|8 days ago
I think LLM reasoning is not so much non-monotonic as unsound, in the sense that conclusions do not necessarily follow from the premises. New information may change conclusions but how that happens is anyone's guess. There's some scholarship on that, e.g. there's a series of papers by Subarao Kamphampathi and his students that show how reasoning models' "thiking" tokens don't really correspond to sound reasoning chains, even if they seem to improve performance overall [1].
But it is difficult to tell what reasoning really means in LLMs. I believe the charitable interpretation of claims about LLM reasoning is that it is supposed to be informal. There is evidence both for and against it (e.g. much testing is in fact on formal reasoning problems, like math exam questions or Sokoban, but there's tests of informal reasoning also, e.g. on the bar exam). However, different interpretations are hard to square with the claims that "we don't understand reasoning"; not a direct quote, but I'm aware of many claims like that by people whose job it is to develop LLMs and that were made at the height of activity around reasoning models (which seems now to have been superseded by activity around "world models") [1].
If LLMs are really capable of informal reasoning (I'm not necessarily opposed to that idea) then we really don't understand what that reasoning is, but it seems we're a bit stuck because to really understand it, we have to, well, formalise it.
That said, non-monotonic reasoning is supposed to be closer to the way humans do informal reasoning in the real world, compared to classical logic, even though classical logic started entirely as an effort to formalise human reasoning; I mean, with Aristotle's Syllogisms (literally "rsasonings" in Greek).
________________
[1] Happy to get links if needed.
Rochus|8 days ago
whattheheckheck|8 days ago
The very fact that it memorized the Ceasar cipher rot13 pattern is due to it being a Linux command and it had examples of patterns of 13 shifted letters. If you asked it to figure out a different shift it struggled.
Now compound that across all intelligent reasoning problems in the entirety of human existence and you'll see how we will never have enough data to make agi with this architecture and training paradigm.
But we will have higher and higher fidelity maps of symbolic reasoning patterns as they suck up all the agent usage data for knowledge work tasks. Hopefully your tasks fall out of distribution of the median training data scope
skybrian|8 days ago
sinkasapa|8 days ago
mycall|8 days ago
Rochus|8 days ago
jojomodding|8 days ago