top | item 47101403

(no title)

upghost | 9 days ago

So I have been doing formal specification with TLA+ using AI assistance and it has been very helpful AFTER I REALIZED that quite often it was proving things that were either trivial or irrelevant to the problem at hand (and not the problem itself), but difficult to detect at a high level.

I realize formal verification with lean is a slightly different game but if anyone here has any insight, I tend to be extremely nervous about a confidently presented AI "proof" because I am sure that the proof is proving whatever it is proving, but it's still very hard for me to be confident that it is proving what I need it to prove.

Before the dog piling starts, I'm talking specifically about distributed systems scenarios where it is just not possible for a human to think through all the combinatorics of the liveness and safety properties without proof assistance.

I'm open to being wrong on this, but I think the skill of writing a proof and understanding the proof is different than being sure it actually proves for all the guarantees you have in mind.

I feel like closing this gap is make it or break it for using AI augmented proof assistance.

discuss

oggy|9 days ago

In my experience, finding the "correct" specification for a problem is usually very difficult for realistic systems. Generally it's unlikely that you'll be able to specify ALL the relevant properties formally. I think there's probably some facet of Kolmogorov complexity there; some properties probably cannot be significantly "compressed" in a way where the specification is significantly shorter and clearer than the solution.

But it's still usually possible to distill a few crucial properties that can be specified in an "obviously correct" manner. It takes A LOT of work (sometimes I'd be stuck for a couple of weeks trying to formalize a property). But in my experience the trade off can be worth it. One obvious benefit is that bugs can be pricey, depending on the system. But another benefit is that, even without formal verification, having a few clear properties can make it much easier to write a correct system, but crucially also make it easier to maintain the system as time goes by.

awesomeMilou|8 days ago

I'm curious since I'm not a mathematician: What do you mean by "stuck for a couple of weeks"? I am trying to practice more advanced math and have stumbled over lean and such but I can't imagine you just sit around for weeks to ponder over a problem, right? What do you do all this time?

daxfohl|9 days ago

Yeah, even for simple things, it's surprisingly hard to write a correct spec. Or more to the point, it's surprisingly easy to write an incorrect spec and think it's correct, even under scrutiny, and so it turns out that you've proved the wrong thing.

There was a post a few months ago demonstrating this for various "proved" implementations of leftpad: https://news.ycombinator.com/item?id=45492274

This isn't to say it's useless; sometimes it helps you think about the problem more concretely and document it using known standards. But I'm not super bullish on "proofs" being the thing that keeps AI in line. First, like I said, they're easy to specify incorrectly, and second, they become incredibly hard to prove beyond a certain level of complexity. But I'll be interested to watch the space evolve.

(Note I'm bullish on AI+Lean for math. It's just the "provably safe AI" or "provably correct PRs" that I'm more skeptical of).

fauigerzigerk|9 days ago

>But I'm not super bullish on "proofs" being the thing that keeps AI in line.

But do we have anything that works better than some form of formal specification?

We have to tell the AI what to do and we have to check whether it has done that. The only way to achieve that is for a person who knows the full context of the business problem and feels a social/legal/moral obligation not to cheat to write a formal spec.

nextos|9 days ago

As a heavy user of formal methods, I think refinement types, instead of theorem proving with Lean or Isabelle, is both easier and more amenable to automation that doesn't get into these pitfalls.

It's less powerful, but easier to break down and align with code. Dafny and F* are two good showcases. Less power makes it also faster to verify and iterate on.

deterministic|7 days ago

Completely agree. Refinement types is a much more practical tool for software developers focusing on writing real world correct code.

Using LEAN or Coq requires you to basically convert your code to LEAN/Coq before you can start proving anything. And importing some complicated Hoare logic library. While proving things correct in Dafny (for example) feels much more like programming.

johnbender|9 days ago

You have identified the crux of the problem, just like mathematics writing down the “right” theorem is often half or more of the difficulty.

In the case of digital systems it can be much worse because we often have to include many assumptions to accommodate the complexity of our models. To use an example from your context, usually one is required to assume some kind of fairness to get anything to go through with systems operating concurrently but many kinds of fairness are not realistic (eg strong fairness).

esafak|9 days ago

Could you write a blog post about your experience to make it more concrete?

youknownothing|9 days ago

I was having the same intuition, but you verbalised it better: the notion of having a definitive yes/no answer is very attractive, but describing what you need in such terms using natural language, which is inherently ambiguous... that feels like a fool's errand. That's why I keep thinking that LLM usage for serious things will break down once we get to the truly complicated things: it's non-deterministic nature will be an unbreakable barrier. I wish I'm wrong, though.