top | item 46217051

(no title)

mutkach | 2 months ago

I certainly hope so.

I wonder, what is the actual blocker right now? I'd assume that LLMs are still not very good with specifications and verifcation languages? Anyone tried Datalog, TLA+, etc. with LLMs? I suppose that Gemini was specifically trained on Lean. Or at least some IMO-finetuned fork of it. Anyhow, there's probably a large Lean dataset collected somewhere in Deepmind servers, but that's not certification applicable necessarily, I think?

> AI also creates a need to formally verify more software: rather than having humans review AI-generated code, I’d much rather have the AI prove to me that the code it has generated is correct.

At RL stage LLMs could game the training*, proving easier invariants then actually expected (the proof is correct and possibly short - means positive reward). It would take additional care it to set it up right.

* I mean, if you set it to generate a code AND a proof to it.

discuss

No comments yet.