> What's really amazing to me is that Stephen Kleene probably proved this without the help of a computer, but I was able to verify it with the highest possible scrutiny. It's as if he wrote an impressive program without ever having run it, and it turned out not to have any bugs!
This offhand comment (which we can all forgive) makes it seem like mathematics happens in a vacuum. I can understand the temptation to think that way, and I hope I can make it clear that this is not the case.
General theorems typically grow out of a set of representative examples of interesting behavior (unit tests, if you will), are judged by their relationship to other mathematical objects and results (integration tests), and are reviewed by other mathematicians (user acceptance tests).
Mathematicians don't just write down a proof and cross their fingers. The proof is often one of the last steps. There is a high-level intuition that explains why a theorem or proof sketch is probably true/correct before a formal proof is found. These explanations are usually compact enough to help one re-derive the proof ("X doesn't have enough 'space' to fill Y, and Z tracks that gap", or something like that), but aren't formal enough to put into any formal logic system I know of.
(OP here.) Fair enough—thanks for the thoughtful clarification.
By the way, I'm a big fan of your "Math ∩ Programming" articles. People who found my blog interesting will certainly think the same of yours: https://jeremykun.com/
One very interesting experience for me was recently relearning set theory from a textbook. It talked a bit about the history of the Schröder–Bernstein theorem, a fundamental (and relatively "simple"-sounding) result in set theory.
What I found interesting was that this was originally stated, but not proved, by Cantor. Then a few semi-flawed attempts at proofs were made (flawed in the sense that they relied on other axioms like choice, or that they had errors).
These are leading mathematicians, including Cantor, who basically invented set theory from scratch. And they went through many years of struggle to arrive at a proof of a result which you now learn in an introductory class.
It really put into perspective for me just how hard some of these things are, and made me feel less bad when I struggle with some mathematics.
> Mathematicians don't just write down a proof and cross their fingers.
On the flip side, this is exactly what a lot of lawyers do when writing contracts. Not necessarily maliciously or negligently, but it can still make for fun questions of interpretation when things don't go as expected.
Right, but convincing explanations can also be invented for things that aren't true. The true power is in the formalism, which allows you to use your untamable and untrustworthy intuition to discover facts.
"To give you an idea of what an actual proof looks like in Coq, below is a proof of one of the easier lemmas above. The proof is virtually impossible to read without stepping through it interactively, so don’t worry if it doesn’t make any sense to you here."
is why Coq is not one of my favorite tools.
A proof consists of two things: the "proof state" (a statement of what you know at a given point) and a sequence of "proof operations" (the manipulations that move from one proof state to the next).
Most mathematical proofs you have ever seen have been presented as a list of proof states, with the operations either explained in comments or taken to be obvious.
Coq (and most other similar systems) presents the proof as a sequence of proof operations, with the state invisible unless you're stepping through it in a Coq IDE.
On one hand, that's the same thing as a program: a Coq proof is a program for creating the proof. Unfortunately, on the other, I've never been able to read Coq proofs the way I can read programs. Probably a lack of experience, I guess.
On the other hand, Coq is a damn fun computer game.
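To make the state/operations distinction concrete, here's a small sketch in Lean (a system similar to Coq; the example and names are mine). The script itself records only the operations; I've written the intermediate proof states as comments, which is exactly what the IDE would show you interactively:

```lean
example (P Q : Prop) (h : P ∧ Q) : Q ∧ P := by
  -- state: h : P ∧ Q ⊢ Q ∧ P
  constructor
  -- state: two goals, ⊢ Q and ⊢ P
  · exact h.right
  -- state: one goal left, ⊢ P
  · exact h.left
```

Strip the comments and you get what's stored in the file: `constructor; exact h.right; exact h.left`, which is close to unreadable on its own.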
> Unfortunately, on the other, I've never been able to read Coq proofs the way I can read programs. Probably a lack of experience, I guess.
I don't think it's a lack of experience on your part. I worked with some people at MIT who were far better at Coq than I was, and they admitted the same thing. Coq proofs are unreadable, even when all the hypotheses and variables are given thoughtful names.
But, as you found, Coq proofs are not meant to be read; they are meant to be stepped through. I'm not sure if there is really a better alternative: if the formal proofs were made easier to read, they'd probably be much longer. And they can already get quite long compared to traditional paper proofs.
I think the right strategy is to break large developments into small lemmas, such that each proof is short and borderline obvious (similar to structuring a computer program to be self-documenting). Also, more experienced Coq users heavily invest in proof automation (building "tactics" in Coq) to make proofs short and sweet. I don't have much experience on that front, though.
Are there languages that model proofs as pure functions? I.e. instead of modifying some implicit state, you pass existing proofs as arguments to other proofs that define these existing proofs as assumptions to itself.
So, a proof would be a pure function, taking a number of assumptions as arguments, and in order to write this proof you must first pass proofs of the assumptions to it as arguments (and then you can reference these proven assumptions in the body of the proof when needed).
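For what it's worth, that is essentially how proof terms work in dependently typed systems (Coq, Agda, Lean): under Curry-Howard, a proof literally is a pure function from proofs of its assumptions to a proof of its conclusion. A sketch in Lean (names are mine):

```lean
-- A proof of Q from the assumptions P and P → Q is a pure
-- function taking proofs of those assumptions as arguments.
def modusPonens (P Q : Prop) (hp : P) (hpq : P → Q) : Q :=
  hpq hp

-- Larger proofs are built by passing existing proofs as arguments:
def chain (P Q R : Prop) (hp : P) (hpq : P → Q) (hqr : Q → R) : R :=
  modusPonens Q R (modusPonens P Q hp hpq) hqr
```

The tactic scripts most people write are just a front-end for constructing terms like these.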
I had to take a semester on formal proofs using Coq, the same tool the article talks about.
Putting aside their steep learning curve, formal proof methods do not guarantee that the code you've written is bug free. They only guarantee that the code follows the requirements you defined, given the conditions you also set on your inputs. You can think of it as a mathematical proof that your postconditions will hold given that your preconditions are true.
And you know what? People do make mistakes establishing those preconditions and postconditions all the time, and the tool is not going to be of any help. You've proven your code, but it may not do what you actually wanted it to do.
During a lecture I saw a tenured professor, who was a well-respected mathematician and expert in Coq, make some rather obvious mistakes while establishing those preconditions and postconditions. His proof was valid but worthless.
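As a toy illustration of how a valid proof can be worthless (my own example, not from that lecture), consider a sorting "specification" that forgets to require the output to be a permutation of the input:

```python
def satisfies_spec(xs, out):
    # Flawed postcondition: "the output is in nondecreasing order".
    # It forgets to require that out is a permutation of xs!
    return all(out[i] <= out[i + 1] for i in range(len(out) - 1))

def bogus_sort(xs):
    # Provably satisfies the postcondition above for every input,
    # yet is obviously not what we wanted.
    return []

assert satisfies_spec([3, 1, 2], bogus_sort([3, 1, 2]))
```

A proof assistant will happily certify `bogus_sort` against this spec; the mistake lives entirely in the specification.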
When you add that writing those proofs is actually rather time consuming, you end up with a niche product with little real-world applicability, except possibly for safety-critical environments.
> Putting aside their steep learning curve, formal proof methods do not guarantee that the code you've written is bug free.
People always seem to bring this up, but what's the better alternative? Verified code is about as close as you're ever going to get to bug-free.
If you're doing a large proof, getting the specification wrong and not eventually noticing while working on the proof isn't common. You'll likely have something that isn't provable and get proof goals that don't make any sense. I don't think mathematicians commonly end up proving the wrong thing, either.
I haven't heard of any examples of complex verified programs where someone later found a huge flaw in the specification. It would be like writing a set of unit tests, and then a complex program that passes them yet whose behaviour contradicts what you wanted.
Nothing guarantees that code is bug free. We will always have to interact with hardware, and although we can have high confidence in the hardware the physical world has a habit of changing out from under us.
Your argument is very strange. We use type systems not because we think we'll write perfect code, but because we know it will reduce the likelihood of making mistakes in our code (at least, when that type system is sound).
It seems like you're against writing proofs that your code is correct because you think you'll do better without them. I have a very hard time believing that, seeing as you probably aren't writing your own assembly (that beats a compiler) most, if not all, of the time.
”formal proof methods do not guarantee that the code you've written is bug free. They only guarantee that the code follows the requirements you defined given the conditions you also set on your inputs.”
I think you are mixing “bug” with “out of spec”. If you ask me to write a perfect chess program, you can’t say it’s a bug if it can’t play go.
Bad or incomplete requirements are a problem, but that’s a different problem from “bugs”.
Verification != validation, and there is no technique on the "make better code" side that will help with the validation side, not proofs, not unit tests, nada.
On the other hand, there is a certain amount of benefit to "double-entry bookkeeping": having a statement of what the code is supposed to do separate from what it actually does. And you get bonus points if they're linked, so that you know what it does is what it says it does.
It may be that there is little "real-world applicability" for the full deal as things stand now. But things like fancy type systems are continually pushing usability towards proofiness.
I had to take such a formal proofs course too, just with a different tool that had an even steeper learning curve and more rough edges (the prof used us as alpha testers of his new tool), and I came to the same conclusion as you. It seems very valuable to formally prove specific small subsets, or rather critical components like device drivers, as Microsoft has been doing for some time. I am sure it will become more popular when the tools and languages get more mature and easier to grasp for average-joe developers in the coming years.
> And you know what? People do make mistakes establishing those preconditions and postconditions all the time, and the tool is not going to be of any help. You've proven your code, but it may not do what you actually wanted it to do.
This always comes up when formal methods are discussed, and I've never been able to understand this. The goal isn't really to create a 100% correct program, but to significantly reduce the chance of bugs.
And while writing pre- and post-conditions is one way, another is to write global correctness propositions for the entire program that are easy to inspect for correctness (e.g., a two-line formula saying "the database is always consistent", or "no data is lost even if a node crashes"). Usually, these properties are verified not against the program itself, but against some high-level spec serving as a blueprint, and while there is no formal proof that the code follows the blueprint, the same is true for buildings, and blueprints can help a lot. Use those blueprints alongside code-level pre-/post-conditions, and you have yourself a highly reliable piece of software.
The vision is to be able to verify those global correctness properties all the way down to the code (and even to machine code). This is called end-to-end verification, and has only been accomplished for programs of (very) modest size and complexity, and even then with great effort. But even without end-to-end verification, you can greatly increase your confidence in your program, at a cost that is not only affordable, but may even save you some money.
If you're a programmer who hasn't tried theorem proving before, I highly recommend giving it a go. It gave me the same 'thrill' I got when I first started programming as a 14-year-old and realised a new way of thinking. I find it sharpens your mind in a new and different way, and that's a really exciting feeling.
I prefer Isabelle myself, as I find its proofs more readable in a traditional mathematical sense. Coq however, is probably easier for most programmers to experiment with.
first precondition IMPLIES (second precondition IMPLIES implication)
carries more mental overhead for me than
(first precondition AND second precondition) IMPLIES implication
even though logically they are the same. The first version seems to indicate that there is some sort of hierarchy/asymmetry between the two preconditions.
It's the same thing when I try to read a Haskell function type signature:
f :: a -> b -> c
f is a function taking a and returning (a function taking b and returning (c))
vs.
f is a function taking two parameters a and b and returning c
There is no hierarchy anyhow because the parameters can be swapped:
f' = \b a -> f a b
Curious to know if it's just me or there are others who experience the same.
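(Not the commenter, but the curried reading can be made literal in any language with closures. A Python sketch, where the names and the tuple "result type" are mine:)

```python
# f :: a -> b -> c, read literally as nested one-argument functions.
def f(a):
    def g(b):
        return (a, b)  # stands in for some value of type c
    return g

# The "two-parameter" reading is just successive application:
assert f(1)(2) == (1, 2)

# Swapping the parameters, as in f' = \b a -> f a b:
def f_flip(b):
    return lambda a: f(a)(b)

assert f_flip(2)(1) == (1, 2)
```

Both readings describe the same function; the nested form only matters when you stop at `f(1)` and pass the partial result around.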
I am not a particularly experienced functional programmer. What functional programming I do is mainly relegated to code I write as a hobby, but in the last year or so I've started using more functional techniques at work (and luckily my teammates haven't lynched me yet). These days at work I use primarily Ruby and I've been surprised at how often I write curried procs. The language neither requires nor encourages that behaviour, so why am I doing it?
I think I started with the same assumption that you have: that the order of parameters is arbitrary. In functional programming, though, this is not the case. For example, the "map" function takes a function and a functor (you can think of a functor as a collection, like a list, if you aren't familiar with the term). The order is important because if I partially apply a parameter to "map" I get a completely different meaning.
If you partially apply the function to "map", the result is a function that applies that function to any functor. If you partially apply the functor to "map", the result is a function that applies any function to that specific functor. Because only the first parameter can be partially applied directly, the choice of parameter order helps define the meaning of the function. In this example, the first meaning is the one "map" supports (because "map" always takes the function first and the functor second). This becomes increasingly important as you build applicatives and monads.
In case that was unclear, here's a quick example. Let's say I have a list of integers and I have a function that adds 5 to an integer. If I do something like "map addFive list", it will go through each element in the list and add 5, resulting in a new list. If I partially apply the function to map with "map addFive", this will return a function that loops through any list and adds five to each element. So I might say "let addFiveToEach = map addFive".
If we had a different kind of map function (let's call it "mapNew") maybe we can do "mapNew list addFive" and it will do the same as the "map" code above. But if I partially apply list to mapNew I get a function that applies a function to my specific list. I might do "let eachInList = mapNew list".
The result is two very different functions. Either I will do "addFiveToEach list" or I will do "eachInList addFive". What meaning I prefer depends on the situation.
It's this idea that partially applied functions have a meaning that is powerful. When you are writing the code, which meaning do you want? As you note, both are possible, but I think you'll find that only one is actually appropriate in most cases. For example, instead of calling the function with both parameters, what if you assigned the result of partially applying the function to a variable? What would the variable be called? Does that help the reader understand the code?
Like I said, I was surprised at how often it makes a difference in my code. In fact, where it doesn't make a difference, it's usually an indication that I've made a mistake somewhere. It is even to the point where you start to want to pass partially applied functions around because they are more meaningful than the data that they contain. In fact a function is literally just another container in a functional programming language. For me, that was a kind of mind blowing realisation.
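In a language without auto-currying, the two argument orders from the example above can be sketched with `functools.partial` (the function names are mine, standing in for the hypothetical "map" and "mapNew"):

```python
from functools import partial

def add_five(x):
    return x + 5

# "map addFive list": function first, collection second.
def map_fn(f, xs):
    return [f(x) for x in xs]

# "mapNew list addFive": collection first, function second.
def map_new(xs, f):
    return [f(x) for x in xs]

# Partially applying the first parameter yields two very
# different meanings, as described above:
add_five_to_each = partial(map_fn, add_five)  # works on any list
each_in_list = partial(map_new, [1, 2, 3])    # works with any function

assert add_five_to_each([1, 2, 3]) == [6, 7, 8]
assert each_in_list(add_five) == [6, 7, 8]
```

The variable names make the point: one partial application deserves the name `add_five_to_each`, the other `each_in_list`, and only one of those is usually the abstraction you actually want.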
So I'm wondering what is the bridge between "proof-assistant" and "automated (or partially automated) theorem prover".
As someone with "journalistic" (but fairly in-depth) knowledge about these topics, my guess is that something like a proof-assistant will have to be paired up with some kind of logic programming system (like Prolog, or a kanren-derived system or something).
I guess then the problem of combinatorial explosion in logic search translates directly to the combinatorial explosion of proof-steps in a proof assistant. Hence my mention of partially-automated (more doable than fully-automated).
Probably some kind of a "database of facts" would also be involved, making it an expert system (reminds me of metamath [1]).
Is it fair to say, partially-automated theorem proving is a fairly advanced version (and not without added complexity) of a proof-assistant, and will help ease some of the grunt work involved in using a proof-assistant by hand?
I also enjoy creating proofs as a hobby. When I'm doing it for fun, I tend to use Metamath, in particular, the "Metamath Proof Explorer" (MPE, aka set.mm) which formalizes the most common mathematical foundation (classical logic + ZFC). You can see that here: http://us.metamath.org/mpeuni/mmset.html
The thing I like about Metamath is that every step is directly visible, and you can follow the details using simple hyperlinks. Coq, in contrast, shows "how to write a program to find a proof" rather than the proof itself.
Of course, all approaches have pros and cons. Metamath has some automation; the metamath and mmj2 programs can automatically determine some things, and there's been some experiments with machine learning that are promising. However, Coq has more automation, in part because it operates at a different level.
I've done a bit of work on a "next generation" Metamath, Ghilbert[1], but have come to the conclusion that it's too ambitious to do as a part-time project. People might find it interesting, though, because it attempts to take interactivity to a higher level than Metamath, and is (imho) a cleaner language.
These notions are actually becoming more and more intermixed. You could say today that interactive theorem proving (what is done in a proof assistant) is an extension of automated theorem proving, as for example the Isabelle proof assistant has very strong support for proving intermediate theorems automatically.
Cool post! The stuff about Curry-Howard was really interesting and relates to a question I have been thinking about. But I'm a novice when it comes to the theory of type systems or theorem provers. Maybe someone here can enlighten me?
As type systems become more complex (like, say, in Scala or Haskell), they move towards having specialized syntax and weird hacks to express what are arguably simpler logical theorems/invariants about values in a type. So why can't we have a programming language that gives you a full-blown logic system (Coq? Microsoft Z3? a SAT solver? I don't understand the differences in detail) rather than part of one? Are there attempts at that, or does it turn out to be unworkable?
> so why can't we have a programming language that gives you a full blown logic system
Most dependently typed programming languages, including Coq, give you the ability to basically "do logic" in a mostly unrestricted way.
There are a few challenges that come to mind with making a practical programming language that can also do arbitrary math:
a) For the language to be useful for proving things, it needs to be strongly normalizing. That means all programs halt—otherwise, you could trivially prove any proposition. Unfortunately that limits its utility as a general purpose programming language.
b) The languages which attempt to be useful for theorem proving are generally pure functional languages (due to the strong correspondence between functional programming and logic). Most programmers don't like working in pure functional languages, although the success of Haskell shows that there are certainly programmers who do (myself included).
c) The more expressive a type system is, the more difficult type inference becomes. Yet, paradoxically, I find myself writing more types in languages like C++ and Java than I do in equivalent Coq programs. I believe Coq uses a variant of a bidirectional type inference engine, which is quite usable in practice.
Idris is an attempt to make a practical programming language with the full power of dependent types. So it's probably the closest real language to what you're describing.
Z3 and other SAT/SMT solvers take a different approach: they try to search for proofs automatically, and for certain kinds of math they are very successful. But these are generally only for first-order theories or other restricted kinds of propositions, and not great for general theorem proving. They are good for things like proving array accesses are within bounds, or that you never try to pop from an empty stack (at least that's my understanding of their capabilities).
The basic issue lies with "decidability". Generally we want type systems to tell us whether our program is well typed.
However, as your type system becomes "too expressive", it becomes much harder to determine whether a program is well typed.
When a type system finally becomes so expressive as to be Turing complete, you get "undecidable types": types whose checking corresponds to undecidable problems for Turing machines.
A consequence is that any "sound" type system has to reject some programs that actually are well typed. This is why many type systems have a "just trust me, this is fine" option, like casting in C/C++ and unsafe in Rust.
To fully eliminate those "just trust me" escape hatches and still accept all well-typed programs, you'd need to accept potentially never-ending type checking (or else you'd have solved the halting problem).
We do have such languages. For example, Java has JML [1], C has the very similar ACSL, and a programming language like Idris has a type system that can express arbitrary logical propositions (there are various tradeoffs that may influence the choice between a rich type system a-la Idris or a rich contract system a-la JML, but many of the principles are the same).
The problem is that we do not yet know how to make effective use of such code-level specifications. Finding a cost/benefit sweet spot is not easy (I personally prefer the JML approach precisely because I think it allows for greater flexibility in the choice of tradeoffs vis-a-vis confidence vs. cost).
I wonder if simplified versions of these proofs couldn't be used with students. Right now it seems like too much detail, but maybe there could be a common "stdlib" of standard math arguments that we assume to be true, and then focus on the specific machinery for each proof?
My question for the author: does implementing a proof formally in Coq honestly give you a better understanding of the theorem/proof?
It would seem to me that, if you're filling in details from an existing proof or making some existing formal (in the math sense) intuition more formal (in the Coq sense), there's not much deep insight to gain from implementing a proof in Coq.
I took the liberty of translating your module to TLA+, my formal tool of choice (recently, I've been learning Lean, which is similar to Coq, but I find it much harder than TLA+). I tried to stick to your naming and style (which deviate somewhat from TLA+'s idiomatic styling), and, as you can see, the result is extremely similar, but I guess that when I try writing the proofs, some changes to the definitions would need to be made.
One major difference is that proofs in TLA+ are completely declarative. You just list the target and the required axioms, lemmas and definitions that require expanding. Usually, the proof requires breaking the theorem apart to multiple intermediary steps, but in the case of the proof you listed, TLA+ is able to find the proof completely automatically:
LEMMA supremumUniqueness ≜
∀ P ∈ SUBSET T : ∀ x1, x2 ∈ P : supremum(P, x1) ∧ supremum(P, x2) ⇒ x1 = x2
PROOF BY antisym DEF supremum (* we tell the prover to use the antisym axiom and expand the definition of supremum *)
The natDiff lemma doesn't even need to be stated, as it's automatically deduced from the built-in axioms/theorems:
LEMMA natDiff ≜ ∀ n1, n2 ∈ Nat : ∃ n3 ∈ Nat : n1 = n2 + n3 ∨ n2 = n1 + n3
PROOF OBVIOUS (* this is automatically verified using just built-in axioms/theorems *)
Another difference is that TLA+ is untyped (which makes the notation more similar to ordinary math), but, as you can see, this doesn't make much of a difference. The only things that differ from ordinary math notation are that function application uses square brackets (parentheses are used for operator substitution; operators are different from functions, but that's a subtlety; you can think of operators as polymorphic functions or as macros), set comprehension uses a colon instead of a vertical bar, a colon is also used in lieu of parentheses after quantifiers, and aligned lists of connectives (conjunctions and disjunctions) are read as if there were parentheses surrounding each aligned clause. Also, `SUBSET T` means the powerset of T.
Here's the module (without proofs, except for the one above):
------------------------------- MODULE Kleene -------------------------------
EXTENDS Naturals
(*
Assumption: Let (T, leq) be a partially ordered set, or poset. A poset is
a set with a binary relation which is reflexive, transitive, and
antisymmetric.
*)
CONSTANT T
CONSTANT _ ≼ _
AXIOM refl ≜ ∀ x ∈ T : x ≼ x
AXIOM trans ≜ ∀ x, y, z ∈ T : x ≼ y ∧ y ≼ z ⇒ x ≼ z
AXIOM antisym ≜ ∀ x, y ∈ T : x ≼ y ∧ y ≼ x ⇒ x = y
(*
A supremum of a subset of T is a least element of T which is greater than
or equal to every element in the subset. This is also called a join or least
upper bound.
*)
supremum(P, x1) ≜ ∧ x1 ∈ P
∧ ∀ x2 ∈ P : x2 ≼ x1
∧ ∀ x3 ∈ P : ∀ x2 ∈ P : x2 ≼ x3 ⇒ x1 ≼ x3
(*
A directed subset of T is a non-empty subset of T such that any two elements
in the subset have an upper bound in the subset.
*)
directed(P) ≜ ∧ P ≠ {}
∧ ∀ x1, x2 ∈ P : ∃ x3 ∈ P : x1 ≼ x3 ∧ x2 ≼ x3
(*
Assumption: Let the partial order be directed-complete. That means every
directed subset has a supremum.
*)
AXIOM directedComplete ≜ ∀ P ∈ SUBSET T : directed(P) ⇒ ∃ x : supremum(P, x)
(*
Assumption: Let T have a least element called bottom. This makes our partial
order a pointed directed-complete partial order.
*)
CONSTANT bottom
AXIOM bottomLeast ≜ bottom ∈ T ∧ ∀ x ∈ T : bottom ≼ x
(*
A monotone function is one which preserves order. We only need to consider
functions for which the domain and codomain are identical and have the same
order relation, but this need not be the case for monotone functions in
general.
*)
monotone(f) ≜ ∀ x1, x2 ∈ DOMAIN f : x1 ≼ x2 ⇒ f[x1] ≼ f[x2]
(*
A function is Scott-continuous if it preserves suprema of directed subsets.
We only need to consider functions for which the domain and codomain are
identical and have the same order relation, but this need not be the case for
continuous functions in general.
*)
Range(f) ≜ { f[x] : x ∈ DOMAIN f }
continuous(f) ≜
∀ P ∈ SUBSET T: ∀ x1 ∈ P :
directed(P) ∧ supremum(P, x1) ⇒ supremum(Range(f), f[x1])
(* This function performs iterated application of a function to bottom. *)
RECURSIVE approx(_, _)
approx(f, n) ≜ IF n = 0 THEN bottom ELSE f[approx(f, n-1)]
(* We will need this simple lemma about pairs of natural numbers. *)
LEMMA natDiff ≜ ∀ n1, n2 ∈ Nat : ∃ n3 ∈ Nat : n1 = n2 + n3 ∨ n2 = n1 + n3
(* The supremum of a subset of T, if it exists, is unique. *)
LEMMA supremumUniqueness ≜
∀ P ∈ SUBSET T : ∀ x1, x2 ∈ P : supremum(P, x1) ∧ supremum(P, x2) ⇒ x1 = x2
PROOF BY antisym DEF supremum
(* Scott-continuity implies monotonicity. *)
LEMMA continuousImpliesMonotone ≜ ∀ f : continuous(f) ⇒ monotone(f)
(*
Iterated applications of a monotone function f to bottom form an ω-chain,
which means they are a totally ordered subset of T. This ω-chain is called
the ascending Kleene chain of f.
*)
LEMMA omegaChain ≜
∀ f : ∀ n, m ∈ Nat :
monotone(f) ⇒
approx(f, n) ≼ approx(f, m) ∨ approx(f, m) ≼ approx(f, n)
(* The ascending Kleene chain of f is directed. *)
LEMMA kleeneChainDirected ≜
∀ f : monotone(f) ⇒ directed({ approx(f, n) : n ∈ Nat })
(*
The Kleene fixed-point theorem states that the least fixed-point of a Scott-
continuous function over a pointed directed-complete partial order exists and
is the supremum of the ascending Kleene chain.
*)
THEOREM kleene ≜
∀ f : continuous(f) ⇒
∃ x1 ∈ T :
∧ supremum({ approx(f, n) : n ∈ Nat }, x1)
∧ f[x1] = x1
∧ ∀ x2 : f[x2] = x2 ⇒ x1 ≼ x2
=============================================================================
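As a very informal companion to the theorem, here is a Python sketch (my own, not derived from the module above) that computes the supremum of the ascending Kleene chain for a monotone function on a finite poset of sets ordered by inclusion; on a finite poset the chain must stabilize, and where it stabilizes is the least fixed point:

```python
def kleene_lfp(f, bottom):
    """Iterate f from bottom: bottom, f(bottom), f(f(bottom)), ...

    For a monotone f on a finite poset, the ascending Kleene chain
    must stabilize, and the value it stabilizes at is the least
    fixed point of f (the supremum of the chain).
    """
    x = bottom
    while True:
        nxt = f(x)
        if nxt == x:
            return x
        x = nxt

# Example poset: subsets of {0, 1, 2, 3} ordered by inclusion,
# with bottom = the empty set. f is monotone: it adds 0 and the
# successor of every element below 3.
def f(s):
    return s | {0} | {e + 1 for e in s if e < 3}

assert kleene_lfp(f, set()) == {0, 1, 2, 3}
```

Of course, the theorem's real content is the infinite case (directed-complete posets), which no amount of iteration in Python can reach; this only illustrates the chain `bottom ≼ f(bottom) ≼ f(f(bottom)) ≼ ...`.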
Whoa, very cool post! It's interesting that "small but novel" languages can be formally proved. Do any mainstream languages have a sound foundation like that? If someone is interested in creating a new programming language, is it worth getting up to speed on proof writing and using that to build the language? It's not really a topic that comes up in the Dragon Book or mainstream general compiler study, AFAIK.
I've been working in a CPS Joy variant, deriving recursive functions on trees and such. I've noticed that the code always works the first time. So far my "data" on this phenomenon is still too anecdotal to get excited about, but I wanted to mention it here, where people might be interested. It surprised me at first, but then I realized that this is part of the promise of Backus' Functional Programming: being able to do algebra to derive programs gives you error-free code that doesn't require tests.
It leads me to speculate... what if I don't need types to write correct software? I assumed I would implement typing and type inference (like Factor language already has) at some point. But if the code has no bugs without it... YAGNI?
My reasoning goes like this: Let's postulate that we have a library of functions, each of which has the quality Q of being error-free. If we derive new functions from the existing ones by a process that preserves Q, so that new functions are all also error-free, then we cannot have errors in our code, just user error. I.e. the user can present bad inputs, but no other source of error exists (other than mechanical faults.) This would be a desirable state of affairs. Now if the Q-preserving new-function-deriving process doesn't require types to work, then we don't need them. Whether or not typing helps prevent user error is another question. Personally, I lean toward defining user error as UI error, but I am ranging far from my original point now.
I've never managed to have fun with Coq even though I really tried. I did tons of exercises and labs. On the other hand, I do love (functional) programming, logic and maths in general.
It's almost like you get the bad side of programming (time-consuming, tedious) without the reward (seeing your program doing cool things). And you don't get the satisfaction of doing maths either. At least I don't. Maybe I didn't try hard enough.
What is very interesting though is to learn about type theory and the Curry Howard isomorphism.
Is it possible for one of these "programs" to run forever? If so, does that illustrate a direct correspondence between the halting problem and Gödel's incompleteness theorem?
A lot of these comments are focused on the merits of Coq and some of the mathematical details of your post. As someone who has no experience with Coq and little experience with mathematical proofs, I want to add an additional perspective. This is a pretty cool hobby. It's really nice to hear about hobbies that aren't motivated by ambitions of profit or fame. Thanks for sharing!
[+] [-] j2kun|8 years ago|reply
This offhand comment (which we can all forgive) makes it seem like mathematics happens in a vacuum. I can understand the temptation to think that way, and I hope I can make it clear that this is not the case.
General theorems typically grow out of a set of representative examples of interesting behavior (unit tests, if you will), are judged by their relationship to other mathematical objects and results (integration tests), and are reviewed by other mathematicians (user acceptance tests).
Mathematicians don't just write down a proof and cross their fingers. The proof is often one of the last steps. There is a high-level intuition that explains why a theorem or proof sketch is probably true/correct before a formal proof is found. These explanations are usually compact enough to help one re-derive the proof ("X doesn't have enough 'space' to fill Y, and Z tracks that gap", or something like that), but aren't formal enough to put into any formal logic system I know of.
[+] [-] curryhoward|8 years ago|reply
By the way, I'm a big fan of your "Math ∩ Programming" articles. People who found my blog interesting will certainly think the same of yours: https://jeremykun.com/
[+] [-] edanm|8 years ago|reply
What I found interesting was that this was originally stated, but not proved, by Cantor. Then came a few semi-flawed attempts at proofs (flawed in the sense that they relied on other axioms, like choice, or simply contained errors).
These are leading mathematicians, including Cantor, who basically invented set theory from scratch. And they went through many years of struggle to arrive at a proof of a result which you now learn in an introductory class.
It really put into perspective for me just how hard some of these things are, and made me feel less bad when I struggle with some mathematics.
[+] [-] eqmvii|8 years ago|reply
On the flip side, this is exactly what a lot of lawyers do when writing contracts. Not necessarily maliciously or negligently, but it can still make for fun questions of interpretation when things don't go as expected.
[+] [-] whatshisface|8 years ago|reply
[+] [-] mcguire|8 years ago|reply
"To give you an idea of what an actual proof looks like in Coq, below is a proof of one of the easier lemmas above. The proof is virtually impossible to read without stepping through it interactively, so don’t worry if it doesn’t make any sense to you here."
is why Coq is not one of my favorite tools.
A proof consists of two things: the "proof state" (a statement of what you know at a given point) and a sequence of "proof operations" (the manipulations that move from one proof state to the next).
Most mathematical proofs you have ever seen have been presented as a list of proof states, with the operations either explained in comments or taken to be obvious.
Coq (and most other similar systems) presents the proof as a sequence of proof operations, with the state invisible unless you're stepping through it in a Coq IDE.
On one hand, that's the same thing as a program: a Coq proof is a program for creating the proof. Unfortunately, on the other, I've never been able to read Coq proofs the way I can read programs. Probably a lack of experience, I guess.
On the other hand, Coq is a damn fun computer game.
[+] [-] curryhoward|8 years ago|reply
I don't think it's a lack of experience on your part. I worked with some people at MIT who were far better at Coq than I was, and they admitted the same thing. Coq proofs are unreadable, even when all the hypotheses and variables are given thoughtful names.
But, as you found, Coq proofs are not meant to be read; they are meant to be stepped through. I'm not sure if there is really a better alternative: if the formal proofs were made easier to read, they'd probably be much longer. And they can already get quite long compared to traditional paper proofs.
I think the right strategy is to break large developments into small lemmas, such that each proof is short and borderline obvious (similar to structuring a computer program to be self-documenting). Also, more experienced Coq users heavily invest in proof automation (building "tactics" in Coq) to make proofs short and sweet. I don't have much experience on that front, though.
[+] [-] qznc|8 years ago|reply
[+] [-] runeks|8 years ago|reply
So, a proof would be a pure function, taking a number of assumptions as arguments, and in order to write this proof you must first pass proofs of the assumptions to it as arguments (and then you can reference these proven assumptions in the body of the proof when needed).
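That is exactly the Curry-Howard reading. As a sketch in Lean (a proof assistant in the same family as Coq), a proof of a conjunction really is a function that takes proofs of its assumptions as arguments:

```lean
-- A proof is literally a function: it consumes proofs of its
-- assumptions (hp : P and hq : Q) and produces a proof of the goal.
theorem conj_intro (P Q : Prop) (hp : P) (hq : Q) : P ∧ Q :=
  ⟨hp, hq⟩
```

Inside a larger proof you "call" `conj_intro` by passing it the already-proven assumptions, just as described above.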
[+] [-] Wizek|8 years ago|reply
[+] [-] jmite|8 years ago|reply
[+] [-] david-gpu|8 years ago|reply
Putting aside their steep learning curve, formal proof methods do not guarantee that the code you've written is bug-free. They only guarantee that the code follows the requirements you defined, given the conditions you also set on your inputs. You can think of it as a mathematical proof that your postconditions will hold given that your preconditions are true.
And you know what? People do make mistakes establishing those preconditions and postconditions all the time, and the tool is not going to be of any help. You've proven your code, but it may not do what you actually wanted it to do.
During a lecture I saw a tenured professor, who was a well-respected mathematician and expert in Coq, make some rather obvious mistakes while establishing those preconditions and postconditions. His proof was valid but worthless.
When you add that writing those proofs is actually rather time consuming, you end up with a niche product with little real-world applicability, except possibly for safety-critical environments.
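A toy Python sketch of that failure mode (the names and the spec here are made up for illustration): the postcondition as written is satisfied by a wrong program, so a valid "proof" of it is worthless.

```python
def spec_holds(xs, result):
    # The postcondition as *written*: the result is some element of xs.
    # The postcondition as *intended*: the result is the maximum of xs.
    return result in xs

def broken_max(xs):
    # Satisfies the written spec for any non-empty list...
    return xs[0]

xs = [3, 7, 1]
assert spec_holds(xs, broken_max(xs))  # the "proof obligation" passes
assert broken_max(xs) != max(xs)       # ...but the program is wrong
```

The proof machinery did its job perfectly; the human wrote down the wrong property.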
[+] [-] seanwilson|8 years ago|reply
People seem to always bring this up but what's better? Verified code is about as close as you're ever going to get to bug free.
If you're doing a large proof, it's uncommon to get the specification wrong and never notice while working on the proof. You'll likely end up with something that isn't provable and get proof goals that don't make any sense. I don't think mathematicians commonly end up proving the wrong thing either.
I haven't heard of any examples of complex verified programs where later someone found a huge flaw in the specification. It would be like writing a set of unit tests and a complex program that passed them which had behaviour that goes against what you wanted.
[+] [-] voxl|8 years ago|reply
Your argument is very strange. We use type systems not because we think we'll write perfect code, but because we know it will reduce the likelihood of making mistakes in our code (at least, when that type system is sound).
It seems like you're against writing proofs that your code is correct because you think you'll do better. I have a very hard time believing that, seeing as you probably aren't writing your own assembly (that beats a compiler) much, if any, of the time.
[+] [-] Someone|8 years ago|reply
I think you are mixing “bug” with “out of spec”. If you ask me to write a perfect chess program, you can’t say it’s a bug if it can’t play go.
Bad or incomplete requirements are a problem, but that’s a different problem from “bugs”.
[+] [-] mcguire|8 years ago|reply
Verification != validation, and there is no technique on the "make better code" side that will help with the validation side, not proofs, not unit tests, nada.
On the other hand, there is a certain amount of benefit of "double entry bookkeeping"; having a statement of what it's supposed to do separate from what it actually does. And you get bonus points if they're linked so that you know what it does is what it says it does.
It may be that there is "little real-world applicability" for the full deal as things stand now. But things like fancy type systems are trying to continually push usability towards proofiness.
[+] [-] frik|8 years ago|reply
[+] [-] pron|8 years ago|reply
This always comes up when formal methods are discussed, and I've never been able to understand this. The goal isn't really to create a 100% correct program, but to significantly reduce the chance of bugs.
And while writing pre- and post-conditions is one way, another is to write global correctness propositions for the entire program that are easy to inspect for correctness (e.g., a two-line formula saying "the database is always consistent", or "no data is lost even if a node crashes"). Usually, these properties are verified not against the program itself, but against some high-level spec serving as a blueprint, and while there is no formal proof that the code follows the blueprint, the same is true for buildings, and blueprints can help a lot. Use those blueprints alongside code-level pre-/post-conditions, and you have yourself a highly reliable piece of software.
The vision is to be able to verify those global correctness properties all the way down to the code (and even to machine code). This is called end-to-end verification, and has only been accomplished for programs of (very) modest size and complexity, and even then with great effort. But even without end-to-end verification, you can greatly increase your confidence in your program, at a cost that is not only affordable, but may even save you some money.
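The "global correctness proposition" idea can be sketched in plain Python as a toy explicit-state model check (roughly what TLC does for a TLA+ spec); the two-account transfer system and its invariant here are invented for illustration.

```python
# Toy explicit-state model check: explore every reachable state of a
# two-account transfer system and verify the global invariant
# "the total amount of money is conserved".

def step(state, amount):
    # One transition: move `amount` from account a to account b,
    # but only if a has enough funds.
    a, b = state
    if amount <= a:
        return (a - amount, b + amount)
    return state

def check_invariant(total=10):
    init = (total, 0)
    seen = {init}
    frontier = [init]
    while frontier:  # breadth-first exploration of the state space
        s = frontier.pop()
        for amt in range(total + 1):
            t = step(s, amt)
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    # The invariant must hold in every reachable state.
    return all(a + b == total for a, b in seen)

assert check_invariant()
```

Real model checkers work on far richer specs, but the shape of the guarantee is the same: one short, inspectable proposition checked against every behavior.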
[+] [-] nlperguiy|8 years ago|reply
[+] [-] throwawayjava|8 years ago|reply
[+] [-] bby|8 years ago|reply
[deleted]
[+] [-] yaseer|8 years ago|reply
I prefer Isabelle myself, as I find its proofs more readable in a traditional mathematical sense. Coq however, is probably easier for most programmers to experiment with.
[+] [-] kelukelugames|8 years ago|reply
[+] [-] zyxzevn|8 years ago|reply
[+] [-] amelius|8 years ago|reply
> "From this proposition it will follow, when arithmetical addition has been defined, that 1 + 1 = 2." —Volume I, 1st edition, page 379.
[1] https://en.wikipedia.org/wiki/Principia_Mathematica
[+] [-] kawyut|8 years ago|reply
first precondition IMPLIES (second precondition IMPLIES implication)
is inherently harder for me to understand than
(first precondition AND second precondition) IMPLIES implication
even though logically they are the same. The first version carries more mental overhead because it seems to indicate that there is some sort of hierarchy/asymmetry to the two preconditions.
It's the same thing when I try to read a Haskell function type signature:
f :: a -> b -> c
f is a function taking a and returning (a function taking b and returning (c))
vs.
f is a function taking two parameters a and b and returning c
There is no inherent hierarchy anyway, because the parameters can be swapped:
f' = \b a -> f a b
Curious to know if it's just me or there are others who experience the same.
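For what it's worth, the two phrasings are interchangeable in a precise sense. Here is a sketch of the equivalence in Lean; the two directions of the proof are exactly curry and uncurry:

```lean
-- "P implies (Q implies R)" is equivalent to "(P and Q) implies R".
theorem curry_iff (P Q R : Prop) : (P → Q → R) ↔ (P ∧ Q → R) :=
  ⟨fun h pq => h pq.1 pq.2,   -- uncurry
   fun h p q => h ⟨p, q⟩⟩     -- curry
```

So the asymmetry is purely notational; which form reads better is a human-factors question.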
[+] [-] mikekchar|8 years ago|reply
I think I started with the same assumption that you have: the order of parameters is arbitrary. In functional programming, though, this is not the case. For example, the "map" function takes a function and a functor (you can think of a functor as a collection like a list if you aren't familiar with the term). The order is important because if I partially apply a parameter to "map" I get a completely different meaning.
If you partially apply the function to "map", then the result is a function that applies that function to any kind of functor. If you partially apply the functor to "map", then the result is a function that applies any kind of function to that specific functor. Because I can only partially apply the first parameter in the function, this helps define the meaning of the function. In this example, the first definition is more correct (because "map" always takes the function first and the functor second). This becomes increasingly important as you build applicatives and monads.
In case that was unclear, here's a quick example. Let's say I have a list of integers and I have a function that adds 5 to an integer. If I do something like "map addFive list", it will go through each element in the list and add 5, resulting in a new list. If I partially apply the function to map with "map addFive", this will return a function that loops through any list and adds five to each element. So I might say "let addFiveToEach = map addFive".
If we had a different kind of map function (let's call it "mapNew") maybe we can do "mapNew list addFive" and it will do the same as the "map" code above. But if I partially apply list to mapNew I get a function that applies a function to my specific list. I might do "let eachInList = mapNew list".
The result is two very different functions. Either I will do "addFiveToEach list" or I will do "eachInList addFive". What meaning I prefer depends on the situation.
It's this idea that partially applied functions have a meaning that is powerful. When you are writing the code, which meaning do you want? As you note, both are possible, but I think you'll find that only one is actually appropriate in most cases. For example, instead of calling the function with both parameters, what if you assigned the result of partially applying the function to a variable? What would the variable be called? Does that help the reader understand the code?
Like I said, I was surprised at how often it makes a difference in my code. In fact, where it doesn't make a difference, it's usually an indication that I've made a mistake somewhere. It is even to the point where you start to want to pass partially applied functions around because they are more meaningful than the data that they contain. In fact a function is literally just another container in a functional programming language. For me, that was a kind of mind blowing realisation.
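The example above translates to Python with `functools.partial` (the names `add_five`, `map_new`, etc. follow the comment and are otherwise arbitrary):

```python
from functools import partial

def add_five(n):
    return n + 5

# Partially applying the *function* gives a reusable
# "do this to any collection" operation:
add_five_to_each = partial(map, add_five)
assert list(add_five_to_each([1, 2, 3])) == [6, 7, 8]

# A hypothetical mapNew with the arguments flipped: partially
# applying the *collection* gives "do anything to this collection":
def map_new(xs, f):
    return [f(x) for x in xs]

each_in_list = partial(map_new, [1, 2, 3])
assert each_in_list(add_five) == [6, 7, 8]
```

Same results, but the two partially applied functions mean very different things, which is the point about argument order.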
[+] [-] fizixer|8 years ago|reply
As someone with "journalistic" (but fairly in-depth) knowledge about these topics, my guess is that something like a proof-assistant will have to be paired up with some kind of logic programming system (like Prolog, or a kanren-derived system or something).
I guess then the problem of combinatorial explosion in logic search translates directly to the combinatorial explosion of proof-steps in a proof assistant. Hence my mention of partially-automated (more doable than fully-automated).
Probably some kind of a "database of facts" would also be involved, making it an expert system (reminds me of metamath [1]).
Is it fair to say, partially-automated theorem proving is a fairly advanced version (and not without added complexity) of a proof-assistant, and will help ease some of the grunt work involved in using a proof-assistant by hand?
[1] http://us.metamath.org/
[+] [-] dwheeler|8 years ago|reply
Here's a short video intro about MPE that I made: https://www.youtube.com/watch?v=8WH4Rd4UKGE
The thing I like about Metamath is that every step is directly visible: you can see absolutely every step and follow the details using simple hyperlinks. Coq, in contrast, shows "how to write a program to find a proof", not the proof itself.
Of course, all approaches have pros and cons. Metamath has some automation; the metamath and mmj2 programs can automatically determine some things, and there's been some experiments with machine learning that are promising. However, Coq has more automation, in part because it operates at a different level.
[+] [-] raphlinus|8 years ago|reply
[1] http://ghilbert.org/
[+] [-] auggierose|8 years ago|reply
[+] [-] unknown|8 years ago|reply
[deleted]
[+] [-] zaptheimpaler|8 years ago|reply
As type systems become more complex (like, say, in Scala or Haskell), they move towards having specialized syntax and weird hacks to express what are arguably simpler logical theorems/invariants about values in a type. So why can't we have a programming language that gives you a full-blown logic system (Coq? or Microsoft Z3? a SAT solver? I don't understand the differences in detail) rather than part of one? Are there attempts at that, or does it turn out to be unworkable?
[+] [-] curryhoward|8 years ago|reply
Most dependently typed programming languages, including Coq, give you the ability to basically "do logic" in a mostly unrestricted way.
There are a few challenges that come to mind with making a practical programming language that can also do arbitrary math:
a) For the language to be useful for proving things, it needs to be strongly normalizing. That means all programs halt—otherwise, you could trivially prove any proposition. Unfortunately that limits its utility as a general purpose programming language.
b) The languages which attempt to be useful for theorem proving are generally pure functional languages (due to the strong correspondence between functional programming and logic). Most programmers don't like working in pure functional languages, although the success of Haskell shows that there are certainly programmers who do (myself included).
c) The more expressive a type system is, the more difficult type inference becomes. Yet, paradoxically, I find myself writing more types in languages like C++ and Java than I do in equivalent Coq programs. I believe Coq uses a variant of a bidirectional type inference engine, which is quite usable in practice.
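Point (a) can be seen with a one-line non-example. If a proof language accepted unrestricted recursion, the following circular "proof" (sketched in Lean syntax) would establish any proposition whatsoever, which is why Lean and Coq both reject such definitions:

```lean
-- Rejected by the termination checker -- and that's the point.
-- If accepted, this would "prove" every proposition P:
theorem anything (P : Prop) : P := anything P  -- error: fails termination check
```

Strong normalization is precisely the restriction that rules this out, at the cost of excluding some genuinely useful general-purpose programs.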
Idris is an attempt to make a practical programming language with the full power of dependent types. So it's probably the closest real language to what you're describing.
Z3 and other SAT/SMT solvers take a different approach: they try to search for proofs automatically, and for certain kinds of math they are very successful. But these are generally only for first-order theories or other restricted kinds of propositions, and not great for general theorem proving. They are good for things like proving array accesses are within bounds, or that you never try to pop from an empty stack (at least that's my understanding of their capabilities).
[+] [-] rocqua|8 years ago|reply
Once a type system becomes so expressive as to be Turing complete, you get "undecidable types": the types that correspond to undecidable programs for Turing machines.
A consequence is that any sound type system will have to reject some programs that actually are well typed. This is why many type systems have a "just trust me, this is fine" option, like casting in C/C++ and unsafe in Rust.
To fully eliminate those "just trust me" escapes and still accept all well-typed programs, you'd need to accept potentially never-ending type checking (or else you'd have solved the halting problem).
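The same escape hatch shows up even in gradually typed settings. A Python sketch using `typing.cast`, which is exactly a "just trust me" annotation: it changes what the static checker believes and does nothing at runtime.

```python
from typing import Any, cast

def head_as_int(xs: list[Any]) -> int:
    # A static checker cannot prove xs[0] is an int here; cast()
    # asserts it without any runtime check, just like a C cast.
    return cast(int, xs[0])

assert head_as_int([42, "anything else"]) == 42
```

If the promise is wrong, the error surfaces later at runtime, outside the type system's guarantees, which is the trade-off the comment describes.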
[+] [-] __s|8 years ago|reply
[0]: http://smallcultfollowing.com/babysteps/blog/2016/11/04/asso...
[+] [-] pron|8 years ago|reply
The problem is that we do not yet know how to make effective use of such code-level specifications. Finding a cost/benefit sweet spot is not easy (I personally prefer the JML approach precisely because I think it allows for greater flexibility in the choice of tradeoffs vis-a-vis confidence vs. cost).
[1]: http://www.openjml.org/
[+] [-] ivansavz|8 years ago|reply
I wonder if simplified versions of these proofs couldn't be used with students. Right now it seems like too much detail, but maybe there could be a common "stdlib" of standard math arguments that we assume to be true, and then focus on the specific machinery for each proof?
[+] [-] j2kun|8 years ago|reply
It would seem to me that, if you're filling in details from an existing proof or making some existing formal (in the math sense) intuition more formal (in the Coq sense), there's not much deep insight to gain from implementing a proof in Coq.
[+] [-] pron|8 years ago|reply
I took the liberty of translating your module to TLA+, my formal tool of choice (recently, I've been learning Lean, which is similar to Coq, but I find it much harder than TLA+). I tried to stick to your naming and style (which deviate somewhat from TLA+'s idiomatic styling), and, as you can see, the result is extremely similar, but I guess that when I try writing the proofs, some changes to the definitions would need to be made.
One major difference is that proofs in TLA+ are completely declarative. You just list the target and the required axioms, lemmas and definitions that require expanding. Usually, the proof requires breaking the theorem apart to multiple intermediary steps, but in the case of the proof you listed, TLA+ is able to find the proof completely automatically:
The natDiff lemma doesn't even need to be stated, as it's automatically deduced from the built-in axioms/theorems. Another difference is that TLA+ is untyped (which makes the notation more similar to ordinary math), but, as you can see, this doesn't make much of a difference. The only things that differ from ordinary math notation are that function application uses square brackets (parentheses are used for operator substitution; operators are different from functions, but that's a subtlety; you can think of operators as polymorphic functions or as macros), set comprehension uses a colon instead of a vertical bar, a colon is also used in lieu of parentheses after quantifiers, and aligned lists of connectives (conjunctions and disjunctions) are read as if there were parentheses surrounding each aligned clause. Also, `SUBSET T` means the powerset of T. Here's the module (without proofs, except for the one above):
[+] [-] curryhoward|8 years ago|reply
[+] [-] fizixer|8 years ago|reply
As I asked in a different question, is TLA+ able to do the proofs declaratively and automatically because it has an internal 'logic inference' engine?
[+] [-] losvedir|8 years ago|reply
[+] [-] carapace|8 years ago|reply
It leads me to speculate... what if I don't need types to write correct software? I assumed I would implement typing and type inference (like Factor language already has) at some point. But if the code has no bugs without it... YAGNI?
My reasoning goes like this: Let's postulate that we have a library of functions, each of which has the quality Q of being error-free. If we derive new functions from the existing ones by a process that preserves Q, so that new functions are all also error-free, then we cannot have errors in our code, just user error. I.e. the user can present bad inputs, but no other source of error exists (other than mechanical faults.) This would be a desirable state of affairs. Now if the Q-preserving new-function-deriving process doesn't require types to work, then we don't need them. Whether or not typing helps prevent user error is another question. Personally, I lean toward defining user error as UI error, but I am ranging far from my original point now.
(If you're interested: https://github.com/calroc/joypy/tree/master/docs Still very DRAFT versions. https://github.com/calroc/joypy/blob/master/docs/3.%20Develo... )
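The derivation idea above can be sketched in Python (a toy, with made-up functions): if the only way to build new functions is by composing verified ones, the composites inherit correctness on valid inputs, provided each function's outputs are valid inputs to the next, which is exactly the gap that types usually track.

```python
def compose(f, g):
    # The single Q-preserving derivation step: if f and g are
    # error-free on their valid inputs, and g's outputs are valid
    # inputs to f, the composite is error-free too.
    return lambda x: f(g(x))

def inc(n):
    return n + 1

def double(n):
    return n * 2

inc_then_double = compose(double, inc)
assert inc_then_double(3) == 8  # double(inc(3)) == double(4)
```

The "provided the domains line up" caveat is where untyped concatenative languages lean on discipline or separate checking instead of a type system.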
[+] [-] unknown|8 years ago|reply
[deleted]
[+] [-] pagnol|8 years ago|reply
[+] [-] yodsanklai|8 years ago|reply
[+] [-] phkahler|8 years ago|reply
[+] [-] maxhallinan|8 years ago|reply
[+] [-] zitterbewegung|8 years ago|reply