top | item 40364105

(no title)

thaliaarchi | 1 year ago

That's really impressive.

Automatic translation like this shifts the trust to the tool. coq-of-rust itself is written in Rust, not in Coq. The recursive nature is somewhat boggling, but I think a proof of its correctness is possible in a similar process to David A. Wheeler's “Countering Trusting Trust through Diverse Double-Compiling” (2009) [0] (which circumvents Ken Thompson's Trusting Trusting attack by using a second compiler), but with a mix of a CompCert approach.

To verify it, you'd use coq-of-rust to convert the coq-of-rust translator from Rust to Coq. That translation is not trusted, because it was performed in Rust, but it doesn't matter. Once in Coq, you prove the desired correctness properties—crucially, that it preserves the semantics of the Rust program when it translates a program to Coq.

As in the article, it is likely easier to work with more functional definitions in proofs instead of generated ones, so you'd undertake the same process as they do with the stdlib of proving equivalence between definitions. Since the current line count for the coq-of-rust translator (specifically, lib/ [1]) is 6350 lines of Rust, it even seems feasible to write a full translator in Coq and prove its equivalence to the generated one.

Then, you execute the proven-correct Coq coq-of-rust translator on the Rust source of the coq-of-rust translator. The Coq definitions it outputs should match the output of the Rust coq-of-rust translator that you started with.

As an aside, it's nice to see industry funding for work like this. I'm often cynical of cryptocurrency, but its correctness constraints really push for improvements in areas I like (Rust, Coq, funding for masters students I know, etc.).

[0]: https://dwheeler.com/trusting-trust/wheelerd-trust.pdf

[1]: https://github.com/formal-land/coq-of-rust/tree/main/lib

discuss

order

clarus|1 year ago

Thanks for the comment! One of the authors here.

Indeed this would be a nice process to verify coq-of-rust. Also, although the code is rather short, we depend on the Rust compiler to parse and type-check the input Rust code. So that would need to be also verified, or at least formally specified without doing the proofs, and the API of rustc is rather large and unstable. It could still be a way to get more insurance.

thaliaarchi|1 year ago

I didn't touch on that, but I did assume trust of the Rust toolchain, verifying starting at THIR. Verifying rustc would be a monumental undertaking, though I think some people are working on it.

Since we don't have a verified rustc (a la CompCert [0]), I wonder if an approach like the translation validation of seL4 [1] would work. They prove that the artifact (ARM machine code) produced by an existing compiler (gcc) for a chosen program (seL4) matches the source semantics (C). Thus you could circumvent trusting rustc, but it only works to verify a specific output of a chosen program. If the chosen program was coq-of-rust, I don't think this would be easier than the approach I detailed above. The seL4 kernel is 9,500 lines of C, while their Isabel/HOL specification is over 200,000 lines, so the technique doesn't seem to scale to a large chosen source like rustc.

Isn't bootstrapping fun?

[0]: Xavier Leroy. 2008. “Formal verification of a realistic compiler”. https://xavierleroy.org/publi/compcert-CACM.pdf

[1]: Thomas Sewell, Magnus Myreen, and Gerwin Klein. PLDI 2013. “Translation Validation for a Verified OS Kernel”. https://sci-hub.st/10.1145/2491956.2462183

deredede|1 year ago

Is there a Coq formalisation for enough of Rust that would be usable here? I thought people were still figuring that out.

Gajurgensen|1 year ago

Very interesting work! I'm curious how you handle loops/recursion? I imagine the `M` monad seen in the examples has a special primitive for loops?

noneeeed|1 year ago

It reminds me of when I used to work in SPARK Ada. On a number of projects where there was no supported Ada target (especially very small devices), it would be converted to C and then compiled for the target. That allowed us to perform the various forms of static analysis in the SPARK land.

It obviously introduced issues around verifying the output or the transpiler, but I think the resulting C code was quite readable and limited in style for verification purposes, and the tools had a high degree of trust.

The SPARK static analysis was only ever a part of the whole verification and validation process and there was plenty of testing and other activities that provides additional levels of trust.

The whole approach seemed to work pretty well.

im3w1l|1 year ago

> As an aside, it's nice to see industry funding for work like this. I'm often cynical of cryptocurrency, but its correctness constraints really push for improvements in areas I like (Rust, Coq, funding for masters students I know, etc.).

That's part of it, but another part I think is that crypto currencies simply transferred a lot of wealth to people interested in CS (and a wouldn't-it-be-cool-if mindset in general). And they are spending that money on CS research not just because it benefits them but because it's philanthropy aligned with their hobby.

hedora|1 year ago

I’m not sure how that approach prevents the rust compiler that’s used to build binaries from injecting its malicious payload. (You could build A with B and B with A, and diff the binaries, I guess).

Another approach is proof carrying code: The rust compiler would emit a coq proof that its executable output matches the semantics of the source code input.

Of course, you could also write a compiler of a subset of rust (perhaps without the borrow checker or optimizations) in machine code for one architecture, then bootstrap everything with that.

thaliaarchi|1 year ago

A compiler that injects backdoors in targeted programs and self-propagates the meta-backdoor (to avoid detection in the source) is exactly the trusting trust attack and it can be mitigated by diverse double-compiling (paper linked above). It requires a second compiler and we have mrustc, a Rust compiler in C++ built specifically for circumventing the unverified bootstrap chain of rustc.

The process is: Compile mrustc with a C++ compiler. Compile rustc sources with untrusted rustc binary and compile rustc sourcs with mrustc (these have identical behavior, but different codegen). Compile rustc sources with rustc-by-rustc and compile rustc sources with rustc-by-mrustc (these will have identical behavior and codegen). Those will match. If you compile once more, they will match. Since mrustc is never compiled by rustc, such a backdoor would have to also exist in gcc/clang and propagate with exactly identical behavior in mrustc. The process could be repeated for gcc/clang.