Compiling Rust for .NET, using only tea and stubbornness

[+] akavel|2 years ago|reply

Tangentially related, I've written a barebones assembler for Android .apk files once (strictly speaking, the assembler is for .dex files, but it also comes with a set of tools to package and sign .apk files). It turned out to be surprisingly easy. I expected to stumble upon some blocker issue at any time that would make it impossible for me to continue, but one just never materialized! It's written mainly in Nim and provides enough primitives to allow creating Java "stubs" for native .so libraries, so that .apk-s can be built in Nim WITHOUT JDK AT ALL. The Android NDK is still kinda needed/useful, though IIRC mainly for access to adb, and especially adb logcat (which you'll need A LOT for debugging if you try to use this contraption).

I'd love to One Day™ Rewrite It In Rust, so that we could write .apk-s purely using the Rust toolchain, just using a JNI library as appropriate, sprinkling the code with some proc-macro annotations where needed by the assembler (for stubs), and possibly adding some lines in a build.rs (for .apk packaging).

The .dex assembler itself is at: https://github.com/akavel/dali (you may like to check out the tests at: https://github.com/akavel/dali/tree/master/tests to see what using it looks like).

An example project with a simple .apk written purely in Nim (NO JDK) is at: https://github.com/akavel/hellomello/tree/flappy (unfortunately, given Nim's poor packaging story, it's most probably already bitrotten to the extent that it can't be quickly and easily built & used out of the box). I recorded a presentation about this for an online Nim conference — see: https://www.youtube.com/watch?v=wr9X5NCwPlI&list=PLxLdEZg8DR...

[+] tmdh|2 years ago|reply

Amazing work.

[+] pharmakom|2 years ago|reply

Article is a little wrong about the current state-of-the-art in writing Rust bindings for .NET.

One can use Uniffi with the C# generator to get fairly automatic bindings. You still need to package it up, which is a bit of a pain.

Uniffi really is an awesome idea. I expect more and more Rust code for foundational shared libraries as a result.

[+] appwiz|2 years ago|reply

1-click link to Uniffi https://mozilla.github.io/uniffi-rs/

[+] steeve|2 years ago|reply

I'd wager that would use P/Invoke, which is what OP is trying to avoid (having true .NET IL would make everything seamless).

[+] ComputerGuru|2 years ago|reply

That just automates the generation of the ffi bindings (not too different from autocxx). It’s entirely different from compiling “managed” rust code. It’s like using a C/C++ library with p/invoke vs using C++/CLI or Managed C++.
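For illustration, here is a minimal sketch of the kind of C-ABI function that such generated bindings would end up calling through P/Invoke. The function name `add_checked` and its signature are made up for this example; they are not taken from UniFFI or from the article.

```rust
// Hypothetical example function, not part of UniFFI or the linked project.
// `extern "C"` gives it the C calling convention, which is what a P/Invoke
// declaration on the .NET side expects. To actually export it from a cdylib
// under this exact symbol name, a no_mangle attribute would also be needed.
pub extern "C" fn add_checked(a: i32, b: i32, out: *mut i32) -> bool {
    // Reject a null output pointer instead of dereferencing it.
    if out.is_null() {
        return false;
    }
    match a.checked_add(b) {
        Some(sum) => {
            // SAFETY: `out` is non-null, and the caller promises that it
            // points to a writable i32.
            unsafe { *out = sum };
            true
        }
        // Overflow is reported through the return value; `out` is untouched.
        None => false,
    }
}
```

Everything crossing this boundary is a plain integer or raw pointer, which is the "C-level API" restriction: the managed side sees a native entry point, not a .NET method compiled to IL.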

[+] athrun|2 years ago|reply

Looks like the OP is still in high school. Kudos to them for pushing through, and having fun with compiler internals.

I wish I had this level of dedication at that age...

[+] stevefan1999|2 years ago|reply

Looks like a young prodigy to me

[+] mrweasel|2 years ago|reply

If one was to do this as something other than a "fun project", wouldn't it make more sense to do a CIL backend for LLVM? That way any language utilizing LLVM would be able to target .NET, or am I completely misunderstanding how rustc and LLVM work?

[+] mst|2 years ago|reply

I -think- that CIL being a bytecode-y language runtime that actually has things like an object model, going straight from MIR to CIL offers a ... higher fidelity translation, or so.

(I apologise for this being vague, but this isn't really my area so if I tried to get more detailed I'd rapidly go from "quite possibly wrong" to "definitely wrong")

[+] WorldMaker|2 years ago|reply

From my understanding LLVM's intermediate language is at a "lower level" (which is the LL after all) than the .NET CLR (or the Java JVM), losing some higher level parts of type understanding and looking a lot closer to machine language. You'd have to reconstruct or synthesize a new understanding of the "lost" high level information to target them as an LLVM backend.

It sounds like Rust's internal MIR intermediate language is a somewhat closer level match to CIL.

[+] lostmsu|2 years ago|reply

I think these two have different use cases.

Adding a .NET backend to Rust could give you high-level two way interop.

Adding a similar backend to LLVM would let you use .NET target similarly to WASM in that you could compile pretty much any software (C/C++, Go, Rust, etc) and run it on any supported platform without recompiling (well, it would JIT). But you'd have to stick to C-level APIs.

[+] leowbattle|2 years ago|reply

I wonder if it could run on this Rust implementation of the CLR I wrote a few years ago: https://github.com/Leowbattle/clr_lite

[+] ComputerGuru|2 years ago|reply

That looks like a really interesting experiment! Did you ever do a write-up of your work? I’m curious how far you got.

[+] Nelkins|2 years ago|reply

Cool project. I remember someone else taking a crack at this a few years ago.

https://ericsink.com/entries/dotnet_rust.html

https://ericsink.com/entries/sg_rust_dotnet_preview.html

https://ericsink.com/entries/lousygrep.html

[+] ComputerGuru|2 years ago|reply

If it’s any consolation, Microsoft itself has shipped production assemblies generated with bad IL (for the auto-generated bindings/interop between the Windows 10 SDKs and C#)! Code appeared fine and would run OK until you tried to either R2R or AOT a project depending on that DLL (and it was just a single entry point that was mangled, iirc).

[+] lafar6503|2 years ago|reply

Beware, the rustification has broken through .NET's defenses :) Not sure if I'll ever find a use for Rust in the .NET runtime (C# has more or less the same capabilities), but congrats anyway. However, I'd gladly welcome some lightweight compiled language with easy and powerful meta-programming and AST transformation capabilities.

[+] CharlieDigital|2 years ago|reply

C# as a language seems supremely underappreciated/misunderstood.

It seems like it should have higher adoption given the performance boost over JS on Node while being syntactically similar to TypeScript (not hard to adopt for teams already familiar with JS/TS).

Combined with pretty good tooling these days and DX (hot reload is a thing), I'm always surprised by its seemingly lackluster reception.

[+] colejohnson66|2 years ago|reply

I really love C#, but one thing I wish it had (that Rust does) is move semantics. In C#, if someone passes your function (such as a constructor) an array, you have no guarantee the caller won’t modify it underneath you. In Rust terms, you would have a mutable reference, but the caller also does. Sometimes this is desired, and would be usable in Rust with a Cell, but other times it’s not. This can lead to defensive copying of arrays by the callee.

If I could annotate a parameter with some kind of “move” keyword that would prevent the caller from using it again, that would be great.

“Frozen collections” and ImmutableArray<T> can solve this issue, but the latter is essentially just a defensive copy of the array wrapped in a special type. I'm not holding my breath that a “move” annotation would ever be implemented; analyzers will probably be the best we get.
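To make the move-semantics point concrete, here is a small Rust sketch. The `Histogram` type and the `build` function are hypothetical names used only for illustration.

```rust
/// Hypothetical type for illustration: it takes ownership of its input.
pub struct Histogram {
    samples: Vec<u32>,
}

impl Histogram {
    /// `samples` is moved in, so no defensive copy is needed: after this
    /// call the caller no longer has access to the Vec.
    pub fn new(samples: Vec<u32>) -> Self {
        Histogram { samples }
    }

    /// Largest sample, or `None` if there are no samples.
    pub fn max(&self) -> Option<u32> {
        self.samples.iter().copied().max()
    }
}

pub fn build() -> Histogram {
    let data = vec![3, 1, 4];
    let hist = Histogram::new(data);
    // Uncommenting the next line fails to compile with error E0382,
    // because `data` was moved into the constructor above:
    // data.push(9);
    hist
}
```

The guarantee comes from the compiler rather than from a copy: the constructor stores the caller's buffer directly, and the caller's variable becomes unusable at the call site.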

[+] ComputerGuru|2 years ago|reply

I’m apparently one of the few that use rust and C# as my language duo of choice; our company has gone all in on both. I think the more common combos are golang/rust, js/rust, or python/rust.

But proper support for discriminated unions (and perhaps something better than the emasculated match blocks known as switch expressions) cannot come soon enough for C# to enter the big leagues.

[+] frankster|2 years ago|reply

I thought to myself that surely there would be a CIL backend for LLVM, and why didn't the author just use it? But amazingly there doesn't seem to be.

[+] ComputerGuru|2 years ago|reply

I personally would have been surprised to learn that such a thing existed! As a C# and rust developer who has hacked on llvm before, I have to admit that these two languages (and their underlying techs/stacks) are absolutely worlds apart and there is so little overlap between their communities.

I’d have been less surprised if there were an llvm backend for C#, but that wouldn’t exist without an IL LLVM target (because you’d be limited to the language without any (standard) library support).

The first-pass Roslyn compiler is really naive when it comes to optimizations; I constantly marvel at how few optimizations are performed in its first stages compared to what llvm does (the jit is amazingly well-tuned, however). An LLVM backend for C# would make for very interesting learning and research opportunities!

[+] tov_objorkin|2 years ago|reply

LLVM had an MSIL translator back in 2007 [1]; it was abandoned due to lack of interest.

1. https://discourse.llvm.org/t/msil-backend/8480

[+] e4m2|2 years ago|reply

From the linked GitHub repo:

> As for the heap allocated objects, they will be allocated from unmanged(non-GC) memory, and will be allocated/freed exactly like in Rust.

I understand this decision, but it would also be interesting to see a version of this that hijacks the global allocator and the alloc types to use the GC instead (while still allowing you to opt-out and use unmanaged memory).

Good work nonetheless!
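For readers unfamiliar with the hook being discussed: Rust lets a program replace its global allocator through the `#[global_allocator]` attribute. The sketch below only wraps the system allocator and counts calls; it does not talk to any GC. `CountingAlloc`, `ALLOCATIONS`, and `allocation_count` are hypothetical names for this example, not something from the linked repo.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper: forwards to the system allocator and counts calls.
/// A GC-backed variant would replace the `System` calls with runtime calls.
pub struct CountingAlloc;

/// Number of allocation requests seen so far.
pub static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        // SAFETY: the layout is passed through unchanged from our caller.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        // SAFETY: `ptr` was returned by `System.alloc` with this layout.
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Every Box, Vec, String, etc. in the program now goes through CountingAlloc.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Convenience accessor for the counter.
pub fn allocation_count() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}
```

Swapping the allocator is the easy part. The allocator only hands out raw bytes, so a GC behind it would still have no way to find the pointers stored inside those bytes.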

[+] ynik|2 years ago|reply

.NET has three main types of pointers:

1. unmanaged pointers (C# syntax: T*, C++/CLI syntax: T*): these are the same as C pointers: can be converted to/from integers, cast arbitrarily, pointer arithmetic can be used. The garbage collector ignores these. These pointers can point to the stack, to native allocations. They can also point to the GC heap, but the GC won't adjust the pointer if it moves the underlying allocation (but allocations can be temporarily pinned).

2. object references (C# syntax: "T" (where T:class), C++/CLI syntax: T^): these are references pointing to the start of an object on the GC heap. They cannot point to the stack or to unmanaged memory. The garbage collector will update these as allocations are moved. Pointer arithmetic is not supported.

3. interior pointers (C# syntax: "ref T", C++/CLI syntax: T%): these references can point to the GC heap (including into the interior of objects), or to the stack or unmanaged memory. If pointing into the GC heap, the garbage collector will update these as allocations are moved. However, interior pointers can only live on the stack. It is not possible to store these on the GC heap; and certainly not possible to store them on the unmanaged heap (the GC wouldn't know to update them). Pointer arithmetic is supported, but conversions to integers are not.

It's not possible to translate Rust references to object references, because Rust references can point to stack or to the interior of objects. It's not possible to translate Rust references to interior pointers, because Rust references can occur on the heap, not just on the stack.

So a garbage collected version of rust is not possible without significant restrictions to the language (or using a GC more flexible than .NET's).
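A short Rust sketch of the three cases that cause the mismatch: a reference to the stack, a reference into the interior of a heap object, and a reference that is itself stored on the heap. The `Point` and `Holder` types are hypothetical, chosen only for illustration.

```rust
/// Hypothetical types for illustration.
pub struct Point {
    pub x: i32,
    pub y: i32,
}

/// A struct that stores a reference as one of its fields.
pub struct Holder<'a> {
    pub field: &'a i32,
}

pub fn demo() -> (i32, i32, i32) {
    // Case 1: a reference to a stack slot (rules out object references).
    let local = 7;
    let to_stack: &i32 = &local;

    // Case 2: a reference into the middle of a heap allocation
    // (also rules out object references, which point at an object's start).
    let boxed = Box::new(Point { x: 1, y: 2 });
    let to_interior: &i32 = &boxed.y;

    // Case 3: that same reference stored inside another heap allocation
    // (rules out interior pointers, which may only live on the stack).
    let on_heap = Box::new(Holder { field: to_interior });

    (*to_stack, *to_interior, *on_heap.field)
}
```

All three are ordinary safe Rust, which is why neither .NET pointer kind that the GC tracks can represent `&T` in general.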

In addition to the limitations of the pointer types above, there's also an issue with enum types: .NET doesn't have discriminated unions. But the GC needs to be able to read the discriminator to tell if the enum contains pointers that need to be tracked by the GC.
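As a sketch of the enum problem, consider a Rust enum in which one variant is plain data and another owns a heap pointer. `Value` and `holds_pointer` are hypothetical names for this example.

```rust
/// Hypothetical enum: the payload is either a plain integer or a pointer
/// to a heap-allocated string, depending on the variant.
pub enum Value {
    Number(u64),
    Text(Box<str>),
}

/// Only by checking which variant is active can code tell whether the
/// payload is a pointer. A tracing GC would need to make the same check.
pub fn holds_pointer(v: &Value) -> bool {
    match v {
        Value::Number(_) => false,
        Value::Text(_) => true,
    }
}
```

Rust code gets this information through `match`; a GC scanning memory would need equivalent knowledge of the enum's layout to decide whether to trace the payload.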
