Experimenting with a Compiled Language

4 points| JhonPork | 1 month ago

I’ve been experimenting with a compiled language design and wanted feedback from people who’ve worked on compilers or systems code.

The core idea is allowing multiple execution profiles to coexist in a single source file:

- userland (default): indentation-based, Python-like syntax, safety by default
- kernel: brace-based syntax, strict rules, no heap or runtime
- baremetal: brace-based syntax, raw pointers, no safety net

At build time, a single profile is selected and all other code is erased at compile time — no runtime checks or overhead.

The compiler pipeline is lexer → parser → typed IR → LLVM backend, with profile rules enforced during IR validation rather than at runtime.

Roughly 90% of the core language and compiler are implemented; I plan to open-source the project once the remaining pieces are finished.

This is still an experiment. I’m mainly curious:

- does this model make sense?
- are there obvious design pitfalls?
- has anyone seen similar approaches succeed or fail?

I’d appreciate critical feedback.

8 comments

platinumrad|1 month ago

> At build time, a single profile is selected and all other code is erased at compile time — no runtime checks or overhead.

Can you expand on this?

JhonPork|1 month ago

Profiles are resolved before code generation, not via conditionals. Each top-level item (function, block, impl, import) can be annotated with a profile (userland, kernel, baremetal). During parsing, everything is collected into the AST as usual. During IR lowering, the compiler is invoked with exactly one active profile. At that point:

- Nodes whose profile does not match are not lowered to IR at all
- They are dropped during IR validation, not guarded or compiled
- The resulting IR literally has no trace of the other profiles

So this is not like #ifdef or runtime flags. The non-selected code never reaches:

- borrow checking
- optimization
- codegen
- linking

From LLVM’s point of view, it’s as if the other code never existed. That’s why there’s no runtime overhead: no branches, no checks, no dead code elimination required. The IR is profile-pure by construction.

This also lets the compiler enforce different rules per profile:

- userland: heap allowed, panics allowed
- kernel: no heap, no panic, stricter aliasing
- baremetal: raw pointers, UB allowed

Invalid combinations simply fail IR validation. Happy to clarify further once the repo is public.
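To make the filtering step concrete, here is a rough sketch in Python (not the actual compiler, which isn't public yet). It assumes each top-level item carries a profile tag and a set of features it uses. The names `Item`, `FORBIDDEN`, and `lower` are hypothetical, the rule table only encodes the kernel restrictions mentioned above, and the "IR" is just a list of placeholder strings.

```python
from dataclasses import dataclass
from enum import Enum


class Profile(Enum):
    USERLAND = "userland"
    KERNEL = "kernel"
    BAREMETAL = "baremetal"


# Hypothetical rule table: features each profile rejects.
# Only the kernel rules (no heap, no panic) come from the description above.
FORBIDDEN = {
    Profile.USERLAND: frozenset(),
    Profile.KERNEL: frozenset({"heap", "panic"}),
    Profile.BAREMETAL: frozenset(),
}


@dataclass(frozen=True)
class Item:
    """A top-level AST item tagged with a profile (illustrative only)."""
    name: str
    profile: Profile
    uses: frozenset = frozenset()


def lower(items, active):
    """Lower only items tagged with the active profile; others are never visited."""
    selected = [it for it in items if it.profile is active]
    for it in selected:
        bad = it.uses & FORBIDDEN[active]
        if bad:
            raise ValueError(
                f"{it.name}: {sorted(bad)} not allowed in {active.value}"
            )
    # Stand-in for real IR: one placeholder line per surviving item.
    return [f"define @{it.name}" for it in selected]


items = [
    Item("log_line", Profile.USERLAND, frozenset({"heap"})),
    Item("irq_handler", Profile.KERNEL),
    Item("poke_mmio", Profile.BAREMETAL, frozenset({"raw_pointer"})),
]
print(lower(items, Profile.KERNEL))  # ['define @irq_handler']
```

The point of the sketch is that the userland item's heap use is never checked against kernel rules, because that item is filtered out before validation runs; only an item actually tagged kernel can fail the kernel checks.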

forgotpwd16|1 month ago

Unusual concept. Will split my thoughts into implementation and adoption (with regard to design).

Implementation-wise: I tried something similar myself: one language (same core, lib, and built-ins) with 2 front-ends that had different syntax but similar semantics. (Didn't go far though.) An issue is not favoring one over the other. Inevitably, in a meta way, you will if you decide to self-host, since you'll have to pick one form to do it. Also, having multiple co-existing forms in the same file may complicate tooling.

About the last part, the most similar things I've seen are: (i) Perl Mojo's embedded templates, which can be included in the same file as the source code, (ii) Racket's #lang multi-file, which allows combining, well, multiple files (and thus different #lang directives) in the same file.

Adoption-wise: It's in a weird position for widespread adoption. There's a strong preference towards using a single language, which splits into 2 branches: (1) using a single language across every layer (basically Rust/Zig), (2) using a high-level language with native-competitive performance (Python+numpy, jax, etc / JS+ultrafast JIT / Julia).

Currently you target both (one base language) and none (different syntax/semantics). You could move towards a hybrid approach instead by having one syntax with high-level / low-level forms (uncertain what distinguishes kernel/baremetal currently). Some functionality may then end up looking different in the 2 cases, to be more acceptable to both camps. This will probably also simplify tooling creation/maintenance.

Of course, since the project is quite experimental in nature, keeping it the current way is interesting and very acceptable.

TL;DR Yes (~templating) - Yes (complexity, lower potential adoption) - No (unusual, experimental)

JhonPork|1 month ago

This is fair feedback, and you’re pointing at the main tradeoff, which is intentional.

One clarification that might help: Falcon isn’t multiple frontends or multiple grammars in the usual sense. The parser accepts all code into a single AST, but during IR lowering the compiler is invoked with exactly one active profile. Nodes whose profile doesn’t match are not lowered to IR at all; they’re rejected before borrow checking, optimization, or codegen. From the compiler’s point of view, the other profiles never existed. There’s no runtime guard, no macro-style inclusion, and no shared assumptions leaking across profiles.

The goal isn’t to let people freely mix levels like unsafe {} in Rust, but to make domain boundaries explicit and enforceable. Kernel/baremetal code has fundamentally different invariants (no heap, no panic, different aliasing rules), and soft escape hatches tend to blur those over time.

That said, I agree this does increase tooling complexity and may reduce adoption. This is very much an experiment to see whether hard separation + single IR is a better tradeoff for certain projects than one-size-fits-all semantics.

Appreciate the comparison examples; they’re useful references.