andolanra | 2 years ago
data Term t where
Num :: Integer -> Term Integer
Bool :: Bool -> Term Bool
Add :: Term Integer -> Term Integer -> Term Integer
IsZero :: Term Integer -> Term Bool
IfThenElse :: Term Bool -> Term a -> Term a -> Term a
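A quick sketch of the payoff (my addition, not part of the original comment, and assuming the `Bool` constructor is meant to read `Bool :: Bool -> Term Bool`): because the type index tracks each term's type, you can write a total evaluator with no runtime tag checks and no error cases.

```haskell
{-# LANGUAGE GADTs #-}

data Term t where
  Num        :: Integer -> Term Integer
  Bool       :: Bool -> Term Bool
  Add        :: Term Integer -> Term Integer -> Term Integer
  IsZero     :: Term Integer -> Term Bool
  IfThenElse :: Term Bool -> Term a -> Term a -> Term a

-- The result type follows the index: evaluating a `Term Integer`
-- yields an Integer, a `Term Bool` yields a Bool.
eval :: Term t -> t
eval (Num n)            = n
eval (Bool b)           = b
eval (Add a b)          = eval a + eval b
eval (IsZero a)         = eval a == 0
eval (IfThenElse c t e) = if eval c then eval t else eval e
```

Here `eval (Add (Num 2) (Num 3))` evaluates to `5`, while `Add (Num 2) (Bool False)` is rejected at compile time.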
With this AST, you can express well-typed programs like `Add (Num 2) (Num 3)`, but the Haskell type system will stop you if you try to express an incorrectly-typed program like `Add (Num 2) (Bool False)`.

The "Trees That Grow" paper, on the other hand, is about reusing the same AST but gradually adding more information to the nodes as you progress through the compiler. For example, you might want to start with variable names being raw strings (so that a term corresponding to `lambda x: lambda x: x` looks like `Lam "x" (Lam "x" (Var "x"))`) but eventually replace them with unique symbols so that shadowed names are non-identical (so that under the hood it looks more like `Lam 1 (Lam 2 (Var 2))`, although in practice you'd want to keep the old name around somewhere for debugging).
One way to accomplish this is to introduce an explicit type-level notion of compiler phases, give your terms a type parameter which corresponds to the phase, and use the phase to choose different representations for the same nodes:
data CompilerPhase = Parsed | Resolved
data Expr (phase :: CompilerPhase)
= Lam (Name phase) (Expr phase)
| App (Expr phase) (Expr phase)
| Var (Name phase)
type family Name (t :: CompilerPhase) :: *
type instance Name Parsed = String
type instance Name Resolved = Int
Using this example, an `Expr Parsed` will contain variables that are just strings, while an `Expr Resolved` will contain variables that are integers, and you can write a pass `resolve :: Expr Parsed -> Expr Resolved` which just modifies the AST. (This is a toy example: in a real compiler, you'd probably want to create a new type for resolved variables that still keeps a copy of the name around and maybe some location information that points to the place the variable was introduced.)
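A sketch of what that `resolve` pass might look like under these definitions (the traversal and the unbound-variable handling here are my own assumptions, not from the original comment):

```haskell
{-# LANGUAGE DataKinds, TypeFamilies #-}

import qualified Data.Map.Strict as Map

data CompilerPhase = Parsed | Resolved

data Expr (phase :: CompilerPhase)
  = Lam (Name phase) (Expr phase)
  | App (Expr phase) (Expr phase)
  | Var (Name phase)

type family Name (t :: CompilerPhase)
type instance Name 'Parsed   = String
type instance Name 'Resolved = Int

-- Give every binder a fresh integer, so shadowed names come out
-- distinct: Lam "x" (Lam "x" (Var "x")) becomes Lam 1 (Lam 2 (Var 2)).
resolve :: Expr 'Parsed -> Expr 'Resolved
resolve e = snd (go 1 Map.empty e)
  where
    go :: Int -> Map.Map String Int -> Expr 'Parsed -> (Int, Expr 'Resolved)
    go n env (Lam x body) =
      let (n', body') = go (n + 1) (Map.insert x n env) body
      in  (n', Lam n body')
    go n env (App f a) =
      let (n1, f') = go n  env f
          (n2, a') = go n1 env a
      in  (n2, App f' a')
    go n env (Var x) =
      (n, Var (Map.findWithDefault (error ("unbound variable: " ++ x)) x env))
```

The counter is threaded through the traversal so that every `Lam` in the whole tree gets a unique id, not just the ones along a single path.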
chubot | 2 years ago
They have a bool / int expression language example. (And funny thing I'm coding up such a type checker and evaluator right now in TypeScript. TypeScript union types seem to be pretty expressive and useful for this problem.)
However I'm STILL confused ... Admittedly I skimmed through the GADT chapter and I didn't follow all of it, but I don't understand why the answer isn't:
1. have an untyped representation that allows (Add (Num 2) (Bool False))
2. write a type checker on that representation
3. The output of the type checker is a new IR, which can only express (Add 2 3) => Int and (Eq 5 (+ 3 2)) => Bool
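Those three steps can be sketched like this (all names are mine, and the typed IR in step 3 happens to be a GADT, with a small existential wrapper because the checker can't know statically which type it will produce):

```haskell
{-# LANGUAGE GADTs #-}

-- Step 1: an untyped representation that allows Add (Num 2) (Bool False).
data UExpr = UNum Integer | UBool Bool | UAdd UExpr UExpr | UIsZero UExpr

-- Step 3: the typed IR, which can only express well-typed terms.
data Term t where
  Num    :: Integer -> Term Integer
  Bool   :: Bool -> Term Bool
  Add    :: Term Integer -> Term Integer -> Term Integer
  IsZero :: Term Integer -> Term Bool

-- The checker doesn't know in advance whether a UExpr denotes an
-- Integer or a Bool term, so its result hides the index in a wrapper.
data Some where
  SomeInt  :: Term Integer -> Some
  SomeBool :: Term Bool    -> Some

-- Step 2: the type checker lowers UExpr into the typed IR.
check :: UExpr -> Either String Some
check (UNum n)    = Right (SomeInt (Num n))
check (UBool b)   = Right (SomeBool (Bool b))
check (UAdd a b)  = SomeInt <$> (Add <$> checkInt a <*> checkInt b)
check (UIsZero a) = SomeBool . IsZero <$> checkInt a

checkInt :: UExpr -> Either String (Term Integer)
checkInt e = case check e of
  Right (SomeInt t) -> Right t
  Right _           -> Left "type error: expected Integer"
  Left err          -> Left err
```

With this, check (UAdd (UNum 2) (UBool False)) comes back as a Left, and any later evaluator over Term needs no asserts.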
Is the more abstract GADT solution supposed to "save" you some work of writing a type checker by somehow letting you reuse Haskell or OCaml's type system to express that?
That's a little weird to me ... it seems to be confusing the metalanguage and the language being implemented.
It's not surprising to me that this would go awry, because in general those 2 things don't have any relationship (you can implement any kind of language in OCaml or Haskell).
---
Hmm OK I actually DID run into the issue where you need dynamic checks in the evaluator that should be impossible BECAUSE the previous type-checking phase passed.
i.e. some of the rules you already encoded in the type checker end up as "assert"s in the evaluator.
Hm I will think about that. There is some duplication there, sure. But I still think there's a weird confusion between the language the compiler is written in and what the compiler DOES.
You're kinda coupling the 2 things together, to avoid a little duplication.
I also wonder how all that is actually implemented and expanded into running code. The GADT syntax seems to get further and further away from something I imagine can be compiled :)
---
edit: Thinking about it even more, although I did run into the dynamic checks issue, I think it's because my type checker only CHECKED, it didn't LOWER the representation. So I think what I originally said was true -- if you want more type safety, you introduce another simple IR. What's the benefit of using GADTs?
consilient | 2 years ago
Well, that's sort of the point. It's perhaps a little less important if you're writing a compiler for a separate programming language, but that's a relatively rare use case. GADTs are mainly used for implementing DSLs to be used elsewhere in the same program.
> Is the more abstract GADT solution supposed to "save" you some work of writing a type checker by somehow letting you reuse Haskell or OCaml's type system to express that?
Yes. Writing your own typechecker and integrating it into an existing compiler pipeline is a ton of work, easy to screw up, and a potential compatibility nightmare.
> I also wonder how all that is actually implemented and expanded into running code. The GADT syntax seems to get further and further away from something I imagine can be compiled :)
GADTs are syntactic sugar for ordinary data constructors that carry type-equality constraints; GHC then compiles those constraints into explicit type coercions in its intermediate language.
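A sketch of that desugaring (my wording, not the original comment's): the GADT return-type refinement turns into an equality constraint on each constructor, and pattern matching brings the constraint back into scope.

```haskell
{-# LANGUAGE GADTs #-}

-- This GADT:
--
--   data Term t where
--     Num    :: Integer -> Term Integer
--     IsZero :: Term Integer -> Term Bool
--
-- roughly desugars to constructors carrying equality constraints:
data Term t
  = (t ~ Integer) => Num Integer
  | (t ~ Bool)    => IsZero (Term Integer)

-- Matching on Num brings (t ~ Integer) into scope, refining t; in
-- GHC's Core language that refinement is an explicit type coercion.
eval :: Term t -> t
eval (Num n)    = n
eval (IsZero a) = eval a == 0
```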