andolanra | 2 years ago

The former is referring to representing an AST using a GADT in order to include the typing discipline of the target language in the host language. For example:

  data Term t where
    Num :: Integer -> Term Integer
    Bool :: Bool -> Term Bool
    Add :: Term Integer -> Term Integer -> Term Integer
    IsZero :: Term Integer -> Term Bool
    IfThenElse :: Term Bool -> Term a -> Term a -> Term a
With this AST, you can express well-typed programs like `Add (Num 2) (Num 3)`, but the Haskell type system will stop you if you try to express an incorrectly-typed program like `Add (Num 2) (Bool False)`.
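
To make the payoff concrete, here is a sketch of a tag-free evaluator over that same `Term` type (with `Bool` taking a `Bool`, and the `eval` function being my own illustrative addition): because each constructor fixes its result type, `eval` can return a plain Haskell value with no runtime tag checks.

```haskell
{-# LANGUAGE GADTs #-}

data Term t where
  Num        :: Integer -> Term Integer
  Bool       :: Bool -> Term Bool
  Add        :: Term Integer -> Term Integer -> Term Integer
  IsZero     :: Term Integer -> Term Bool
  IfThenElse :: Term Bool -> Term a -> Term a -> Term a

-- The index t tells us statically what each branch returns,
-- so no case needs a "wrong tag" fallback.
eval :: Term t -> t
eval (Num n)            = n
eval (Bool b)           = b
eval (Add x y)          = eval x + eval y
eval (IsZero x)         = eval x == 0
eval (IfThenElse c t e) = if eval c then eval t else eval e

main :: IO ()
main = print (eval (IfThenElse (IsZero (Num 0)) (Add (Num 2) (Num 3)) (Num 0)))
-- prints 5
```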

The "Trees That Grow" paper, on the other hand, is about reusing the same AST but gradually adding more information to the nodes as you progress through the compiler. For example, you might want to start with variable names being raw strings (so that a term corresponding to `lambda x: lambda x: x` looks like `Lam "x" (Lam "x" (Var "x"))`) but eventually replace them with unique symbols so that shadowed names are non-identical (so that under the hood it looks more like `Lam 1 (Lam 2 (Var 2))`, although in practice you'd want to keep the old name around somewhere for debugging.)

One way to accomplish this is to introduce an explicit type-level notion of compiler phases, give your terms a type parameter which corresponds to the phase, and use the phase to choose different representations for the same nodes:

  data CompilerPhase = Parsed | Resolved
  
  data Expr (phase :: CompilerPhase)
    = Lam (Name phase) (Expr phase)
    | App (Expr phase) (Expr phase)
    | Var (Name phase)

  type family Name (t :: CompilerPhase) :: *
  type instance Name Parsed = String
  type instance Name Resolved = Int
Using this example, an `Expr Parsed` will contain variables that are just strings, while an `Expr Resolved` will contain variables that are integers, and you can write a pass `resolve :: Expr Parsed -> Expr Resolved` which just modifies the AST. (This is a toy example: in a real compiler, you'd probably want to create a new type for resolved variables that still keeps a copy of the name around and maybe some location information that points to the place the variable was introduced.)
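
A sketch of what that `resolve` pass might look like under this setup (the counter threading and the `render` helper are my own illustrative additions, not from the comment):

```haskell
{-# LANGUAGE DataKinds, TypeFamilies, KindSignatures #-}

data CompilerPhase = Parsed | Resolved

data Expr (phase :: CompilerPhase)
  = Lam (Name phase) (Expr phase)
  | App (Expr phase) (Expr phase)
  | Var (Name phase)

type family Name (t :: CompilerPhase) :: *
type instance Name Parsed = String
type instance Name Resolved = Int

-- Walk the tree, threading a counter for fresh symbols and an
-- environment mapping each name to its innermost (shadowing) binder.
resolve :: Expr Parsed -> Expr Resolved
resolve e = snd (go 0 [] e)
  where
    go :: Int -> [(String, Int)] -> Expr Parsed -> (Int, Expr Resolved)
    go n env (Lam x body) =
      let (n', body') = go (n + 1) ((x, n) : env) body
      in (n', Lam n body')
    go n env (App f a) =
      let (n1, f') = go n env f
          (n2, a') = go n1 env a
      in (n2, App f' a')
    go n env (Var x) = case lookup x env of
      Just i  -> (n, Var i)
      Nothing -> error ("unbound variable: " ++ x)

-- Tiny printer for the Resolved phase (deriving Show is awkward
-- with the type family in the way).
render :: Expr Resolved -> String
render (Lam i b) = "Lam " ++ show i ++ " (" ++ render b ++ ")"
render (App f a) = "App (" ++ render f ++ ") (" ++ render a ++ ")"
render (Var i)   = "Var " ++ show i

main :: IO ()
main = putStrLn (render (resolve (Lam "x" (Lam "x" (Var "x")))))
-- prints Lam 0 (Lam 1 (Var 1))
```

Starting the counter at 0 gives `Lam 0 (Lam 1 (Var 1))` rather than the comment's 1-based `Lam 1 (Lam 2 (Var 2))`, but the shadowing behaviour is the same.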

chubot | 2 years ago

Ah thanks for the answer, funnily enough that's what I recently read here - https://dev.realworldocaml.org/gadts.html

They have a bool / int expression language example. (And funny thing I'm coding up such a type checker and evaluator right now in TypeScript. TypeScript union types seem to be pretty expressive and useful for this problem.)

However I'm STILL confused ... Admittedly I skimmed through the GADT chapter and I didn't follow all of it, but I don't understand why the answer isn't:

1. have an untyped representation that allows (Add (Num 2) (Bool False))

2. write a type checker on that representation

3. The output of the type checker is a new IR, which can only express (Add 2 3) => Int and (Eq 5 (+ 3 2)) => Bool

Is the more abstract GADT solution supposed to "save" you some work of writing a type checker by somehow letting you reuse Haskell or OCaml's type system to express that?

That's a little weird to me ... it seems to be confusing the metalanguage and the language being implemented.

It's not surprising to me that this would go awry, because in general those 2 things don't have any relationship (you can implement any kind of language in OCaml or Haskell).

---

Hmm OK I actually DID run into the issue where you need dynamic checks in the evaluator, checks that should be impossible BECAUSE the previous type checking phase passed.

i.e. some of the rules you already encoded in the type checker end up as "assert" in the evaluator.

Hm I will think about that. There is some duplication there, sure. But I still think there's a weird confusion there of the language the compiler is written in, and what the compiler DOES.

You're kinda coupling the 2 things together, to avoid a little duplication.
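
A sketch of the situation being described, using an untyped AST (hypothetical names, not from the thread): one `Value` type covers all results, so the evaluator must re-check tags that the type checker already verified.

```haskell
-- Untyped representation: nothing stops UAdd from holding a UBool.
data UTerm = UNum Integer | UBool Bool | UAdd UTerm UTerm | UIsZero UTerm

data Value = VNum Integer | VBool Bool
  deriving (Eq, Show)

ueval :: UTerm -> Value
ueval (UNum n)  = VNum n
ueval (UBool b) = VBool b
ueval (UAdd x y) =
  case (ueval x, ueval y) of
    (VNum a, VNum b) -> VNum (a + b)
    _ -> error "type error: Add expects numbers"    -- the "assert" in question
ueval (UIsZero x) =
  case ueval x of
    VNum n -> VBool (n == 0)
    _ -> error "type error: IsZero expects a number" -- ditto

main :: IO ()
main = print (ueval (UAdd (UNum 2) (UNum 3)))
-- prints VNum 5
```

With the GADT version of the same AST, those `error` branches simply don't exist: the cases are unrepresentable.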

I also wonder how all that is actually implemented and expanded into running code. The GADT syntax seems to get further and further away from something I imagine can be compiled :)

---

edit: Thinking about it even more, although I did run into the dynamic checks issue, I think it's because my type checker only CHECKED, it didn't LOWER the representation. So I think what I originally said was true -- if you want more type safety, you introduce another simple IR. What's the benefit of using GADTs?

consilient | 2 years ago

> That's a little weird to me ... it seems to be confusing the metalanguage and the language being implemented.

Well, that's sort of the point. It's perhaps a little less important if you're writing a compiler for a separate programming language, but that's a relatively rare use case. GADTs are mainly used for implementing DSLs to be used elsewhere in the same program.

> Is the more abstract GADT solution supposed to "save" you some work of writing a type checker by somehow letting you reuse Haskell or OCaml's type system to express that?

Yes. Writing your own typechecker and integrating it into an existing compiler pipeline is a ton of work, easy to screw up, and a potential compatibility nightmare.

> I also wonder how all that is actually implemented and expanded into running code. The GADT syntax seems to get further and further away from something I imagine can be compiled :)

GADTs are syntactic sugar for constrained type constructors. So this

    data Example a where
         Example :: Int -> Example Int
becomes this

    data Example a = (a ~ Int) => Example a
which is then compiled to a type coercion in the intermediate language.
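
To see what that packed-up equality buys you in practice, here is a small sketch (my own, not from the comment): pattern matching on the constructor brings `a ~ Int` into scope, which is roughly the evidence GHC turns into a coercion in Core.

```haskell
{-# LANGUAGE GADTs #-}

data Example a where
  Example :: Int -> Example Int

-- Matching on Example brings the (a ~ Int) constraint into scope,
-- so returning the Int payload at type a typechecks.
unwrap :: Example a -> a
unwrap (Example n) = n

main :: IO ()
main = print (unwrap (Example 42) + 1)
-- prints 43
```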