Ask HN: How to learn to write a compiler and interpreter?
46 points| lala_lala | 7 years ago
I am in my late 30s without a formal CS education. I write software for a living and have this itch to learn to write a compiler / interpreter.
DO I have to take any courses (e.g. Discrete Maths, Automata Theory) in order to do it?
Do I have to take any courses? If so, do you have recommendations for those as online courses?
I really want to do this in 2019 but the thought that I have to study Maths, then Automata, maybe OS, then Compilers sounds like a long path.
Is there who has learned to write compiler / interpreter without CS background? How did you do it?
stephen82|7 years ago
The you can read more formal CS book for the sake of covering the necessary theory.
Another suggestion would be to check the source code of various SQL databases, such as SQLite3 and PostgreSQL.
Also, I would suggest to read lots of lots of source code from github.
Here is a short list of suggestions:
WaltPurvis|7 years ago
Writing An Interpreter In Go https://amzn.to/2FYwwiQ
Writing A Compiler In Go https://amzn.to/2sLilph
ablerman|7 years ago
http://craftinginterpreters.com/
lala_lala|7 years ago
shoo|7 years ago
ksherlock|7 years ago
You have a text file. You break it up into tokens (numbers, strings, keywords, operators, etc) . You assemble the tokens into a structured tree (statements, expressions, etc). Then you walk the tree and generate code (for a compiler) or execute it (for an interpreter). That's it. Really.
Every other week, somebody posts a tutorial to HN about writing a lisp interpreter in javascript. or go. or python. or rust. or lisp. If you can't wait, Let's Build a Compiler by Jack Crenshaw is an oldie but a goodie.
https://compilers.iecc.com/crenshaw/
shoo|7 years ago
> Maths, then Automata, maybe OS,
i don't think any of these are a requirement for building a simple compiler.
after reading Crenshaw some years ago, i was motivated to just knuckle down and start building a compiler for brainfuck (yes, a toy compiler, entirely pointless). this was probably one of the most enjoyable periods of programming i can remember.
i think my descent into fun with compilers went something like:
* "let's implement a compiler for a simple language (brainfuck) to something low level (assembler) in a high-level language i am familiar with (python)"
* ok, that was easy, now what?
* "self-hosting compilers are cool. can i write a brainfuck compiler in brainfuck?"
* "this is horrifying"
* "OK, what if i defined a simple language that is much easier to do useful work, but could be readily compiled to brainfuck? i could build a compiler from that new language to brainfuck"
If you have a bit of spare time i heartily recommend (i) learning to implement a basic compiler, and (ii) building the compiler in or for a language that has a radically different concept to languages you are more familiar with.
https://github.com/fcostin/abfc
faical|7 years ago
[1] https://github.com/ftchirou/blink
[2] https://hackernoon.com/lets-build-a-programming-language-261...
(P.S. Feel free to reach out to me [github_username] at gmail.com)
tudelo|7 years ago
PhilWright|7 years ago
1) Lexical Analysis (Tokeniser)
The first step of your compiler takes in free form text and turns it into a set of tokens that make the rest of the process much easier. To understand this you should learn about Regular Expressions and Finite State Machines.
2) Syntax Analysis (Parsing)
Processing the above tokens and ensuring the input has the valid syntax of a program is your next step. You need to learn about Context Free Grammers, Backus Naur Form and Abstract Syntax Trees.
3) Semantic Analysis
Just because the input has valid syntax does not mean it makes actual sense. Here you need to learn about Symbol Tables and Type Checking. The specifics are very dependant on the actual language you are dealing with.
4) Code Generation
A compiler will be generating code but an interpreter will perform actual execution of the Abstract Syntax Tree operations. Again, the specifics are very dependant on the actual language you are dealing with and the target. Start by looking into Code Generation, Code Optimizations and Assembly Code.
I recommend an online, free course, like the following from Stanford that is self-paced and results in a compiler that generates assembly level instructions for test running in an emulator.
https://lagunita.stanford.edu/courses/Engineering/Compilers/...
ryanmccullagh|7 years ago
I would recommened first, figuring out what you want your language to look like (the syntax). Then, you can define a grammar for your language, and research parsers. From parsing, you will be leade to Abstract Syntax Trees (the most modern way to get from text file to bytecode). After that you'll need to compile the abstract syntax tree (which you would have built from the parser). Compiling the AST is typically done targetting your own simplified instruction set.
For an example instruction set, check out Python's header file at https://github.com/python/cpython/blob/master/Include/opcode...
kazinator|7 years ago
How much CS comes into play depends on how advanced you make the compiler and what design choices you make. E.g. if you make a stack-based VM, or one with unlimited registers, instead of targeting a real machine with a fixed number of registers, you don't have to study the algorithms for register allocation.
Some languages avoid the complexities of parsing, such as those in the Forth family, and to a large extent Lisp family. To make a language, you don't even need to be a consumer of scanner and parser generating tools, let alone understand how they work.
e19293001|7 years ago
Compiler Construction Using Java, JavaCC, and Yacc, IEEE/Wiley, 2012
Get this book. Please. __I promise__. You'll be able to grok compilers after reading and answering the exercises from this book.
We can talk further by sending me an email (see my profile).
Here's my previous comment somewhere in HN:
I bet this is what you are looking for: https://www.amazon.com/Compiler-Construction-Using-Java-Java....
This book taught me how to write a compiler and build a programming language of my own.
To answer you question "where and how did you start?": This is where I started learning about compilers and programming language implementation.
Here is its description from its website:
* Comprehensive treatment of compiler construction.
* JavaCC and Yacc coverage optional.
* Entire book is Java oriented.
* Powerful software package available to students that tests and evaluates their compilers.
* Fully defines many projects so students can learn how to put the theory into practice.
* Includes supplements on theory so that the book can be used in a course that combines compiler construction with formal languages, automata theory, and computability theory.
What I promise you with this book: You'll learn how to write your own programing language. Not only how compilers and about the language but also about computer architecture. This book is very easy to read and understand. It starts with very basic topics then slowly building from it until you'll grok implementing the language given the specification. You'll have the chance to build a compiler from scratch or by using JavaCC and Yacc.
Someone|7 years ago
Also, initially forget about making a product that doesn’t crash on invalid input. If a…z is enough for everybody, your toy language may crash on
or Then, add features as you see fit. You will run into problems, such as questioning whether should be a valid program, or whether it’s a good idea if prints ‘20’, but that’s part of the fun (if you have a puzzling mind, figuring out how to do that ‘properly’ without reading about it isn’t out of reach). And those features you add shouldn’t be large. For example, looping: add simple if statements: The end result will likely be buggy in places. For example, if you decide to support and parse by splitting each line on semicolons, adding strings may result in an interpreter/compiler that is broken for a while. Who cares? You aren’t building a compiler or interpreter, you’re learning.Also: start with an interpreter. A compiler isn’t that much harder, but requires you to know about your system’s assembly, object code format, etc.
creatornator|7 years ago
[0] https://gnuu.org/2009/09/18/writing-your-own-toy-compiler/
vkaku|7 years ago
1. Create a grammar for your language. Start with Lex/Yacc and model your grammar with Lex/Yacc. Use them to generate lexer/parser.
2. Create sample code in your new language. Parse the generated parser on said language file into a list of Syntax trees.
3. With said Syntax trees, generate target code - Compiler (or) evaluate the tree and print results - Interpreter.
Iterate and repeat steps until your language is complete.
auganov|7 years ago
What's your goal?
I've written non-trivial macros that could be understood to be a tiny compiler, but wasn't ever interested in compiler design and I can't claim to know much about it.
lala_lala|7 years ago
Maybe it's a folly but I am just intrigued by how they work and want to be able to develop them without committing to taking lot of courses.
nouney|7 years ago
[0] https://en.wikipedia.org/wiki/Compilers:_Principles,_Techniq...