What you learn by making a new programming language

[+] tombert|1 year ago|reply

I've had two projects that end up being "oops, I made an interpreter".

It starts innocently enough, you just have a JSON that has some basic functionality. Then you decide it would be cool to nest functionality because there's no reason not to, so you build a recursive parser. Then you think it'd be neat to be able to add some arguments to the recursive stuff, because then you can more easily parameterize the JSON. Then you realize it might be nice to assign names to these things as you call them, you implement that, and now you're treating JSON as an AST and you're stuck with maintaining an interpreter, and life is pain.
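A minimal sketch of how that slide tends to look in practice, assuming a made-up JSON shape where a list is `[operator, args...]`, a bare string is a variable reference, and `let` binds a name. None of these conventions come from the comment above; they are illustrative only.

```python
import json

def evaluate(node, env):
    """Evaluate a JSON-encoded expression tree (hypothetical node shapes)."""
    if isinstance(node, (int, float)):
        return node                      # numbers evaluate to themselves
    if isinstance(node, str):
        return env[node]                 # a bare string is a variable reference
    op, *args = node                     # a list is [operator, arg1, arg2, ...]
    if op == "let":                      # ["let", name, value_expr, body_expr]
        name, value_expr, body = args
        return evaluate(body, {**env, name: evaluate(value_expr, env)})
    values = [evaluate(a, env) for a in args]
    if op == "add":
        return sum(values)
    if op == "mul":
        result = 1
        for v in values:
            result *= v
        return result
    raise ValueError(f"unknown operator: {op!r}")

program = json.loads('["let", "x", ["add", 1, 2], ["mul", "x", 10]]')
print(evaluate(program, {}))  # 30
```

Nesting, arguments, and names each cost only a few lines, which is roughly why the interpreter tends to arrive before anyone decides to build one.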

[+] rqtwteye|1 year ago|reply

This happened to me with an XML-based rules engine I wrote. First I needed conditions, then I introduced variables, then loops, then if-then-else. When I needed to handle errors, I realized that I had just invented something like BASIC in XML format. The interpreter was surprisingly short and concise, mainly because XML ensured I didn't have to do the parsing.

Switched to dynamically compiled C# eventually.

[+] sjducb|1 year ago|reply

> and life is pain

So glad you finished with this. Right now I’m working with a guy who wants to write an interpreter…

[+] gavinhoward|1 year ago|reply

See also the Configuration Complexity Clock: https://mikehadlow.blogspot.com/2012/05/configuration-comple... .

This is why I have a separate config language and code language.

My config language is essentially JSON with newline separators and a first-class binary type (base64). I added little else.

When I get the temptation to add code to it, I just pull out my other, general-purpose language instead.

[+] CyberDildonics|1 year ago|reply

Lots of people learn the same lesson at some price.

Data and execution are two separate things and should remain as separate as possible.

The reason is that once you mix data and execution, suddenly you don't know what your data is until you execute it. Then you can only deal with it in the context of whatever tools you write and you can never just look at it straight.

On the other hand now execution depends on lots of data and is no longer modular or general/generic and so becomes a one off solution somewhere.

At some point everyone has the "what if" idea and hopefully it only burns them instead of lots of other people through poor design.

[+] anonymoushn|1 year ago|reply

I'm pretty happy with "json scripting" for an implementation[0] of a card game[1] with relatively low rules complexity. For a time it could evaluate arithmetic expressions, but I got rid of that because it was a bit unwieldy. The main pain point is that it runs slower than I'd like, so I may end up porting it all to actual Javascript functions or to Zig.

[0]: https://github.com/sharpobject/yisim/blob/master/swogi.json

[1]: https://store.steampowered.com/app/1948800/Yi_Xian_The_Culti...

[+] axegon_|1 year ago|reply

Likewise, I've done that more times than I'd like to admit. And I took it a step further with an LLM like a month and a half ago fenced behind a number of json and yaml instructions, containing conditions and validators. It works like a charm but yeah... Oops, I did it again.

[+] lallysingh|1 year ago|reply

Every program expands until it becomes a compiler or checks mail. Emacs's sin was doing both.

[+] shortrounddev2|1 year ago|reply

> Any sufficiently complicated C or Fortran program contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Common Lisp

https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule

[+] jpgvm|1 year ago|reply

Angry upvote.

Maybe its true failing though was stopping short of becoming the OS.

[+] bunderbunder|1 year ago|reply

In fairness, this is also every modern Web browser's sin.

[+] Doches|1 year ago|reply

> It's special that we make our own tools

I've always taken this to heart, but not necessarily with programming languages. Any piece of software that helps run my business that I can reasonably make and maintain myself, I do. I build my own CI/CD app, orchestration/deployment tool, task planner, bug tracker, release note editing & publishing tools, blog editor, logstash/viewer for exceptions, etc.

Does building (and especially maintaining!) all of these tools take up a lot of time, and distract me from the primary business of building the software that I actually sell? Sure, of course it does. But it also keeps me fresh and forces me to expand the scope of ideas that I regularly work with, and keeps me from becoming "that guy who makes that one app" and who isn't capable of moving outside of his comfort zone.

And while that doesn't (yet) extend to building my own tools in my own languages, it certainly does extend to writing my own DSLs for things like configuration management or infrastructure. My tools may be homerolled and second-rate, but they're mine (dammit!) and -- this part is important -- no one can take them away from me or catch me out with a licensing rug-pull.

[+] norir|1 year ago|reply

I think many people underestimate how easy it is to get started writing a language. It is a bit like improvising music: it's just one note followed by another note followed by another. Almost any intermediate level programmer can write a program that parses a hello, world program and translates it into the language they already know. Once you have hello, world, you add features. Eventually you realize you made mistakes in your initial design and start over but your second design will be better than the first because of the knowledge that you now have. After enough iterations, you will have a good language that you like (even though it won't be everyone's cup of tea).

[+] sixthDot|1 year ago|reply

Very true. Working on a proglang is more like running a marathon than a 100 meters. Another thing that would surprise people is how few complex data structures or subtle algorithms are required. For example in styx-lang, my little retard baby language, there are literally no binary searches, no sorting, no hash maps, no hash sets, yet it is still fast because most of the time a compiler only has to deal with very small vectors (5 parameters, 10 enum members, 15 statements in a body, etc.)

[+] codr7|1 year ago|reply

Yeah, the high stakes are part of the thrill, because it's oh so easy to paint yourself into enough of a corner to have to throw the whole thing away and start over.

[+] JohnMakin|1 year ago|reply

One of the most fundamental experiences I ever had was attempting a graduate level course at the end of a long series on compilers. You really get an eye opening view of how languages are translated into the language the machine understands. After going through a few toy languages and then finally tackling creating a simple JVM, here is the #1 thing I would go back to myself and scream until I was blue -

Make your initial grammar SUPER simple. Like, don't go guns blazing and try to add all these cool features you saw in another language or always thought about. Start stupid, stupid simple, get that working, then build on top of it.

[+] lioeters|1 year ago|reply

This is why the Lisp syntax is a great candidate for an exercise in making your own language. For example, Make a Lisp. https://github.com/kanaka/mal

It's simple to lex and parse into an abstract syntax tree, so you can get on with exploring more interesting aspects of programming beyond the mere syntax. (Not to say that there aren't interesting aspects of grammar and innovative syntax, but those can probably be explored later on as macros.)
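As a rough illustration of how little code the lexing and parsing step can take, here is a sketch of an S-expression reader. It is not taken from the mal project; the function names are made up, and it only handles parentheses, integers, and symbols (no strings, quotes, or comments).

```python
def tokenize(source):
    """Split source into parens and atoms by padding parens with spaces."""
    return source.replace("(", " ( ").replace(")", " ) ").split()

def read(tokens):
    """Consume tokens from the front and return one nested expression."""
    if not tokens:
        raise SyntaxError("unexpected end of input")
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens and tokens[0] != ")":
            items.append(read(tokens))
        if not tokens:
            raise SyntaxError("missing ')'")
        tokens.pop(0)  # discard the closing paren
        return items
    if token == ")":
        raise SyntaxError("unexpected ')'")
    try:
        return int(token)
    except ValueError:
        return token  # anything that is not an integer is a symbol

print(read(tokenize("(define (square x) (* x x))")))
# ['define', ['square', 'x'], ['*', 'x', 'x']]
```

The nested lists that come out are already the abstract syntax tree, so an evaluator can walk them directly.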

Last time I created a toy language, I implemented a C-like infix syntax but still used a Lisp evaluator at its core from a previous project.

[+] systemBuilder|1 year ago|reply

The difficulty in learning a language is proportional to the SQUARE of the number of BNF rules! Let that sink in. When last I looked, C had 120 rules and C++ had 250. C++ was already out of control and has a bunch of really stupid features that nobody with any intelligence uses for anything other than showing off (and let me tell you - there are A LOT of showoffs at Google!) Anyway, that's why C++ is 4x harder to learn than C ... I call it ... "Don's Law".
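Taking the comment's rule and its rule counts at face value (neither is verified here), the arithmetic works out to a little over 4x:

```python
c_rules, cpp_rules = 120, 250        # rule counts as quoted in the comment
ratio = (cpp_rules / c_rules) ** 2   # difficulty assumed proportional to rules squared
print(round(ratio, 2))  # 4.34
```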

[+] hoosieree|1 year ago|reply

Building an interpreter for myself, can confirm. Saying "no" to feature requests is especially hard because I'm the one requesting them!

[+] maxbond|1 year ago|reply

> Make your initial grammar SUPER simple.

I would go further and say don't write a parser at first, unless what's novel and interesting about your language is its syntax. Use the configuration language of your choice (like TOML or YAML) to write your ASTs directly, so you can focus on writing the runtime/backend and playing with the semantics of your language (where the novelty probably is).

When you feel it's the appropriate time, you can circle back to the frontend and implement a parser. But if you're writing a DSL, a config language may well be good enough. An additional bonus is that your config language syntax will make it easier to write certain unit tests.

I've had several experiments in writing a new language, and the first few times I got completely bogged down in parsing. I learned a lot about parsing, which is great, but it made it difficult for me to get at the meat of a project.

[+] whartung|1 year ago|reply

For many simple languages, the most complex construct is the expression.

Lots of things come to light there. Lots of recursion/fun with stacks, operator precedence, the type system, parameter passing. Pretty much a good solid chunk of language is wrapped up in expressions.

Get expressions working, and the rest starts to readily fall into place.
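To make the recursion and operator-precedence part concrete, here is a small sketch using precedence climbing, which is one common technique among several and not necessarily what the commenter had in mind. It evaluates as it parses and only covers integers, the four basic operators, and parentheses; all names are illustrative.

```python
import re

# Binding power per operator; higher binds tighter.
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def tokenize(text):
    return re.findall(r"\d+|[-+*/()]", text)

def parse_expression(tokens, min_prec=1):
    """Precedence climbing: consume operators whose precedence is >= min_prec."""
    left = parse_atom(tokens)
    while tokens and tokens[0] in PRECEDENCE and PRECEDENCE[tokens[0]] >= min_prec:
        op = tokens.pop(0)
        # Left-associative: the right side may only contain tighter operators.
        right = parse_expression(tokens, PRECEDENCE[op] + 1)
        left = apply_op(op, left, right)
    return left

def parse_atom(tokens):
    token = tokens.pop(0)
    if token == "(":
        value = parse_expression(tokens)
        tokens.pop(0)  # discard ")"
        return value
    return int(token)

def apply_op(op, a, b):
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    return a / b

print(parse_expression(tokenize("2 + 3 * (4 - 1)")))  # 11
```

Swapping the immediate evaluation in `apply_op` for building a tree node turns the same loop into a parser proper.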

[+] Hunpeter|1 year ago|reply

I've been working on a toy compiler on-and-off, which is basically just a "reinvent-the-wheel simulator" since I really haven't looked at much existing literature. A very janky, bug-prone part of it is a sort-of mini parser generator, which you can feed a dictionary of rules (as strings) to. This, while slowing down the compilation speed, has allowed me to expand the grammar incrementally from dead simple to more complex, which has been a nice thing.

[+] Jeaye|1 year ago|reply

Anyone wanting to work on a new language is most welcome to help out on mine: jank. It's a native Clojure dialect on LLVM with C++ interop and all the JIT goodies one expects from a lisp.

jank is currently part of a mentorship program, too, so you can join (for free) and get mentored by me on C++, compiler dev, and Clojure runtime internals.

1. https://jank-lang.org/ 2. https://clojureverse.org/t/announcing-the-scicloj-open-sourc...

[+] aeonik|1 year ago|reply

Were you able to implement transducers at the core of Jank? Or did you end up sticking to the existing Java implementation as much as possible?

[+] graypegg|1 year ago|reply

I think maybe a good middle ground is write an interpreter for an already spec'd esoteric language like brainfuck. [0]

It's really fun. Brainfuck specifically is great because there's a lot to optimize with only 8 total operations. (For example, multiplication has to be done as repeated addition in a loop, so make a multiply AST node! [1]) And you could knock out a (BF => AST => Anything you want) compiler in an afternoon!
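A sketch of the BF => AST step, assuming a made-up tuple-based op format. It only does the simplest optimization (folding runs of `+`/`-` and `<`/`>` into one node) and does not implement the multiply node mentioned above; it is not taken from the linked project.

```python
def parse(source):
    """Turn Brainfuck text into a nested op list, folding runs of + - < >."""
    stack = [[]]                       # stack of open loop bodies
    for ch in source:
        body = stack[-1]
        if ch in "+-":
            delta = 1 if ch == "+" else -1
            if body and body[-1][0] == "add":
                body[-1] = ("add", body[-1][1] + delta)
            else:
                body.append(("add", delta))
        elif ch in "<>":
            delta = 1 if ch == ">" else -1
            if body and body[-1][0] == "move":
                body[-1] = ("move", body[-1][1] + delta)
            else:
                body.append(("move", delta))
        elif ch == ".":
            body.append(("out",))
        elif ch == ",":
            body.append(("in",))
        elif ch == "[":
            stack.append([])
        elif ch == "]":
            if len(stack) == 1:
                raise SyntaxError("unmatched ']'")
            loop = stack.pop()
            stack[-1].append(("loop", loop))
        # any other character is a comment and is ignored
    if len(stack) != 1:
        raise SyntaxError("unmatched '['")
    return stack[0]

def run(program, input_bytes=b""):
    """Interpret the folded op list and return the output as bytes."""
    tape, out, inp = [0] * 30000, [], list(input_bytes)
    pos = 0

    def execute(ops):
        nonlocal pos
        for op in ops:
            kind = op[0]
            if kind == "add":
                tape[pos] = (tape[pos] + op[1]) % 256
            elif kind == "move":
                pos += op[1]
            elif kind == "out":
                out.append(tape[pos])
            elif kind == "in":
                tape[pos] = inp.pop(0) if inp else 0
            else:                      # "loop": repeat while the current cell is non-zero
                while tape[pos]:
                    execute(op[1])

    execute(parse(program))
    return bytes(out)

print(run("++++++++[>++++++++<-]>+."))  # b'A'
```

Once the program is a tree, pattern-matching a loop body such as "move, add n, move back, subtract one" is where a multiply node could be introduced.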

Bonus, there's a lot of really impressive brainfuck scripts out there. Nothing compares to seeing your compiler take in some nonsense ASCII, and spit out an application that draws the Mandelbrot fractal.

[0] https://esolangs.org/wiki/Brainfuck

[1] https://github.com/graypegg/unfuck/blob/master/src/optimiser...

[+] danielvaughn|1 year ago|reply

Last year I tried to build a language and I wholeheartedly agree - it's amazing how much it teaches you. My particular language was merely meant to be transpiled to other languages, so I didn't get into the runtime or compilation stuff. But I quickly learned why braces and ignoring whitespace are so important. I also had to think extremely hard and carefully about the exact syntax and what each token meant. It's a very rewarding intellectual activity.

One thing I'd like to add is that even though you can totally write your own parser, it's an absolute joy to use Tree-sitter:

https://tree-sitter.github.io/tree-sitter

I plug it every time I get a chance. It makes refactoring your grammar incredibly easy, and lets you just focus on your syntax.

[+] cvoss|1 year ago|reply

I've dreamed about making my own language for about 10 years or so. Started out just messing around. My vision for what it would be and its purpose has changed over time. About 2 years ago, I "got serious" about designing and implementing it, though that doesn't mean I've spent a serious amount of time on it yet. But it's happening!

It's a language for the domain of writing and verifying formal proofs. Basically, I didn't enjoy the experience of working with the couple of proof assistants I tried, so I'm doing my own thing. My objective is to create a language where I can document "everything I know" about math, if for no other reason than to prove to myself that I know those things, and to return to that knowledge if it ever slips away.

It's so much fun!

[+] alexwashere_|1 year ago|reply

Sounds neat - got any example code you could share?

[+] zX41ZdbW|1 year ago|reply

It will be good to add "Programming Language Checklist" to the references: https://www.mcmillen.dev/language_checklist.html

[+] rzimmerman|1 year ago|reply

I spent time on a compile-to-JS language and found it very rewarding: https://github.com/rzimmerman/kal

This was before async/generators were added to JS and callback hell was quite real. I wanted to shape it in the way I’d learned to program in Visual Basic. Very human readable. The result is no longer useful, but it was a fun goal to have the compiler compile itself.

[+] kstenerud|1 year ago|reply

I haven't made a programming language (and never will), but I did build a BNF-inspired metalanguage for describing text and binary formats to scratch the itch of trying to describe a binary data format I was developing:

The metalanguage: https://dogma-lang.org/

It's even got a syntax highlighter: https://marketplace.visualstudio.com/items?itemName=ksteneru...

The binary format I wanted to describe: https://github.com/kstenerud/concise-encoding/blob/master/cb...

[+] cardiffspaceman|1 year ago|reply

Landin wrote a paper called, "The next seven hundred programming languages."[1] The paper predicts quite a bit of the present. So I named my programming language, DCC.

[1] https://www.cs.cmu.edu/~crary/819-f09/Landin66.pdf

[+] liamilan|1 year ago|reply

I built Crumb (https://github.com/liam-ilan/crumb) a year ago, before starting university. It completely changed the way I conceptualized programming as a whole. You start feeling deja-vu every time you open a new language, and the "ah-ha!" feeling you get when you see something in another language you had to think about when implementing your own is super rewarding.

A year later (this summer) I used Crumb to land my first job at a pretty cool startup! The payoff was way more than I could have ever expected.

[+] morning-coffee|1 year ago|reply

The timing of this article is great for me as lately I'm fascinated by the Forth language and the simplicity behind its apparent strangeness. I've been tempted to start playing with similar ideas just for fun.

(https://ratfactor.com/forth/the_programming_language_that_wr... is a great read, btw.)

[+] Lerc|1 year ago|reply

I still hope to make one, but a combination of ADHD, depression and so many things to do and learn keep getting in the way.

At my current rate I'll know everything and be ready to get started on the day I die.

But my feature wishlist is

First class functions, Garbage collection, Transparent parallelism, Explicit parallelism, Type inference with strong type consistency, Dynamically typed by annotation, Operator overloading (every language should either have vector/matrix operators or the ability to build them)

And a few more that I can't recall just now. I've made it hard for myself.

Failing that maybe just a hacked JavaScript without implicit type conversion between objects/strings etc. (the source of most "Wat?"), frozen array tuples, operator overloading. Implied "this." on identifiers defined and used inside class definitions.

[+] mjhay|1 year ago|reply

> It will be a bad language, and that's okay

This same advice could be applied to most hobbies. It doesn't have to be good, and it certainly doesn't have to make money. It just has to be fun and rewarding. If you learn something, even better.

> Go Forth, make something fun

*golfclap*

[+] loscoala|1 year ago|reply

I also came across FORTH through hacker news. I ended up developing a compiler that can translate FORTH into C.

For the interested reader:

https://github.com/loscoala/goforth

It was a great experience and I can only recommend trying to develop a programming language yourself.

[+] stevekemp|1 year ago|reply

And here's my tutorial FORTH, based upon a thread from hacker news:

https://github.com/skx/foth

Forth is always appealing, whether literally, or in puns.

[+] atum47|1 year ago|reply

A long time ago I was stuck at the airport and I ended up writing an interpreter in Python. I stopped the project at arithmetic, so, basically a fancy calculator. After that I saw a really interesting video about the Shunting Yard algorithm, so I gave that a go as well [1]. At some point, I want to try to write a programming language. I know a little bit about assembly but it is mostly theory; I haven't done much programming using it (only basic stuff, back in college) but I find it fascinating.

1 - https://github.com/victorqribeiro/shuntingYard
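For readers who have not met the Shunting Yard algorithm, a minimal sketch follows. It is written independently of the linked repository and only handles non-negative integers, the four left-associative operators, and parentheses.

```python
import re

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}

def to_rpn(text):
    """Convert an infix expression to postfix (RPN) tokens."""
    output, operators = [], []
    for token in re.findall(r"\d+|[-+*/()]", text):
        if token.isdigit():
            output.append(token)
        elif token in PRECEDENCE:
            # Pop operators that bind at least as tightly (left-associative).
            while (operators and operators[-1] != "("
                   and PRECEDENCE[operators[-1]] >= PRECEDENCE[token]):
                output.append(operators.pop())
            operators.append(token)
        elif token == "(":
            operators.append(token)
        else:  # ")"
            while operators[-1] != "(":
                output.append(operators.pop())
            operators.pop()  # discard the "("
    while operators:
        output.append(operators.pop())
    return output

print(" ".join(to_rpn("2 + 3 * (4 - 1)")))  # 2 3 4 1 - * +
```

The postfix output can then be evaluated with a single value stack, which is what makes the algorithm a natural fit for a calculator-style interpreter.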

[+] PodgieTar|1 year ago|reply

I made a little toy compiler for a university project many years back, and I agree with the article - it's quite a nice way to get hands on with syntax and helps you think a bit more deeply about what is actually happening.

https://github.com/Podginator/Rattle/tree/master

It used JavaCC, which I found to be a pretty simple way to get up and running.

I also worked a job that used yacc to create their own DSL for a piece of software. Same thing, really. Easy enough to get up and running, and messing around with.

[+] tunesmith|1 year ago|reply

I've had a pet idea for a long time that is nonsensical but I still wish existed. I'd like a programming language that encodes the "why", like forces you to compile in the business reason for the code in question. And then it'd automatically survey you, and if your prior business assumptions are no longer true, then compilation would fail, forcing you to remove or rewrite until the code fits your "why" again.

133 comments