top | item 26527260


asrp | 5 years ago

Sorry if I've asked these years ago and just don't remember the answer.

> The '/copy-to-ebx' after the slashes is just my way of helping the reader understand what the instruction does. I don't want the reader to have to consult the Intel manual for every instruction, even if I'm forcing the writer to do so.

Why not make the comment the instruction and the bytes the (maybe even optional?) comment in that case then?

From your first post.

> The fact that C compilers are written in C contributes a lot of the complexity that makes compilers black magic to most people.

Isn't this more a symptom of C though? I'm hoping this is generally not true if you replace C with other languages (but I could be very wrong). More generally, I'm thinking you could make "the compiler's inner workings are not black magic" the constraint, rather than making "don't write the higher level language in the higher level language" the constraint.

In my case, I tried that first route and then moved to instead having the compiler written in the higher level language but emitting output that's close enough to (my) handwritten lower level language.

I'll have to read your two part post more carefully though. Glad to see this project getting some attention, even though in an unusual fashion.


akkartik|5 years ago

Great questions! I've actually never considered putting the comment first! I'll have to think about that one.

You're right to point out that there are two components to "C compilers written in C make compilers seem complex": the metacircularity, and C-specific difficulties. I think I was focusing on the first when I wrote that, but I can't exclude the possibility you raise. A better language might reduce the need to understand it operationally, by looking under the hood to understand what a line of code is translated to. The Mu way may well be a dead end, since the requirement of understanding translated code restricts how complex compiler optimizations can get. You probably don't want to understand Haskell's loop fusion by comparing source and generated code.

In my mind there's an idea maze where there are 3 major possibilities for improving the future of software:

a) Simple languages and translators that are easy to understand by running them. This is the Mu way.

b) Type-driven languages that are easy to understand by reading them. Haskell and OCaml seem to fit here, and they may well be the right answers.

c) Complex languages that discourage abstractions atop them. This is the APL way, and it too might end up being the right way.

I'm doing a) mostly because it seems to fit my brain better. I just can't seem to get into Haskell or OCaml or APL.

asrp|5 years ago

> I've actually never considered putting the comment first! I'll have to think about that one.

I'm sure there are many competing constraints so definitely don't do it because I'm suggesting this on a whim. :) My reasoning is that as a human reader, the comment is the more readable part, so I'd want to see it first. And for a computer, it probably doesn't care if the op code appears first or not.
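To make the "computer doesn't care" part concrete, here is a minimal sketch in Python. It is not Mu's actual translator: the function name `hex_payload` and the comment-first spelling `copy-to-ebx/bb` are hypothetical, and `bb` is assumed to be the opcode byte being annotated.

```python
def hex_payload(word, comment_first=False):
    """Return the hex part of an annotated word.

    Bytes-first words look like 'bb/copy-to-ebx' (the style quoted above).
    Comment-first words look like 'copy-to-ebx/bb' (hypothetical spelling).
    """
    parts = word.split("/")
    # The payload is simply the first or the last slash-separated field.
    return parts[-1] if comment_first else parts[0]

bytes_first_word = "bb/copy-to-ebx"
comment_first_word = "copy-to-ebx/bb"

# Both spellings yield the same byte for the translator to emit.
payload_a = hex_payload(bytes_first_word)
payload_b = hex_payload(comment_first_word, comment_first=True)
print(payload_a, payload_b)
```

Either order is a one-line change for the tool, so the choice can be made purely for the human reader.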

> You probably don't want to understand Haskell's loop fusion by comparing source and generated code.

Indeed. But even though C and Haskell are very different, I think they share a common philosophy about compilation where you can basically do whatever you want as long as it still produces the same result.

I vaguely remember looking at Python's generated bytecode (with `dis.dis`) and seeing it wasn't too bad. I haven't tried it on a larger program though.
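For anyone who wants to try this, a small sketch using the standard `dis` module. The function `add_one` is just a made-up example, and the exact opcode names vary between CPython versions, so no output is shown here.

```python
import dis

def add_one(x):
    # Tiny example function whose bytecode is short enough to read in full.
    return x + 1

# Print a human-readable listing: source line numbers next to the bytecode.
dis.dis(add_one)

# The same information as data, useful for comparing source and output
# programmatically rather than by eye.
ops = [ins.opname for ins in dis.get_instructions(add_one)]
print(ops)
```

The listing shows the source line number beside each group of instructions, which is already a simple form of source-to-target correspondence.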

There's tcc (and more recently chibicc, which I haven't had a chance to check out yet) that you're probably already aware of. Is the generated output still pretty bad?

I'll also throw my own attempt in the ring

- High level https://github.com/asrp/flpc/blob/master/lib/stage0.flpc
- Low level (up to line 45) https://github.com/asrp/flpc/blob/master/precompiled/self.f

even though it's not quite optimized for this purpose and the code itself is still a bit unclean. If there was a syntax highlighter for the low level language, I'd probably highlight "[", "]" and "bind:" as a start. I can try to clarify any obscure syntax or primitive.

Some more general ideas to get around the issue:

- Invoke optimization only when specifically asked (and apply the optimization locally). That is, optimization would need at least some additional syntax in the language.
- Explicitly track the correspondence between source and target (at the character or token level), and do this in each optimization pass as well. Maybe even keep the intermediate values of each pass so you can browse through them like a stack trace.
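A rough sketch of the second idea in Python, under heavy assumptions: the `Tok` type, the whitespace tokenizer, and the single `fold_constants` pass are all invented for illustration and are not taken from flpc or Mu. Each token carries the source character span it came from, a pass that merges tokens also merges their spans, and every pass's output is kept so it can be browsed afterwards.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tok:
    text: str
    start: int  # source character offsets, half-open: [start, end)
    end: int

def tokenize(src):
    """Split on whitespace, remembering where each token came from."""
    toks, pos = [], 0
    for word in src.split():
        pos = src.index(word, pos)
        toks.append(Tok(word, pos, pos + len(word)))
        pos += len(word)
    return toks

def fold_constants(toks):
    """Hypothetical pass: rewrite `<int> + <int>` as the sum, merging spans."""
    out, i = [], 0
    while i < len(toks):
        if (i + 2 < len(toks) and toks[i].text.isdigit()
                and toks[i + 1].text == "+" and toks[i + 2].text.isdigit()):
            total = int(toks[i].text) + int(toks[i + 2].text)
            # The new token points back at everything it replaced.
            out.append(Tok(str(total), toks[i].start, toks[i + 2].end))
            i += 3
        else:
            out.append(toks[i])
            i += 1
    return out

def run_passes(src, passes):
    """Keep the output of every pass, like frames in a stack trace."""
    snapshots = [("tokenize", tokenize(src))]
    for p in passes:
        snapshots.append((p.__name__, p(snapshots[-1][1])))
    return snapshots

src = "x = 1 + 2"
history = run_passes(src, [fold_constants])
for name, toks in history:
    # Show each output token next to the source text it corresponds to.
    print(name, [(t.text, src[t.start:t.end]) for t in toks])
```

After the folding pass, the token `3` still maps back to the source text `1 + 2`, so a reader can ask of any output token "where did this come from?" at every stage.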

> In my mind there's an idea maze where there are 3 major possibilities for improving the future of software:

I guess I'm trying another route even though I don't know if it fits the definition of improving the future of software.

d) Have programmers make their own compiler/interpreter and language by giving them the tools and knowledge to do that (more) easily.

This would (hopefully) avoid the black box/magic issue, since the programmer would know the details of the inner workings by virtue of having written them. Though I'm most definitely very far from that goal, and the same questions can be asked about how to improve their target language.