top | item 38486185

(no title)

fm77 | 2 years ago

For all who wonder why Turbo Pascal was so fast here some insights:

50% is certainly due to the language Pascal itself. Niklaus Wirth designed the language in a way so it can be compiled in a single pass. In general the design of Pascal is in my opinion truly elegant and compared to other programming languages completely underrated. Wirth published a tiny version of his compiler written in Pascal itself in a 1976 book called "Algorithms + Data Structures = Programs".

In the late 70s Anders Hejlsberg took that version and translated it into assembly. He certainly must have changed the codegenerator since Wirth's version emitted bytecode for a tiny VM whereas Anders version produced machinecode, however if you take a closer look especially at the scanner and parser of Turbo Pascal and Wirth's version you can see that they are very similar. Back then Anders was not so much a language guy in my opinion but much more an assembly genius. And that resulted in the other 50% of why Turbo Pascal was so fast:

-) The entire compiler (scanner/parser/typechecker/codegenerator/ and later the linker) was written in assembly.

-) The state of the compiler was held as much as possible in cpu registers. If e.g. the parser needed a new token from the tokenstream, all registers were pushed to the stack and the scanner took over. After the scanner fetched the next token, registers where restored.

-) The choice of which register hold what was also very well thought through. Of course the cpu dictates that to a certain extent but still lots of elegant usage of the "si"/"di" register in combination of non repetitive lodsb/stosb instructions were done.

-) The entire "expression logic" (expression parsing / expression state / code generation for expressions) was kinda object oriented (yes, in assembly) with the "di" register hardwired as the "this" pointer. If the compiler needed to handle two expressions (left expression and right expression), then one was held in the "di" register and the other one in the "si" register. Since the "di" register was hardcoded, you will find lots of "xchg di,si" in the codebase before a "method" (a procedure with the "di" register as a "this" pointer) will be called.

-) Clearly the cpu registers were not enough in order to hold the entire state of the compiler so heavy use of global variables were made. Global variables have the advantage of not needing a register in order to access them (e.g. "inc word ptr [$1234]").

-) Parameter passing was done through registers and were possible stack frames were avoided (too expensive), meaning no local variables (still heavy usage of push/pop within a procedure, does this count as a local?)

-) Parameter passing done through registers allowed procedure chaining: instead of "call someOtherProc; retn" at the end of a procedure just "jmp someOtherProc" was used to a great extent.

-) Jump tables everywhere. In general the compiler was quite table driven.

-) Avoiding of strings as much as possible and if needed (parsing identifiers / using filenames) then avoiding to copy the strings around as much as possible, meaning all strings were held in global variables. The big exception here was of course the copying of the identifiers from the global variable into the symbol table.

-) Starting with Turbo Pascal 4.0, hash tables were used as symbol tables. Arena allocator for memory management.

I am sure I forgot a lot, I reverse engineered Turbo Pascal back in the late 90s. Most of the above applies to Turbo Pascal 7.0, but lots have not changed in earlier versions over time.

It is a shame that such a wonderful codebase is buried under the "closed source, proprietary software" label. It is clear that today nobody would write a compiler the way Turbo Pascal was written, not even in a high level language but the codebase has some many tricks, so many elegant solutions, that it is a real pity that this is not open source. Of course the codebase is on the web, just not the official one.

Thank you Anders Hejlsberg for such a wonderful piece of software.

discuss

dist1ll|2 years ago

All of this is fascinating. I believe single-pass compilation is underrated, and quickly disregarded by a large body of the PL community as anachronistic. I think that's complete nonsense. Just take one look at the massive build infrastructure that's driving modern monorepos to see how incredibly crucial fast compile-times are.

> Jump tables everywhere. In general the compiler was quite table driven.

It would be interesting to see how this approach fares in the face of modern branch prediction on modern CPUs.

zozbot234|2 years ago

Single-pass is a bit of a gimmick. It requires programs to be written sequentially in a strictly "bottom up" way, so that forward references to parts of the program that have yet to be defined are rare enough that they can be marked specially (e.g. as with C program headers).

It's also largely irrelevant if you want optimized code generation, especially across multiple procedures, since that requires you to read abstract representations of the code into memory and deal with them globally which is the opposite of single-pass compilation.

senozhatsky|2 years ago

> meaning no local variables (still heavy usage of push/pop within a procedure, does this count as a local?)

Isn't it (push/pop) how locals basically work?