top | item 47031204

(no title)

rep_lodsb | 14 days ago

Forth generated code is basically a long series of "assembler macros", always doing the same maximally-generic thing for each primitive operation. Even a very simple-minded compiler of the 1980s could already beat that.

    VAR1 @ VAR2 @ + VAR3 !

will execute this at run time:

    ; push address of VAR1
    inc    bx
    inc    bx
    push   bx
    ; fetch and jump to next primitive
    lodsw
    mov    bx,ax
    jmp    [bx]
    ; push contents of variable
    pop    bx
    push   [bx]
    ; next primitive...
    ; push address of VAR2, next...
    ; push contents of variable, next...
    ; add top two stack elements, push sum, next...
    ; push address of VAR3, next...
    ; store to address, next...

There are some "low-hanging fruits", like keeping top-of-stack in a register, which the Forth used here doesn't do though. Or direct threading.

Still, an incredibly stupid compiler could do better (for execution speed, definitely not size) by outputting similar code fragments - including all the pushes and pops, who needs a register allocator? - just without the interpreter overhead (lodsw etc.) in between them.

A compiler producing worse code likely didn't exist before today's glorious world of vibe coding ;)

A slightly better compiler would directly load the variables instead of first pushing their addresses, etc. You don't need to be an expert in compiler theory to come up with enough ideas for boiling it down to the same three instructions that a competent assembly programmer would write for this operation. And at least for this case, Forth doesn't even have the size advantage anymore, the code would only take 10 bytes instead of 14.

discuss

ajross|14 days ago

The compiler space in 1985 was really thin. You were basically looking at Microsoft/Lattice C and Turbo Pascal. And while I don't have any of them handy for a test, that's pretty much exactly the character of the code they'd generate. In particular the 8086 calling conventions were a kludgey dance on the stack for every function, something forth could actually improve on.

rep_lodsb|14 days ago

I know Turbo Pascal produces really bad code (even in later versions), but it's not on the same level as a non-optimized Forth. Function prologue on x86 doesn't have much overhead either.

It's somewhat closer for 8-bit processors, the most popular ones had either no convenient addressing mode for stack frames at all, or an extremely slow one, like IX/IY on the Z80. For even more primitive CPUs, you might already need a virtual machine to make it "general-purpose programmable" at all -- for example if there is only one memory address register, and no way to save its contents without clobbering another one. I think that some of Chuck Moore's earliest Forth implementations were for architectures like that.

Also memory constraints could have been more important than efficiency of course. I'm not saying Forth is worthless, but it's not competing with any compiled language in terms of speed, and IMHO it also does away with too many "niceties" like local variables or actually readable syntax. Your mileage may vary :)