
A simple x86 assembler C++ template metaprogram

88 points | chesterfield | 10 years ago | blog.mattbierner.com

22 comments

[+] rayiner|10 years ago|reply
Years ago I made an x86 assembler using Common Lisp macros to expand instruction descriptions: https://github.com/rayiner/amd64-asm. Metaprogramming is incredibly powerful, more so when your macro language is just your host language.

The test suite is neat (imho). At runtime it walks a data structure built at compile time to randomly exercise each of the possible encoding clauses of the instruction and compare the output to NASM. Writing that was a very "lisp is for building organisms" moment for me.

As an aside, x86 makes a lot more sense than people give it credit for.

[+] nickpsecurity|10 years ago|reply
I did something similar. I needed the C and C++ libraries plus compiler but hated the languages. Used BASIC-like 4GL's for their power and prototyping speed. So, I just made a knockoff of a BASIC in LISP with naive compilation to C or C++. In dev mode, it just executed functions instantly as LISP. Separate mode generated code from same LISP-BASIC. Added 4GL generators as macros. Got extra benefit of interactive development and incremental compilation. Aside from first integration of libraries, I got the benefits of C/C++ without any of its pain in most cases. Ultra-fast development, too, with few lines of code for common application types.

I miss that tool. Lost it in a triple HD failure along with the others. Closest thing I've seen to my approach is iMatix's combo of DSL's and mini-4GL-for-C for more productive, correct, C-language applications. Racket has potential to exceed anything I did and for more languages if applied wisely. It's my main option if I try to rebuild my tool.

Btw, what would I call a tool that combines LISP macros, 4GL features (eg DSL's, generators), C or C++ data-types, and auto-generation of C code? It doesn't necessarily need LISP syntax: one prototype used Tcl style & final looked like BASIC w/ compatibility for BASIC tools. It fits as a systems language, a 4GL, a LISP, and a C/C++ superset all at once.

[+] pflanze|10 years ago|reply
I've done a similar thing in Scheme to learn Intel assembly (it only supports a small subset of it, and produces standard .as output files instead of byte code, and I have not used it for anything in production so far):

https://github.com/pflanze/hasm

I implemented a number of higher level features for structured programming. Example input file: https://github.com/pflanze/hasm/blob/master/examples/fact.sc...

It doesn't use Scheme macros but expands assembly macros using custom code (because the bottom layer isn't Scheme anyway, just S-expressions, and so that I could let the macro expanders access their context).

Of course the difficult part would be to grow the higher level features in a way that they are safe, and I'm not sure how useful they really are when allocating registers manually. Once you build in a register allocator, you'd probably call it a compiler.

[+] kristiandupont|10 years ago|reply
I made something very similar ages ago: [link redacted] -- I remember struggling quite a bit with machine code generation until I discovered some deeply buried forum message emphasizing that the x86 architecture was originally octal-based.
[+] userbinator|10 years ago|reply
until I discovered some deeply buried forum message emphasizing that the x86 architecture was originally octal-based.

To "unbury" that fact some more, I'll again link to the message I think you're referring to...

http://reocities.com/SiliconValley/heights/7052/opcode.txt

(Google still refuses to index that link for some reason. IIRC it used to have a very high ranking, but disappeared from search results within the last year or so.)
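
To make the octal point concrete: a ModRM byte splits into 2-3-3 bit fields (mod, reg, rm), which are exactly the three digits you get when you write the byte in octal. A minimal C++17 sketch of that reading; `ModRM` and `decode_modrm` are hypothetical names picked for illustration, not anything from the linked text:

```cpp
#include <cstdint>

// Hypothetical helper type: the three fields of an x86 ModRM byte.
struct ModRM {
    std::uint8_t mod;  // top 2 bits: addressing mode (3 = register-direct)
    std::uint8_t reg;  // middle 3 bits: register operand or opcode extension
    std::uint8_t rm;   // low 3 bits: register or memory operand
};

// Split a ModRM byte into its 2-3-3 bit fields, i.e. its octal digits.
constexpr ModRM decode_modrm(std::uint8_t b) {
    return ModRM{static_cast<std::uint8_t>(b >> 6),
                 static_cast<std::uint8_t>((b >> 3) & 7),
                 static_cast<std::uint8_t>(b & 7)};
}

// `mov eax, ebx` encodes as 89 D8. The ModRM byte 0xD8 is 0330 in octal:
// mod = 3 (register-direct), reg = 3 (ebx), rm = 0 (eax).
static_assert(decode_modrm(0330).mod == 3);
static_assert(decode_modrm(0330).reg == 3);
static_assert(decode_modrm(0330).rm == 0);
```

In hex the same byte (D8) hides the field boundaries, which is presumably why the octal view makes hand-encoding easier.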

[+] ant6n|10 years ago|reply
It'd be cool if one could leave runtime elements in there, for example leaving an intermediate value as x, with the C++ template magic reducing everything down to producing the sequence of bytes with just one integer to be filled in.

Then this could be used for on-the-fly code generation like you'd see in a JIT emulator like QEMU.
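
A rough sketch of what that could look like, assuming C++17. This is not how the linked article does it: `MovEaxRet`, `hole`, and `instantiate` are made-up names, and the byte sequence is written out by hand here rather than produced by a template assembler. The bytes are fixed at compile time and only the 32-bit immediate is patched at run time:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical code template: `mov eax, imm32 ; ret` with the immediate
// left as zeros. B8 is `mov eax, imm32`; C3 is `ret`.
struct MovEaxRet {
    static constexpr std::array<std::uint8_t, 6> bytes{0xB8, 0, 0, 0, 0, 0xC3};
    static constexpr std::size_t hole = 1;  // offset of the 4-byte immediate
};

// Copy the precomputed bytes and fill the hole with a runtime value,
// least significant byte first, as x86 expects.
template <typename Template>
auto instantiate(std::uint32_t imm) {
    static_assert(Template::hole + 4 <= Template::bytes.size(),
                  "hole must fit inside the byte sequence");
    auto code = Template::bytes;  // plain array copy
    for (std::size_t i = 0; i < 4; ++i)
        code[Template::hole + i] = static_cast<std::uint8_t>(imm >> (8 * i));
    return code;
}
```

Calling `instantiate<MovEaxRet>(x)` then costs one small copy plus four byte stores per emitted snippet, which is roughly the kind of work a template-based JIT back end would want to do at run time.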

[+] neelm|10 years ago|reply
This type of approach works well in image processing, where you can use C++ for the repeatable, hierarchical patterns and assembly for optimizing some key loops.
[+] tdsamardzhiev|10 years ago|reply
Expected to see brainfuck code. Actually saw a pretty neat collection of tricks! Well done :)
[+] tempodox|10 years ago|reply
fork(2) me, this is one cool application of templates. I wish it were as comprehensible as Lisp macros :-)
[+] jheriko|10 years ago|reply
i used to use that same function pointer cast trick to execute bytes from memory a long time ago.

you should find that it fails in most modern execution environments...

in legacy/desktop windows you can VirtualAlloc things to be executable but the same is not true in the general case, and this code doesn't do that.

even a long time ago (before windows 8, ios etc.) i wasn't able to just execute things on the stack or heap reliably and had to write code that explicitly allocated executable memory.
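
for reference, a minimal sketch of the "explicitly allocate executable memory" route on a POSIX system, using mmap and mprotect (the Windows equivalent would go through VirtualAlloc). `run_code` is a hypothetical helper name, and the bytes you pass in only make sense on the architecture they were assembled for. hardened systems may still refuse the permission change, so the sketch reports failure instead of assuming success:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <sys/mman.h>

// Copy `len` bytes of machine code into a fresh anonymous mapping, make it
// executable, call it as `int (*)()`, and return the result. Returns
// std::nullopt if the OS refuses the mapping or the permission change.
std::optional<int> run_code(const std::uint8_t* code, std::size_t len) {
    void* mem = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return std::nullopt;
    std::memcpy(mem, code, len);
    // Flip from writable to executable rather than mapping RWX up front.
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return std::nullopt;
    }
    int result = reinterpret_cast<int (*)()>(mem)();
    munmap(mem, len);
    return result;
}
```

on x86, passing the six bytes B8 2A 00 00 00 C3 (`mov eax, 42 ; ret`) should hand back 42 when the mapping is allowed.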

[+] lultimouomo|10 years ago|reply
The assembler strings are not allocated at run time, but stored in a read-only section of the program (which apparently is marked as executable, if this works).