iheartmemcache | 8 years ago

RE: Portability - Not sure how far you'll be able to get by gcc -S'ing something like nuklear[1] (cross-platform ANSI C89) but it might save you some time.

I don't have much HLL asm/demoscene experience personally, so I'm not sure what counts as "impressive" as an engineering feat these days, but this looks cool. As someone who aspires to see a viable Smalltalk-like runtime self-modifiable introspective debugger at the OS level with a decent layer of POSIX compatibility and the ability to run AVX-512 instructions, I like the idea that tools like this are out there. Cheers, mate

[1] https://github.com/vurtun/nuklear

johnfound | 8 years ago

> RE: Portability - Not sure how far you'll be able to get by gcc -S'ing something like nuklear (cross-platform ANSI C89) but it might save you some time.

The big problem with using "gcc -S" is that the result is still an HLL program, just written out as an assembly language listing.

Humans write assembly code very differently from HLL code, and even when it is translated to asm notation, the difference persists. An asm programmer will choose different algorithms, different data structures, and a different program architecture.

Actually, this is why in real-world tasks, regardless of how good compilers are, the assembly programmer will always write a faster program than the HLL programmer.

Another effect is that, in most cases, a deeply optimized asm program is still more readable and maintainable than a deeply optimized HLL program.

In this regard, some early optimization in assembly programming is acceptable and even good for code quality.

exikyut | 8 years ago

> As someone who aspires to see a viable Smalltalk-like runtime self-modifiable introspective debugger at the OS level

That's an interesting pile of keywords you've got there.

I don't know about Smalltalk (I find Squeak, Pharo, etc. utterly incomprehensible; I have no idea what to do with them), but for some time I've been fascinated by the idea of a fundamentally mutable, even self-modifying, environment. My favorite optimization would be this: in tight loops with tons of if()s and other conditional logic, the language could JIT-_rearrange_ the code to nop out the if()s and other logic just before the loop is entered, or, even better, gather up the parts of the code that will actually execute and dump them all somewhere contiguous.

C compilers could probably be made to do this too, but that would break things like W^X and squarely violate lots of other expectations.
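The static half of this already exists: optimizing compilers perform _loop unswitching_ (GCC's -funswitch-loops, enabled by default at -O3), which hoists a loop-invariant condition out of a loop and emits one specialized copy of the loop per outcome. Doing the same thing at runtime based on live values is what a JIT adds on top. Below is a minimal hand-written C sketch of the transformation; the function names are hypothetical, and the invariant clamp flag stands in for the if() you'd want nopped out.

```c
#include <assert.h>
#include <stddef.h>

/* Branchy version: the loop-invariant clamp flag is tested on every iteration. */
static long sum_branchy(const int *xs, size_t n, int clamp, int limit) {
    long acc = 0;
    for (size_t i = 0; i < n; i++) {
        int v = xs[i];
        if (clamp && v > limit) /* clamp never changes inside the loop */
            v = limit;
        acc += v;
    }
    return acc;
}

/* Unswitched version: test the invariant once at loop entry, then run a loop
   whose body no longer mentions clamp at all. */
static long sum_unswitched(const int *xs, size_t n, int clamp, int limit) {
    assert(xs != NULL || n == 0);
    long acc = 0;
    if (clamp) {
        for (size_t i = 0; i < n; i++)
            acc += xs[i] > limit ? limit : xs[i];
    } else {
        for (size_t i = 0; i < n; i++)
            acc += xs[i];
    }
    return acc;
}
```

Both functions return the same sums; the unswitched one just never evaluates clamp inside either loop body. Because it branches once per call instead of patching code, this version doesn't touch W^X at all.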

iheartmemcache | 8 years ago

This is sort of implemented in various forms.

For a VM, RE: code rearrangement: if I understand you correctly, you're effectively describing dynamic dead-code elimination (DCE). The CLR does this (and lots more)[2].

For the low-level programmer, there's nothing stopping a (weakly typed) static language like C from adopting that behavior[3] at runtime [i.e. with a completely bit-for-bit identical, statically linked executable which].
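As a concrete illustration of runtime code modification from C, here is a minimal sketch that assumes a POSIX system on x86-64; the function names are hypothetical. It maps a page, writes the machine code for "mov eax, imm32; ret", calls it, then patches the immediate and calls the same address again. The page is flipped between writable and executable with mprotect so it is never both at once, which is roughly how JITs coexist with W^X. The byte encoding only applies to x86-64, so on other architectures the sketch just returns -1.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef int (*int_fn)(void);

#if defined(__x86_64__)
/* x86-64 encoding of: mov eax, imm32 ; ret  (imm32 occupies bytes 1..4) */
static const unsigned char kTemplate[] = {0xB8, 0x00, 0x00, 0x00, 0x00, 0xC3};

/* Rewrite the immediate operand. The page is writable (not executable) while
   patching and executable (not writable) afterwards, so it is never W+X. */
static int patch_immediate(unsigned char *page, size_t len, int32_t value) {
    if (mprotect(page, len, PROT_READ | PROT_WRITE) != 0)
        return -1;
    memcpy(page + 1, &value, sizeof value);
    return mprotect(page, len, PROT_READ | PROT_EXEC);
}
#endif

/* Generates a function returning `first`, calls it, patches it in place to
   return `second`, and calls it again. Returns 0 on success, -1 otherwise. */
static int self_modifying_demo(int32_t first, int32_t second, int *out1, int *out2) {
#if defined(__x86_64__)
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;
    memcpy(page, kTemplate, sizeof kTemplate);

    int rc = -1;
    if (patch_immediate(page, len, first) == 0) {
        /* Object-to-function pointer conversion, as POSIX permits (not strict ISO C). */
        int_fn fn;
        void *entry = page;
        memcpy(&fn, &entry, sizeof fn);
        *out1 = fn();
        if (patch_immediate(page, len, second) == 0) {
            *out2 = fn(); /* same address, new behavior */
            rc = 0;
        }
    }
    munmap(page, len);
    return rc;
#else
    (void)first; (void)second; (void)out1; (void)out2;
    return -1; /* the machine-code bytes above are x86-64 only */
#endif
}
```

If the platform or a sandbox refuses to make the page executable, mprotect fails and the sketch reports -1 rather than crashing.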

At the compiler level, you've got Ken Thompson's seminal Turing Award lecture, "Reflections on Trusting Trust", which does it at the compiler level[4].

At the processor level, you have branch prediction, a heuristic that is a critical part of any pipeline[1]. (I think modern Intel processors as of the Haswell era assign each control-flow point a total of 4 bits, which are just shifted left/right (LSL/LSR) to count whether the branch was taken or not taken. Don't quote me on that.)
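For comparison, the textbook scheme behind per-branch counters is a table of 2-bit saturating counters indexed by the branch address (a "bimodal" predictor). The sketch below models that classic scheme in C. It is not a description of what Haswell actually does (modern predictors such as TAGE add global history on top), and the names and the 1024-entry table size are arbitrary choices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TABLE_BITS 10
#define TABLE_SIZE (1u << TABLE_BITS)

/* One 2-bit saturating counter per slot: 0,1 = predict not taken; 2,3 = predict taken. */
typedef struct {
    uint8_t counters[TABLE_SIZE];
} bimodal_predictor;

static void bp_init(bimodal_predictor *bp) {
    for (size_t i = 0; i < TABLE_SIZE; i++)
        bp->counters[i] = 1; /* start every slot at "weakly not taken" */
}

/* Index the table with the low bits of the branch address. */
static size_t bp_index(uintptr_t branch_addr) {
    return (size_t)(branch_addr & (TABLE_SIZE - 1));
}

static bool bp_predict(const bimodal_predictor *bp, uintptr_t addr) {
    return bp->counters[bp_index(addr)] >= 2;
}

/* Move the counter one step toward the actual outcome, saturating at 0 and 3. */
static void bp_update(bimodal_predictor *bp, uintptr_t addr, bool taken) {
    uint8_t *c = &bp->counters[bp_index(addr)];
    if (taken && *c < 3)
        (*c)++;
    else if (!taken && *c > 0)
        (*c)--;
}

/* Replay a sequence of outcomes for one branch; return how many were predicted correctly. */
static size_t bp_simulate(bimodal_predictor *bp, uintptr_t addr,
                          const bool *outcomes, size_t n) {
    assert(outcomes != NULL || n == 0);
    size_t correct = 0;
    for (size_t i = 0; i < n; i++) {
        if (bp_predict(bp, addr) == outcomes[i])
            correct++;
        bp_update(bp, addr, outcomes[i]);
    }
    return correct;
}
```

On a loop branch that is taken nine times and then falls through, the counter settles at "strongly taken" after the first pass and from then on mispredicts only the loop exit, i.e. 9 out of 10 correct per pass.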

RE: Smalltalk: for me, the power of the platform's mutability was revealed when I started using Cincom. When I was using the GNU implementations ~10 years ago, they felt like toys (though I hear things have improved a lot since). If you've ever used Ruby, a simple analogy would be the whole "you can (ab)use the hell out of things like #method_missing to create your own DSLs" thing. This lends a lot of flexibility to the language (at the expense of performance and typing guarantees). In a Smalltalk environment, you get that sort of extensibility + static typing guarantees + the dynamic ability to recover from faults in whatever fashion you want.

Imagine an environment[5] that has that structure intrinsically + the performance of being able to use all them fancy XMM/YMM registers for numerical analysis + a ring-0 SoftICE-type debugger. Turtles all the way down, baby.
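To make the #method_missing analogy concrete (Smalltalk's equivalent hook is doesNotUnderstand:), here is a toy C sketch. All names are hypothetical and it makes no claim to match how Cincom or Ruby actually dispatch. The object's method table can be changed at runtime, and any message it can't find falls through to a user-installed handler, which is the hook those DSL tricks are built on.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_METHODS 16

struct object;
typedef int (*method_fn)(struct object *self, int arg);
/* Fallback hook, in the spirit of Smalltalk's doesNotUnderstand: / Ruby's method_missing. */
typedef int (*dnu_fn)(struct object *self, const char *selector, int arg);

typedef struct {
    const char *selector; /* stored by pointer, so it must outlive the object */
    method_fn impl;
} method_entry;

typedef struct object {
    method_entry methods[MAX_METHODS];
    size_t count;
    dnu_fn does_not_understand;
    int state;
} object;

static void object_init(object *obj, dnu_fn fallback) {
    obj->count = 0;
    obj->does_not_understand = fallback;
    obj->state = 0;
}

/* Add or replace a method at runtime. Returns 0 on success, -1 if the table is full. */
static int define_method(object *obj, const char *selector, method_fn impl) {
    for (size_t i = 0; i < obj->count; i++) {
        if (strcmp(obj->methods[i].selector, selector) == 0) {
            obj->methods[i].impl = impl;
            return 0;
        }
    }
    if (obj->count == MAX_METHODS)
        return -1;
    obj->methods[obj->count++] = (method_entry){selector, impl};
    return 0;
}

/* Look the selector up; if no method matches, hand the message to the fallback. */
static int send_message(object *obj, const char *selector, int arg) {
    for (size_t i = 0; i < obj->count; i++)
        if (strcmp(obj->methods[i].selector, selector) == 0)
            return obj->methods[i].impl(obj, arg);
    assert(obj->does_not_understand != NULL);
    return obj->does_not_understand(obj, selector, arg);
}

/* Example method: add arg to the object's state and return the new value. */
static int increment(object *self, int arg) {
    self->state += arg;
    return self->state;
}

/* Example fallback: answer any unknown "get_*" selector with the current state;
   any other unknown selector returns -1. */
static int getter_fallback(object *self, const char *selector, int arg) {
    (void)arg;
    return strncmp(selector, "get_", 4) == 0 ? self->state : -1;
}
```

For example, after object_init(&o, getter_fallback), sending "increment" returns -1 until define_method installs it; afterwards it updates the state, and a selector that was never defined, like "get_total", is answered by the fallback.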

=====

[1] See ISL-TAGE from CBP3, and other, more modern entries in the Championship Branch Prediction competition (if it's still being run).

[2] https://stackoverflow.com/a/8874314 Here's how it's done with the CLR. The JVM is crazy good, so I'd imagine an analogue exists there as well.

[3] https://en.wikipedia.org/wiki/Polymorphic_code

[4] http://wiki.c2.com/?TheKenThompsonHack

[5] Use some microkernel OS architecture so process $foo won't alter $critical-driver-talking-to-SATA-devices or modify malloc. I'd probably co-opt QNX's Neutrino design, since it's tried and true. Plus, that sort of architecture has the design benefit of intrinsically safe high availability integrated into the network stack.