
Tobba_ | 8 years ago

I think we're far from the ceiling on CPU performance, but we seem to have hit a (micro)architectural dead end. Currently a lot of time and transistors are spent simply shuffling data around the chip, or between the CPU and memory, while the actual computational units sit idle. Similarly, units sit idle because they can't be used for the current task, even when they should be able to - the FPUs on modern x86 cores are a good example of this. FP operations are just fused integer/fixed-point operations, but the design has been painted into a corner where a dedicated unit is needed to deal with all the special cases quickly.

We've probably optimized silicon transistors to death, though; that's why scaling is coming to a stop now. GaAs or SiGe are some of the alternatives there. There are also still quite a few advancements that simply aren't economical yet. For example, SOI processes at small feature sizes seem suitable for mass-produced chips now, but they haven't made it out of the low-power segment yet. MRAM seems viable and might be able to provide bigger caches in the same die area, but right now it's mainly used to replace small flash memories (plus some more novel things like non-volatile write buffers, but it's horrifically expensive). So we've probably got a few big boosts left there, but it's not gonna last forever.

The next obvious architectural advancement right now is asynchronous logic. In theory, it's superior in every way: immunity to power and timing noise, speed that isn't limited by worst-case timings, and no/reduced unnecessary switching (i.e. lower power, which means you can run higher voltages without the chip melting itself). On paper, you run into some big problems on the data path - quasi-delay-insensitive circuits need a lot more transistors and wires, and the current alternative, a separate matched-delay path to time each operation, is a bit iffy. You do at least get rid of the Lovecraftian clock distribution tree that's becoming problematic for current synchronous logic. In practice, the tools and the engineers/designers who know how to work with it don't exist, and the architecture is entirely up in the air. So it's many years of development behind right now, and a huge investment that nobody really bothered with while they could just juice the microarchitecture and physical implementation.
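To make the quasi-delay-insensitive idea concrete, here's a behavioral sketch (plain Python, not a real circuit or HDL) of the two basic ingredients: a Muller C-element, the state-holding gate async pipelines are built from, and dual-rail encoding with completion detection, which is where the extra wires and transistors mentioned above come from:

```python
# Illustrative sketch only: behavioral models of QDI building blocks.

class CElement:
    """Muller C-element: output follows the inputs only when they agree,
    otherwise it holds its previous state."""
    def __init__(self):
        self.out = 0

    def update(self, a, b):
        if a == b:           # both inputs agree -> output follows them
            self.out = a
        return self.out      # inputs disagree -> hold previous value

# Dual-rail encoding: each logical bit uses two wires (t, f).
# (1,0) = logic 1, (0,1) = logic 0, (0,0) = "no data yet" spacer.
def dual_rail_complete(bits):
    """Completion detection: data is valid once every bit has exactly
    one rail asserted - this is how the circuit knows it's done,
    with no clock and no assumptions about gate delays."""
    return all(t != f for t, f in bits)

c = CElement()
print(c.update(1, 0))  # disagree -> holds initial 0
print(c.update(1, 1))  # agree    -> output goes to 1
print(c.update(0, 1))  # disagree -> still 1 (state-holding)
print(dual_rail_complete([(1, 0), (0, 1)]))  # both bits arrived -> True
print(dual_rail_complete([(1, 0), (0, 0)]))  # bit 1 still spacer -> False
```

Note the cost visible even in the sketch: two wires plus a completion network per bit, versus one wire and a shared clock in synchronous logic.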


nominatronic | 8 years ago

> You do at least get rid of the Lovecraftian clock distribution tree that's getting problematic for current synchronous logic.

No, you don't. You make it even bigger and far more complex.

You can take any synchronous design, and refine the clock gating further and further, to the point where no part of it gets a clock transition unless it actually needs it on that cycle.

And then when you're finished, congratulations, you've made an asynchronous circuit.

Fully asynchronous design and perfect clock gating are one and the same thing.
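The equivalence can be sketched behaviorally (a toy Python model, not real hardware): a register behind an ideal per-register clock gate only sees an edge when it actually has work to do, which is exactly the event-driven update an asynchronous circuit performs.

```python
# Illustrative sketch: perfect clock gating == event-driven update.

class GatedRegister:
    def __init__(self):
        self.value = 0
        self.transitions = 0   # clock edges this register actually sees

    def clock(self, enable, data):
        if enable:             # ideal gate: edge delivered only when needed
            self.transitions += 1
            self.value = data
        return self.value

r = GatedRegister()
for en, d in [(0, 5), (1, 7), (0, 9), (1, 3)]:
    r.clock(en, d)
print(r.value, r.transitions)  # -> 3 2: only two of four cycles burn power
```

The remaining difference in real silicon is granularity and who decides `enable`: coarse gating switches whole blocks, while fully asynchronous logic pushes that decision down to individual handshakes.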

The clock distribution and gating approaches we already have are actually a sign of progress towards asynchronous design; they're just quite coarse-grained.

Of course, a clock-gating transform of a conventional synchronous design is probably not the best possible solution to a problem, so there's clearly still scope for improvement. But many of the possible improvements are equally applicable to, or have equivalents in, optimising clock distribution and gating in synchronous design - because that's ultimately the same thing as moving towards asynchronicity.

So talking about clock distribution issues as a problem that will just go away with asynchronous design is misleading.