(no title)
Tobba_ | 8 years ago
Heck, current x86 chips could be juiced quite a bit if you could drop the requirement for backwards compatibility. Instruction encoding is the obvious thing (not because it isn't hip and RISC, but because it's an absolute mess that a huge proportion of the chip's power has to be wasted on, and it's pretty space-inefficient due to how horribly the encoding space is allocated). Less obviously, you could remove things like the data stack instructions (which, at least on Intel, have a dedicated "stack engine" to optimize them) and the ability to read/write instruction memory directly (which creates a mess of self-modifying-code detection to maintain correct behaviour, and complicates L1 cache coherency a bit). Trimming transistors reduces power consumption, which in turn means you can raise the voltage without the chip melting, and it clears up space in your critical data path.
gpderetta | 8 years ago
On smaller low-power CPUs it is more significant, of course.
The stack engine is necessary anyway even if you have no specific stack instructions, as it removes the dependency of local-variable accesses on top-of-stack manipulation, which is critical. Explicit stack manipulation instructions might actually make the stack engine simpler.
Coherent instruction cache and pipeline are super relevant in this age of pervasive self-modifying code (a.k.a. JITs).
Modern CPUs are complex for a reason.
Tuna-Fish | 8 years ago
None of the changes to x86 people have thought of over the years helps enough to justify breaking backcompat, simply because they aren't on the fast path of the critical execution stage. The limit power imposes on frequency in current CPUs is not really the total amount of power consumed; it's the amount of power consumed in the <0.25 mm² of chip that houses the register file, forwarding network and ALUs. That is, the place where things actually happen during the most important pipeline stage. This is why an 8-core CPU running just a single thread cannot let that one core consume as much power as all 8 would when running 8 threads: the register file of the running core would just melt, even if the total power stayed below the chip's limits.
x86 decoding is hairy and takes a long time and a lot of transistors. However, it sits in its own pipeline stages, which run in parallel with execute and only slow it down by making a branch miss a little more expensive. And its power cost is limited today by caching the decoded uops in their own cache, so during any tight loop the decode hardware is idle and consumes no power. The same more or less goes for the stack engine: since it runs early in the pipeline, it is basically a way to compress instructions a little, saving power by making code more compact when it is in use and doing nothing when it is not. Removing it would not really help, even if all code instantly changed to accommodate that.

Much of the rest of the ugly warts of the x86 architecture are handled in the time-honored CISC way: just punt them to microcode, performance be damned. Today, self-modifying code technically works, but you never want to do it, because invalidating lines in the L1i has been implemented in whatever way is fastest and cheapest for the common case of code that does not modify itself. (And that machinery has to exist even if you don't support self-modifying code, because there has to be some way of invalidating L1i entries.) Similarly, a lot of the CISC instructions that make more sense to implement as software routines (fpu sin/cos, for example) are today just abandoned ucode routines that are slower than rolling your own.
Tobba_ | 8 years ago
Also, I don't think the trouble with added complexity off the hot path is any added latency; it's that it needlessly burns up the thermal budget. Not that raising the voltage is the best way of increasing frequency, but it's sure to do so.