I was an intel cpu architect when transmeta started making claims. We were baffled by those claims. We were pushing the limit of our pipelines to get incremental gains and they were claiming to beat a dedicated arch on the fly! None of their claims made sense to ANYONE with a shred of cpu arch experience. I think your summary has rose colored lenses, or reflects the layman’s perspective.
nostrademons|3 months ago
It's worth noting that Google actually did succeed with a wildly different architecture a couple years later. They figured "Well, if CPU performance is hitting a wall - why use just one CPU? Why not put together thousands of commodity CPUs that individually are not that powerful, and then use software to distribute workloads across those CPUs?" And the obvious objection to that is "If we did that, it won't be compatible with all the products out there that depend upon x86 binary compatibility", and Google's response was the ultimate in hubris: "Well we'll just build new products then, ones that are bigger and better than the whole industry." Miraculously it worked, and made a multi-trillion-dollar company (multiple multi-trillion-dollar companies, if you now consider how AWS, Facebook, TSMC, and NVidia revenue depends upon the cloud).
Transmeta's mistake was that they didn't re-examine enough assumptions. They assumed they were building a CPU rather than an industry. If they'd backed up even farther they would've found that there actually was fertile territory there.
cpgxiii|3 months ago
Except "the cloud" at that point was specifically just a large number of normal desktop-architecture machines. Specifically not a new ISA or machine type, running entirely normal OS and libraries. At no point did Google or Amazon or Microsoft make people port/rewrite all of their software for cloud deployment.
At the point that Google's "bunch of cheap computers" was new, CPU performance was still rapidly improving. The competition was traditional "big iron" or mainframe systems, and the novelty was in achieving high reliability through distribution, rather than building on fault-tolerant hardware. By the time the rate of CPU performance improvement was slowing in the mid 2000s, large clusters smaller machines were omnipresent in supercomputing and HPC applications.
The real "new architecture(s)" of this century are GPUs, but much of the development and success of them is the result of many iterations and a lot of convergent evolution.
fajitaforce5|3 months ago
hinkley|3 months ago
With blackjack, and hookers!
empw|3 months ago
I wonder if the x86 teams at Intel people were similarly baffled by that.
jasonwatkinspdx|3 months ago
EPIC aka Itanium was conceived around trace optimizing compilers being able to find enough instruction level parallelism to pack operations into VLIW bundles, as this would eliminate the increasingly complex and expensive machinery necessary to do out of order superscalar execution.
This wasn't a proven idea at the time, but it also wasn't considered trivially wrong.
What happened is that the combination of OoO speculation, branch predictors, and fat caches ended up working a lot better than anticipated. In particular branch predictors went from fairly naive assumptions initially to shockingly good predictions on real world code.
The result is that conventional designs increasingly trounced Itanium as the latter was still baking in the oven. By the time it was shipping it was clear the concept was off target, but at that point Intel/HP et all had committed so much they tried to just bully the market into making it work. The later versions of Itanium ended up adding branch prediction and more cache capacity as a capitulation to reality, but that wasn't enough to save the platform.
Transmeta was making a slightly different bet, which is that x86 code could be dynamically translated to run efficiently on a VLIW cpu. The goal here was two fold:
First, to sidestep IP issues around shipping an x86 compatible chip. There's a reason AMD and Cyrix are the only companies to have shipped intel alternatives in volume in that era. Transmeta didn't have the legal cover they did, so this dynamic translation approach sidestepped a lot of potential litigation.
Second, dynamic translation to VLIW could in theory be more power efficient than a conventional architecture. VLIW at the hardware level is kinda like if a cpu just didn't have a decoder. Everything being statically scheduled also reduces design pressure on register file ports, etc. This is why VLIW is quite successful in embedded DSP style stuff. In theory, because the dynamic translation pays the cost of compiling a block once then calls that block many times, you could get a net efficiency gain despite the cost of the initial translation. Additionally, having access to dynamic profiling information could in theory counterbalance the problems EPIC/Itanium ran into.
So this also wasn't a trivially bad idea at the time. Transmeta specifically targeted x86 compatible laptops as that was a bit of a sore point in the Wintel world at the time, where the potential power efficiency benefits could motivate sales even if absolute performance still was inferior to intel.
From what I recall hearing from people who had them at the time, the Transmeta hardware wasn't bad but had the sort of random compatibility issues you'd expect and otherwise wasn't compelling enough to win in the market vs Intel. Note this was also before ARM rose to dominate low power mobile computing.
Transmeta ultimately failed, but some of their technical concepts in detail have been continued on in how language JITs and GPU shader IRs work today. Or how Apple used translation to migrate off both PowerPC and x86 in turn.
In both the case of Itanium and Transmeta I'd say it's historically inaccurate to say they were obviously or trivially wrong at the time people made these bets.
BirAdam|3 months ago
hinkley|3 months ago
I wonder how long before we try it again.
foobiekr|3 months ago
jasonwatkinspdx|3 months ago
gdiamos|3 months ago
From my perspective it was more exciting to the programming systems and compiler community than to the computer architecture community.
rediguanayum|3 months ago
ghaff|3 months ago
stinkbeetle|3 months ago
All failures. Everybody went back to more conventional out of order designs and were able to find ways to keep scaling those.
I'm sure there were some people at all these companies who were always OOOE proponents and disagreed with these other approaches, but I think your summary has poop colored lenses :) It's a little uncharitable to say their ideas were nonsense. The reality is that this was a very uncertain and exploratory time, and many people with large shreds of cpu arch experience all did wildly different things, and many went down the wrong roads (with hindsight).
anamax|3 months ago
While I don't know what his group shipped, some implementations of trace caches do store "improved" operation sequences.
carabiner|3 months ago
onecommentman|3 months ago