top | item 38147856

vgatherps | 2 years ago

It would probably be even worse today. Dynamically discovering ILP "just works" even as memory gets slower and slower relative to compute. A CPU today can execute hundreds of instructions, across many predicted branches, ahead of a slow load. It would be impossible to statically schedule this (you don't know what will or won't be in cache), and difficult to hoist all loads 100 instructions in advance, especially once you take branching behavior into account.

GPUs have taken over much of the niche where these processors excelled: number crunching where you have entirely pre-determined memory and compute access patterns.

anarazel | 2 years ago

For GPUs it only really works because the code is translated to the relevant instruction stream close to the time of execution, where you can afford to optimize in a highly uarch-specific way. VLIW at the time of Itanium was never in that position... It just doesn't compute for me how Intel thought this was a good plan. It's not like they didn't know that existing compiled binaries were going to continue being used on newer uarchs.

p_l | 2 years ago

The critical part was less VLIW and more EPIC, the Explicitly Parallel part. There were previous VLIW arches that didn't have issues with compilers; one of them afaik even formed the backbone of many advanced optimizing compilers in the 1990s, because the vendor licensed the compiler tech to others.