Tachyum starts from scratch to etch a universal processor

42 points | rbanffy | 6 years ago | nextplatform.com

20 comments

[+] trishume|6 years ago|reply
I really hope they open source their compiler and allow people to program directly to the machine code.

Part of the problem with GPUs (and to a greater extent FPGAs) is that the toolchains are often terrible, buggy and opaque. They also make it really hard for people to write easier abstractions on top of them. I guess CUDA does do better along many of those axes than alternatives at the cost of being vendor-specific, but anything for this would be vendor-specific too.

So much code we write doesn't take advantage of GPU power because it's harder to program for and also you pay a latency cost for transmitting your data to/from the GPU. If this architecture makes GPU-style programming easier in that you just switch to using a different style of programming in the middle of your code and the CPU just uses different instructions without a big latency penalty that would be very cool.
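The latency point above can be made concrete with a back-of-envelope model: shipping data to the GPU costs a fixed overhead, so offload only pays once the compute saved exceeds that overhead. A minimal sketch in C; every constant here is an illustrative assumption, not a measurement of any real GPU.

```c
/* Toy model of the offload trade-off: fixed transfer/launch cost
 * versus per-element compute savings. All numbers are invented
 * for illustration. */

/* Assumed CPU cost: 1.0 ns per element, no fixed overhead. */
double cpu_time_ns(double n) { return n * 1.0; }

/* Assumed GPU cost: 10 us fixed transfer + launch overhead,
 * then 0.05 ns per element of compute. */
double gpu_time_ns(double n) { return 10000.0 + n * 0.05; }
```

Under these made-up constants the break-even point is n ≈ 10,500 elements (solve 1.0·n = 10000 + 0.05·n); below it, the round trip to the "faster" device dominates, which is exactly why so much code never gets offloaded.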

[+] justicezyx|6 years ago|reply
MLIR might help improve the situation, especially now that Chris Lattner has joined SiFive, which should give the technology a strong push toward adoption in the RISC-V ecosystem and its extensions.
[+] joe_the_user|6 years ago|reply
You can look at Nvidia PTX as well as CUDA. PTX is their macro-assembler-level code (still higher level than the machine code, and translated by the driver at load time, I think).

As far as open sourcing the compiler goes, I suspect that good documentation of what all the low-level instructions do would be as important as, if not more important than, the compiler source code. The compiler code alone wouldn't tell you why it emits a given sequence of operations.

Moreover, if a group designs the chip and compiler together, you may wind up in a situation where they only know that the compiler's output works; they don't know what happens if you do various other things.

[+] ur-whale|6 years ago|reply
> really hope they open source their compiler and allow people to program directly to the machine code.

Be careful what you wish for.

Writing compilers (and hand-coded assembly) for these types of 'poison bit' architectures is not for the faint of heart.

[+] en4bz|6 years ago|reply
My understanding of this product is that it revisits Very Long Instruction Word (VLIW), as seen in Itanium 20 years ago. I think VLIW was a good idea that failed at the time due to the sheer momentum of x86 and Moore's Law.

Now that Moore's Law is basically at its end, x86 is partly stagnating (at least from Intel), and other platforms like ARM are gaining traction, it seems like a good time to revisit VLIW.

I think another key factor is that most applications now run on top of platforms/frameworks rather than at the native level. This means you only need to port Linux, the JVM, Node, Python, and a few others, and you've captured a pretty large potential audience. Compare this to the mid-00s, when moving to Itanium meant porting all your native apps.
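For readers who haven't met VLIW: the compiler, not the hardware, finds independent operations and packs them into one wide instruction word issued per cycle. A minimal sketch in C, with a hypothetical three-slot bundle annotated in comments (the slot layout is invented for illustration, not Itanium's actual encoding):

```c
/* Three independent operations: a VLIW compiler could pack these
 * into a single wide instruction, e.g. a hypothetical bundle
 * { add, mul, sub } issued in one cycle. */
int bundle_demo(int a, int b, int c, int d) {
    int x = a + b;    /* slot 0: no dependence on y or z */
    int y = c * d;    /* slot 1: can issue the same cycle */
    int z = a - c;    /* slot 2: completes the bundle */
    /* The final sum depends on x, y, and z, so it must go in a
     * later bundle. The compiler encodes this ordering statically,
     * where an out-of-order core would discover it at run time. */
    return x + y + z;
}
```

The appeal is that the dependence analysis happens once at compile time instead of every cycle in silicon; the cost, as the comments below note, is that the compiler must also guess things only the hardware can observe.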

[+] ur-whale|6 years ago|reply
> VLIW was a good idea that failed at the time due to the pure momentum of x86 and Moore's law.

It also failed because Intel handled the developers who wanted to switch to the Itanium architecture very poorly; many eventually gave up because support existed only for big shops.

[+] cwzwarich|6 years ago|reply
> The processor pipeline has its out of order execution handled by the compiler, not by hardware, so there is some debate about whether this is an in order or out of order processor.

The usual problem with these sorts of CPU microarchitectures for general-purpose computing is that they can't absorb variable cache/memory load latency. How is this one any different?

There is no ordinary compilation scheme that will solve this, even with complete omniscience, since the same function with different arguments will observe different latency. Maybe some magical feedback-driven JIT could do it, but that was tried in the Itanium era and never really worked either.
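The variable-latency problem is easiest to see in pointer-chasing code. A hedged sketch in C: the function is hypothetical, but it shows why a static schedule can't pick the right load latency.

```c
struct node { int val; struct node *next; };

/* Each load of p->next has a latency the compiler cannot know: it
 * depends on whether that particular node is cache-resident, which
 * depends on the list this particular call receives. A static
 * (compiler-chosen) schedule must bake in one assumed latency and
 * stall or waste slots whenever reality differs; an out-of-order
 * core waits exactly as long as each miss actually takes. */
int list_sum(const struct node *p) {
    int s = 0;
    while (p) {           /* serial chain of dependent loads */
        s += p->val;
        p = p->next;
    }
    return s;
}
```

The same `list_sum` called on a hot, compact list and on a cold, scattered one observes wildly different load latencies, which is the "same function, different arguments" problem in one loop.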

[+] ema|6 years ago|reply
The Mill architecture has some neat ideas here: you can issue a load in cycle 10 but specify that it arrives in, say, cycle 80, so the CPU can stay busy for another 70 cycles whether or not the loaded value is in the cache.
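That deferred-load behavior can be modeled in a few lines. This is a toy sketch under my own assumptions, not the real Mill ISA: the load carries an explicit retire cycle, and the value becomes visible at that cycle unless memory is even slower, in which case the stall is only the excess.

```c
/* Toy model of a deferred ("scheduled") load: issued early with an
 * explicit retire cycle chosen by the compiler. Field names and
 * semantics are illustrative assumptions. */
struct deferred_load {
    int issue_cycle;   /* cycle the load is launched              */
    int retire_cycle;  /* cycle the program expects the value     */
    int mem_latency;   /* actual memory latency, unknown to the   */
                       /* compiler at schedule time               */
};

/* Cycle at which the consumer actually gets the value: the retire
 * cycle if memory answered in time (the slack hides the latency),
 * otherwise the real arrival cycle (a stall of only the excess). */
int value_ready_cycle(struct deferred_load l) {
    int arrival = l.issue_cycle + l.mem_latency;
    return arrival > l.retire_cycle ? arrival : l.retire_cycle;
}
```

With issue at 10 and retire at 80, a 30-cycle cache hit is fully hidden (value ready at 80), while a 100-cycle miss stalls only to cycle 110 instead of serializing behind every load.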
[+] _ph_|6 years ago|reply
It is great to see a new CPU architecture come to market, and very interesting that they are picking up the VLIW approach again. For true progress, we need very different approaches to compete. As they are using the TSMC 7nm process, the processor will be produced on a cutting-edge node, so it won't be held back by an inferior process. Many good designs have been killed in the past because they were produced on a process that couldn't compete with the market leaders'. I wonder how well the Itanium could perform if it were ported to 7nm. Many designs that didn't work 10 to 20 years ago could perform vastly better on a current process.
[+] cwzwarich|6 years ago|reply
Why would you think that VLIW microarchitectures would benefit from newer process nodes more than other microarchitectures?
[+] philipkglass|6 years ago|reply
Itanium was good for high performance computing -- numerical simulations of physical phenomena. That's what I used it for. It was ok-to-poor for other workloads. I seem to recall basic utilities like "grep" being slower on my expensive employer-provided Linux/Itanium 2 workstation than on my budget x86/Linux desktop.

I can believe that a new VLIW processor can indeed perform well on HPC and ML workloads. But that doesn't sound particularly "universal" to me. Will people get good performance running relational databases on it? Graph algorithms? Compilers? Existing Java applications?

[+] jabl|6 years ago|reply
The big question is why this startup would succeed where Intel, with their near-bottomless coffers, failed. AFAICT there have been no major improvements since then in compiling efficient general-purpose code for VLIW architectures.
[+] _ph_|6 years ago|reply
They are using the TSMC 7nm process, so they would even have a slight process advantage over the best Intel chips. The Itanium never made it beyond 32nm, and trade-offs at 7nm might work out quite differently than at 32nm. The latest Itanium chips had 3 billion transistors - small compared to the over 8 billion transistors of an iPhone A13 processor. So the "monster" Itanium chip - large and power-consuming - could possibly make a decent mobile processor with today's technology.
[+] innovator116|6 years ago|reply
This will inevitably be compared to the different approaches taken by RISC-V projects. IMHO, only real-world tests at scale can show whether it can live up to its universal-processor claims.