top | item 26525282

Glacial – Microcoded RISC-V core designed for low FPGA resource utilization

78 points| peter_d_sherman | 5 years ago |github.com | reply

51 comments

[+] kiwidrew|5 years ago|reply
I'm surprised that there aren't any specialised instructions or hardware resources to handle the RISC-V instruction decoding/dispatching. [1]

Like, sure, it's not meant to be a fast implementation, but even just a "mask byte with 0x7C and set PC to that value times 8" instruction (which in an FPGA implementation is just rearranging the wires) could save 5-6 cycles per instruction.

Is it really "microcoded" when all you're doing is writing a RISC-V emulator that runs on what looks to be a fairly standard 8 bit CPU?

[1] https://github.com/brouhaha/glacial/blob/master/ucode/ucode....

[+] brucehoult|5 years ago|reply
Yes, an interpreter is exactly what microcoding is. See Maurice Wilkes' original paper, or the initial IBM 360 models (the lower end of which had an 8 bit CPU running the microcode), or the various VAX models etc.

In those days the microcode ROM and ALU etc. were substantially faster than RAM (core). At some point SRAM became as fast as or faster than ROM and machines copied the microcode into SRAM on startup. Some machines such as the Burroughs 1700 series loaded different microcode into SRAM depending on whether you wanted to run FORTRAN or COBOL programs.

Then companies started allowing users to write their own custom instructions in microcode. See for example the VAX "Writeable Control Store" which on the 11/780 (as an option) gave users 1024 words (12 KB) for custom microcode and a microcode assembler and debugger. Some people even wrote compilers targeting this for languages such as Pascal (see for example https://apps.dtic.mil/dtic/tr/fulltext/u2/a089424.pdf)

The next step was to turn the SRAM into a cache, and make a slightly more user-friendly microcode the actual instruction set used by all compilers, and thus RISC was born.

Of course you are correct that specialised instructions to make instruction decoding easier are helpful in an ISA emulator. It would not surprise me to see RISC-V itself get an extension along those lines in the near future, to help M-mode software emulate unaligned loads and stores and other unimplemented instructions, but maybe also to help emulate other instruction sets.

[+] throwaway81523|5 years ago|reply
> Is it really "microcoded" when all you're doing is writing a RISC-V emulator that runs on what looks to be a fairly standard 8 bit CPU?

Don't know, but the amount of "microcode" or emulation code required is itself a reasonable measure of an ISA's complexity. Doing an x86 that way would surely take tons more code.

[+] nullc|5 years ago|reply
I assume this would be an obvious starting point for finding the smallest specializations that give the greatest performance improvement.

I understand that there are some FPGAs now that essentially have RISC-V ALU hard blocks, so using them might be a speed and area improvement.

[+] retrac|5 years ago|reply
Along the same lines of minimizing the amount of logic used at the cost of cycles, there's SERV which uses a bit-serial implementation with a 1-bit data path: https://github.com/olofk/serv
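The bit-serial idea can be illustrated in software. This is a sketch of the general technique, not SERV's actual HDL: a single full adder processes one bit per cycle, with the carry held in a 1-bit register, which is how a 1-bit datapath performs 32-bit arithmetic at a cost of roughly 32 cycles per operation.

```c
#include <stdint.h>

/* Bit-serial addition sketch: one full adder, one carry flip-flop,
 * 32 "cycles" per 32-bit add. This mirrors the area-for-cycles trade
 * that a 1-bit datapath like SERV's makes in hardware. */
static uint32_t serial_add(uint32_t a, uint32_t b) {
    uint32_t sum = 0;
    unsigned carry = 0;                                 /* 1-bit carry register */
    for (int cycle = 0; cycle < 32; cycle++) {
        unsigned abit = (a >> cycle) & 1;
        unsigned bbit = (b >> cycle) & 1;
        unsigned s = abit ^ bbit ^ carry;               /* full-adder sum bit  */
        carry = (abit & bbit) | (carry & (abit ^ bbit)); /* full-adder carry out */
        sum |= (uint32_t)s << cycle;
    }
    return sum;                                         /* final carry is dropped */
}
```

A shift-register-based register file gets the same treatment in SERV: operands stream past the ALU one bit at a time, so almost no wide logic exists anywhere in the core.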

From time to time, I have been tempted to design a RISC-V implementation out of discrete 74xx components. Sure, there are plenty of projects out there to build your own processor from scratch like that, but most of them aren't LLVM targets!

The 32-bit datapaths and the need for so many registers make it a bit daunting to approach directly. That approach would probably end up similar in scale to a MIPS implementation I once saw done like that. (Can't find the link, but it was about half a dozen A4-sized PCBs.)

Retreating to an 8-bit microcoded approach and lifting all the registers and complexity into RAM and software is a very attractive idea. Might even fit on a single Eurocard. It's not like a discrete TTL RISC-V implementation would ever be a speed demon, either way.
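The register-file-in-RAM idea sketches out simply. This is a hypothetical illustration of the approach being described, not any real design: the 32 RISC-V registers live in ordinary byte-wide RAM and the microcoded engine touches them one byte at a time, so the hardware needs no wide register file or 32-bit datapath at all.

```c
#include <stdint.h>

/* Hypothetical microcoded-interpreter fragment: x0..x31 stored as
 * little-endian byte quadruples in plain RAM, accessed one byte per
 * microcycle by an 8-bit engine. */
static uint8_t regfile[32 * 4];

/* Copy register rs into rd, one byte per "microcycle". */
static void umove(unsigned rd, unsigned rs) {
    if (rd == 0) return;                 /* x0 is hardwired to zero */
    for (unsigned i = 0; i < 4; i++)
        regfile[rd * 4 + i] = regfile[rs * 4 + i];
}
```

All the "width" of RV32 becomes loop iterations in microcode rather than parallel hardware, which is exactly why the TTL part count could shrink to a single Eurocard.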

[+] monocasa|5 years ago|reply
If you don't know of it already, you might like the book Bit-Slice Microprocessor Design written by Mick and Brick. It's written to be very AM2900 specific, but a lot of the techniques would apply to microcoded TTL processors with just a little more work on your end. And it really does a good job of exploring the space of microcoded minicomputer design in an interesting way.
[+] klelatti|5 years ago|reply
Very interesting. Any sense of how many transistors / gates would be a likely minimum needed for a RISC-V implementation like this?
[+] ekiwi|5 years ago|reply
Glacial was one of the entries for the 2018 RISC-V SoftCPU Contest, but I think it wasn't ready by the deadline. If you look at the winners, they included another 8-bit CPU with a RISC-V interpreter, SERV, and VexRiscv, but also Reindeer, which seems like a more balanced implementation: https://riscv.org/blog/2018/12/risc-v-softcpu-contest-highli...
[+] brucehoult|5 years ago|reply
SERV has a 1-bit ALU and datapath ("bit serial"). PicoRV32 has an 8-bit ALU and datapath. VexRiscv has a full 32-bit ALU and datapath.
[+] Scene_Cast2|5 years ago|reply
What's the cheapest FPGA that 1) has / can fit a CPU, and 2) has high-speed (~5 GHz) IO?

In my quick searches, I found that high speed FPGAs are around $10k+, and have much more fabric than I really need or want.

[+] thrtythreeforty|5 years ago|reply
That would be the ECP5UM-5G parts. The 5G variants are somewhat more exotic, but the base ECP5 is supported by Yosys and readily available on Mouser for $15-20 or so.
[+] tails4e|5 years ago|reply
What do you intend to do with the 5GHz IO? Assuming it's for peripheral connectivity, then maybe the Ultra96 would work. It does not have general purpose transceivers available, but it has a PS (processing subsystem) with lots of connectivity. The PL is plenty big for many applications.
[+] mechagodzilla|5 years ago|reply
But what's the resource utilization??
[+] monocasa|5 years ago|reply
Reading through it, it's making the right tradeoffs for good utilization. Basically trading LUTs for BROM, which is what you'd want at this level.

I haven't synthesized it though, so I can't say for sure.

[+] mng2|5 years ago|reply
It's indeed a pet peeve of mine when an FPGA project doesn't give example utilization, but in this case they mention iCE40UP5K and M2S025 at least.
[+] starkruzr|5 years ago|reply
so RISC-V has been around for decades, right? why does it seem like it's blowing up now?
[+] pjscott|5 years ago|reply
The RISC approach to microprocessor design has been around since the 1980s, but the specific instruction set called "RISC-V" has only been around since 2010.
[+] _chris_|5 years ago|reply
It takes years for tooling to improve and for industry designs to start making it out into the world. As more designs make it into end-users' hands, the tooling has even more motivation to improve.

I feel like that slow evolution can make it appear "suddenly popular" despite being around for a few years. =)

[+] brucehoult|5 years ago|reply
Design work was started on RISC-V in 2010 and an initial frozen spec was released to the public in 2015. The first proper 32 bit chip and board (HiFive1/FE310) shipped in December 2016, and the first Linux-capable 64 bit chip and board (HiFive Unleashed/FU540) in April 2018.

It's pretty new.

[+] drmpeg|5 years ago|reply
Is Glacial (as in glacially slow) a good name for a CPU project?
[+] jacquesm|5 years ago|reply
In this case it is highly appropriate.
[+] sprash|5 years ago|reply
I never understood the hype around RISC-V. It is an ISA on the level of a mediocre early 90s design and does not address any of the problems we have today such as the memory latency bottleneck and the resulting topology challenges. Several completely open source designs are available that are vastly superior and real world battle tested such as OpenSPARC-T2.

So why do we need RISC-V? Is it another case of NIHS?

[+] duskwuff|5 years ago|reply
Off the top of my head:

1. RISC-V is completely unencumbered from an IP perspective. There is no possibility of a rightsholder reasserting rights on IP they had previously released (like what happened with MIPS in 2019).

2. RISC-V is legacy-free. It's an extremely "clean" design, free of weird quirks like the MIPS branch delay slot or SPARC register windows.

3. There are subsets of the RISC-V architecture defined for different sizes of systems, e.g. 32/64 bit versions, an embedded subset with fewer registers, etc. They all share an instruction set and a general architecture, and most compilers can target any subset. Some of the smaller subsets are well within the realm of what a single student can be taught to implement within a semester.

4. Numerous real implementations of RISC-V exist -- both as hardware and HDL -- are being maintained, and the hardware is available on the open market.

[+] MaxBarraclough|5 years ago|reply
(edit: I missed duskwuff's answer, which is better informed than mine. I'll leave this here anyway.)

How open is OpenSPARC? Are there patent concerns?

RISC-V isn't aiming to revolutionise CPU architecture with a radical new design, it's aiming to offer a Free and Open, patent-unencumbered, fairly conventional RISC ISA. They're quite open about their emphasis on openness. [0]

For a project that aims to turn CPU design on its head, there's the Mill processor, although it's broadly thought to be vaporware.

[0] https://riscv.org/why-risc-v/

[+] andrekandre|5 years ago|reply

> memory latency bottleneck and the resulting topology challenges
how can an isa solve those issues?
[+] brucehoult|5 years ago|reply
A mediocre early 90s design that is totally unencumbered is a lot better than a mediocre late 70s design hacked beyond the limits of sanity.

Or would be.

The fact is RISC-V is a distinct improvement on early 90s designs such as MIPS III and has also learned lessons from Alpha, PowerPC, Itanium, and AMD64.

In many ways RISC-V and AArch64 (which were being designed in parallel, unknown to each other) learned the same lessons from those earlier ISAs, though they made several trade-offs differently.