They can roll their sleeves up and do the small amount of work that they tried to persuade everyone else was not necessary. And I'm sure they will have done so.
It's not that hard to design a wide decoder that can decode mixed 2-byte and 4-byte instructions from a buffer of 32 or 64 bytes in a clock cycle. I've come up with the basic schema for it and written about it here and on Reddit a number of times. Yeah, it's a little harder than for pure fixed-width Arm64, but it is massively massively easier than for amd64.
Not that anyone is going that wide at the moment. SiFive's P870 fetched 36 bytes/cycle from L1 icache, but decodes a maximum of 6 instructions from it. Ventana's Veyron v2 decodes 16 bytes per clock cycle into 4-8 instructions (average about 6 on random code).
> Yeah, it's a little harder than for pure fixed-width Arm64, but it is massively massively easier than for amd64.
For those who haven't read the details of the RISC-V ISA: the first two bits of every instruction tell the decoder whether it's a 16-bit or a 32-bit instruction. It's always in that same fixed place, there's no need to look at any other bit in the instruction. Decoding the length of a x86-64 instruction is much more complicated.
brucehoult|1 year ago
It's not that hard to design a wide decoder that can decode mixed 2-byte and 4-byte instructions from a buffer of 32 or 64 bytes in a clock cycle. I've come up with the basic schema for it and written about it here and on Reddit a number of times. Yeah, it's a little harder than for pure fixed-width Arm64, but it is massively massively easier than for amd64.
Not that anyone is going that wide at the moment. SiFive's P870 fetched 36 bytes/cycle from L1 icache, but decodes a maximum of 6 instructions from it. Ventana's Veyron v2 decodes 16 bytes per clock cycle into 4-8 instructions (average about 6 on random code).
cesarb|1 year ago
For those who haven't read the details of the RISC-V ISA: the first two bits of every instruction tell the decoder whether it's a 16-bit or a 32-bit instruction. It's always in that same fixed place, there's no need to look at any other bit in the instruction. Decoding the length of a x86-64 instruction is much more complicated.