londons_explore | 10 days ago

When pushing clock speeds, things get nondeterministic...

Here is an idea for a CPU designer...

Observe that you can get way more performance (increased clock speed) or more performance per watt (lower core voltage) if you are happy to lose reliability.

Also observe that many CPUs do superscalar, out-of-order execution, which requires the ability to backtrack, and this is normally implemented with a queue and a 'commit' phase.

Finally, observe that verifying this commit queue is a fully parallel operation, and can therefore be done more slowly and in a more power-efficient way.

So, here's the idea. You run a blazing fast superscalar CPU well past the safe clock-speed limit, so that it makes hundreds of computation or flow-control mistakes per second. You have slow but parallel verification circuitry that checks the execution trace. Whenever a mistake is found, you put a pipeline bubble in the main CPU, clear the commit queue, insert the correct result from the verification system, and continue - just like you would with a branch misprediction.
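A toy software model of that loop (Python, purely illustrative - the names `fast_exec` and `verify` are my own, and real hardware would of course do this in the commit logic, not in software):

```python
import random

def fast_exec(op, a, b):
    """Overclocked fast path: occasionally produces a wrong result."""
    result = a + b if op == "add" else a * b
    if random.random() < 0.01:              # rare timing-induced error
        result ^= 1 << random.randrange(8)  # model it as a flipped bit
    return result

def verify(op, a, b):
    """Slow but reliable verifier (parallel hardware in the real design)."""
    return a + b if op == "add" else a * b

def run(trace):
    committed, rollbacks = [], 0
    for op, a, b in trace:
        fast = fast_exec(op, a, b)
        good = verify(op, a, b)
        if fast != good:        # mistake detected: flush, bubble, restart
            rollbacks += 1
        committed.append(good)  # only verified results are committed
    return committed, rollbacks
```

Correct results always reach the commit queue; the rollback count is the only cost of running the fast path too hot.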

This happening a few hundred times per second will have a negligible impact on performance. (Consider a 100-cycle 'reset' penalty: 100 errors/s * 100 cycles = 10,000 wasted cycles, a tiny fraction of 4 GHz.)
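Spelling out that back-of-the-envelope arithmetic (Python; the 100 errors/second and 100-cycle penalty are the assumed figures from above):

```python
clock_hz = 4e9            # 4 GHz core
errors_per_sec = 100      # mistakes caught by the verifier
penalty_cycles = 100      # flush/refill cost per mistake

wasted_cycles = errors_per_sec * penalty_cycles  # 10,000 cycles/second
fraction_lost = wasted_cycles / clock_hz         # 2.5e-6, i.e. 0.00025%
```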

The main fast CPU could also make deliberate mistakes - for example assuming floats aren't NaN, assuming division won't be by zero, etc. Trimming off rarely used logic makes the core smaller, making it easier to make it even faster or more power efficient (since wire length determines power consumption per bit).
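A toy model of one such 'deliberate mistake' (Python, illustrative only - `fast_div` stands in for a divider with the zero-check trimmed off, and the verifier supplies the architecturally correct behaviour):

```python
def fast_div(a, b):
    """Trimmed fast path: assumes b != 0, returns garbage otherwise."""
    return a // b if b else 0xDEAD  # real hardware would return junk here

def verified_div(a, b):
    """Full-semantics verifier: handles the rare case the fast path skips."""
    if b == 0:
        return None  # stand-in for the divide-by-zero trap
    return a // b

def commit_div(a, b):
    good = verified_div(a, b)
    fast = fast_div(a, b)
    # The fast result is committed only when the verifier agrees;
    # in the rare trimmed case, the verified outcome wins.
    return fast if fast == good else good
```

The common case pays nothing for the missing zero-check; the rare case costs one rollback.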

gruturo|10 days ago

You could run an LLM like this, and the temperature parameter would become an actual thing...

boznz|9 days ago

Totally logical, especially with some sort of thermal mass, since you can throttle the clock down when the load is light to cool off afterwards. I used this concept in my first sci-fi novel, where the AI was aware of its temperature for these reasons. I run the Pico 2 board in my MP3 jukebox at 250 MHz; it has been on for several weeks without missing a beat (pun intended).

tliltocatl|9 days ago

LLMs are memory-bandwidth bound, so a higher core frequency would not help much.

hulitu|10 days ago

> if you are happy to lose reliability.

The only problem here is that reliability is a statistical thing. You might be lucky, you might not.

ssl-3|9 days ago

How do we know if a computation is a mistake? Do we verify every computation?

If so, then:

That seems like it would slow the ultimate computation to no more than the rate at which these computations can be verified.

That makes the verifier the ultimate bottleneck, and the other (fast, expensive -- like an NHRA drag car) pipeline becomes vestigial since it can't be trusted anyway.

moffkalast|9 days ago

Well, the point is that verification can run in parallel, so if you can verify at 500 MHz and have twenty of these units, you can run the core at 10 GHz. Minus, of course, the fixed per-instruction verification latency, which gets more and more negligible the more parallel you go. Of course there is lots of overhead in that too, as GPUs painfully show.
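The throughput arithmetic behind that claim (Python; the 500 MHz and 20-unit figures are the ones assumed above, and this treats the core as retiring one instruction per cycle):

```python
verifier_hz = 500e6   # one unit checks 500M instructions/second
units = 20            # verification units running in parallel
core_hz = 10e9        # overclocked core, assumed 1 instruction/cycle

aggregate_verify_hz = units * verifier_hz   # 1e10: verification keeps up
headroom = aggregate_verify_hz - core_hz    # 0: exactly matched
```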

hnuser123456|10 days ago

Side channel attacks don't stand a chance!

Avlin67|9 days ago

You've never had WHEA errors... or a PLL issue on a CPU C-state transition...

iberator|8 days ago

Do you design CPUs by any chance?

You should build one in a logic simulator, as it's a super interesting architecture.

I hate hobbyists' 'CPUs' living only inside an FPGA. We should build real hardware instead.