universal_sinc | 2 years ago | on: AVX10/128 is a silly idea
universal_sinc's comments
universal_sinc | 2 years ago | on: Arm’s Neoverse V2
Let's take a simple example: instead of modeling a 64-bit adder in all its gory transistor-level detail, you can just have the model return the correct data after 1 "cycle", or whatever your ALU latency is. As long as that latency matches the real hardware, you'll get an accurate performance number.
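To make that concrete, here's a minimal Python sketch of the idea (industry models are usually C++, and the `AdderModel` class and its method names are made up for illustration, not taken from any real simulator). The sum is computed instantly with ordinary integer math; the only thing being modeled is when the result becomes visible.

```python
from collections import deque


class AdderModel:
    """Hypothetical cycle-level model of a 64-bit adder.

    The result is functionally correct, and only its timing is modeled.
    No gates or transistors are simulated.
    """

    MASK = (1 << 64) - 1  # wrap around like 64-bit hardware would

    def __init__(self, latency=1):
        self.latency = latency      # assumed ALU latency in cycles
        self.cycle = 0              # current simulated cycle
        self.in_flight = deque()    # (ready_cycle, result) in issue order

    def issue(self, a, b):
        # Compute the answer right away; just remember when it is "ready".
        result = (a + b) & self.MASK
        self.in_flight.append((self.cycle + self.latency, result))

    def tick(self):
        """Advance one cycle and return the results that became ready."""
        self.cycle += 1
        done = []
        while self.in_flight and self.in_flight[0][0] <= self.cycle:
            done.append(self.in_flight.popleft()[1])
        return done


adder = AdderModel(latency=2)
adder.issue(2**64 - 1, 2)   # overflows and wraps to 1
print(adder.tick())         # [] (still in flight after 1 cycle)
print(adder.tick())         # [1] (ready after 2 cycles)
```

Changing `latency` is the whole "hardware change" here, which is why these models are so cheap to modify.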
What's particularly useful about these models is that they enable much easier and faster design space exploration to see how a circuit would perform, well before going ahead with the Verilog implementation, which, relatively speaking, can take circuit designers ages. "How much faster would my CPU be if it had a 20% larger register file?" can be answered in a day or two, before getting a circuit designer to go and try to implement such a thing.
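As a toy illustration of that kind of "what if" question, here's a rough Python sketch that sweeps ALU latency instead of register file size, since that's much easier to model in a few lines. The `run_cycles` function and the workload sizes are assumptions made up for this example; they don't come from any real CPU or real model.

```python
def run_cycles(num_ops, alu_latency, dependent):
    """Toy cycle count for a stream of adds on a single pipelined ALU.

    dependent=True:  each add needs the previous result, so the adds
                     run back to back and latency is fully exposed.
    dependent=False: one add issues per cycle and results overlap, so
                     latency is only paid once at the end.
    """
    if num_ops == 0:
        return 0
    if dependent:
        return num_ops * alu_latency
    return num_ops + alu_latency - 1


# Sweep one design parameter and compare two hypothetical workloads.
for latency in (1, 2, 3):
    chain = run_cycles(1000, latency, dependent=True)
    stream = run_cycles(1000, latency, dependent=False)
    print(f"latency={latency}: dependent={chain} cycles, independent={stream} cycles")
```

A real model tracks far more state than this, but the workflow is the same: change a parameter, rerun the workload, compare cycle counts.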
If you want an open-source example, take a look at the gem5 project (https://www.gem5.org). It's not quite as sophisticated as the proprietary models used in industry, but it's widely used in academia and open-source hardware design, and it's a great place to start.
universal_sinc | 2 years ago | on: Arm’s Neoverse V2
First, they create detailed software models (usually in C++) of their chips to estimate performance as closely as they can before laying out a single transistor. These models can run code just like a real hardware device, albeit slowly.
Once the chip is designed, Verilog simulators can compute the exact logical output of the circuit, which can be used to measure performance on a workload. However, this method is even slower than the first!
For larger workloads and higher speed, they use extraordinarily expensive FPGA-based platforms called emulators. These let circuits run at speeds in the MHz range before ever being sent to a fab. Whether it's booting an OS or running a complex multicore workload with shared memory, they can measure almost any workload. But this method isn't available until late in the design phase, and the boxes themselves are too expensive to be deployed very widely.
The software models are the most useful for estimating performance, as long as they are written early and well :)
universal_sinc | 3 years ago | on: Fastest-ever logic gates could make computers a million times faster
universal_sinc | 5 years ago | on: 100-GHz Single-Flux-Quantum Bit-Serial Adder Based on 10-KA/Cm2 Niobium Process
universal_sinc | 5 years ago | on: Intel outsources Core i3 to TSMC's 5nm process
universal_sinc | 5 years ago | on: AMD Zen 3 Ryzen Deep Dive Review
universal_sinc | 5 years ago | on: AMD Zen 3/Ryzen 5000 announcement [video]