top | item 10206389


minthd | 10 years ago

The bottleneck actually is arithmetic. "GPUs have much higher ALU throughput since the GPU chip area is almost entirely ALU."

http://devblogs.nvidia.com/parallelforall/bidmach-machine-le...

Also on the horizon is 3D chip manufacturing technology (3D-monolithic), with extremely large bandwidth between the different layers of the chip, possibly pairing GPU + DRAM.

exascale1 | 10 years ago

The bottleneck has not been arithmetic for a long time; it's data movement. Arithmetic is practically free nowadays. See the presentation by Horst Simon (Deputy Director of Lawrence Berkeley National Laboratory), "No exascale for you!" [0]

The energy cost of transferring a single data word a distance of 5 mm on-chip is higher than the cost of a single FLOP (20 pico-joules/bit). 5 mm is roughly the distance to L2 cache or another CPU core. The cost of transferring data off-chip (3D chip and/or RAM) is orders of magnitude higher; see the graph.

[0] http://iwcse.phys.ntu.edu.tw/plenary/HorstSimon_IWCSE2013.pd...
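A quick back-of-envelope check of those numbers (a sketch: the 20 pJ/bit transfer figure is quoted above, but the ~20 pJ per double-precision FLOP is an assumed circa-2013 ballpark, not taken from the slides):

```python
# Energy to move one 64-bit word 5 mm on-chip vs. energy of one DP FLOP.
BITS_PER_WORD = 64
TRANSFER_PJ_PER_BIT = 20.0   # figure quoted in the comment above
FLOP_PJ = 20.0               # assumed ballpark for one DP FLOP (~2013 hardware)

word_move_pj = BITS_PER_WORD * TRANSFER_PJ_PER_BIT  # 1280 pJ
print(f"moving one word: {word_move_pj:.0f} pJ")
print(f"one FLOP:        {FLOP_PJ:.0f} pJ")
print(f"ratio:           {word_move_pj / FLOP_PJ:.0f}x")
```

Under these assumptions, moving a single word costs on the order of 60x more energy than computing with it, which is the point of the "no exascale" argument.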

foobar2020 | 10 years ago

The bottleneck is often RAM. This is especially clear when writing performance-oriented code in CUDA, where the number of cores (threads) per shared-memory controller is on the order of thousands.
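The RAM bottleneck can be made concrete with a roofline-style estimate. This is a sketch under assumed hardware numbers (250 GB/s DRAM bandwidth, 4 TFLOP/s single-precision peak, both hypothetical), using SAXPY (y = a*x + y) as the kernel:

```python
# Roofline-style bound for SAXPY on an assumed GPU.
BANDWIDTH_GBS = 250.0    # assumed DRAM bandwidth, GB/s
PEAK_GFLOPS = 4000.0     # assumed single-precision ALU peak, GFLOP/s

# Per element, SAXPY reads x and y and writes y (3 x 4 bytes) for 2 flops.
bytes_per_elem = 12
flops_per_elem = 2
intensity = flops_per_elem / bytes_per_elem  # flops per byte moved

achievable = min(PEAK_GFLOPS, intensity * BANDWIDTH_GBS)
print(f"arithmetic intensity: {intensity:.3f} flop/byte")
print(f"achievable: {achievable:.0f} GFLOP/s of {PEAK_GFLOPS:.0f} peak")
```

At this arithmetic intensity the kernel is bandwidth-bound: DRAM can only sustain a few percent of the ALU peak, no matter how many threads you launch.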

cbsmith | 10 years ago

...but since GPUs already exist, they kind of are the "already at large-scale production" solution to the problem. For very little money you can get some pretty insane single-precision throughput for SIMD calculations.

What you run into are problems feeding the beast data fast enough.
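"Feeding the beast" can be quantified with a rough sketch: compare the time to ship a batch of data to the GPU against the time to compute on it. All numbers here are assumed ballparks (PCIe 3.0 x16 at ~12 GB/s, 4 TFLOP/s single-precision peak, and a hypothetical kernel doing 10 flops per byte):

```python
# Time to transfer 1 GB over PCIe vs. time to process it on the GPU.
DATA_GB = 1.0
PCIE_GBS = 12.0          # assumed effective PCIe 3.0 x16 bandwidth, GB/s
PEAK_GFLOPS = 4000.0     # assumed single-precision peak, GFLOP/s
FLOPS_PER_BYTE = 10.0    # assumed arithmetic intensity of the kernel

transfer_s = DATA_GB / PCIE_GBS
compute_s = (DATA_GB * FLOPS_PER_BYTE) / PEAK_GFLOPS  # GB * flop/byte / (Gflop/s)
print(f"transfer: {transfer_s * 1000:.1f} ms")
print(f"compute:  {compute_s * 1000:.1f} ms")
```

Under these assumptions the transfer takes tens of milliseconds while the compute takes a few, so unless transfers are overlapped with kernels, the bus, not the ALUs, sets the throughput.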