TomVDB | 3 years ago

I don't agree.

Turing is an evolution of Volta. In fact, in the CUDA slides of Turing, they mention explicitly that Turing shaders are binary compatible with Volta, and that's very clear from the whitepapers as well.

Ampere A100 and Ampere GeForce have the same core architecture as well.

The only differences are in HPC features (MIG, ECC), FP64, the beefiness of the tensor cores, and the lack of RT cores on HPC units.

The jury is still out on Hopper vs Lovelace. Today's presentation definitely points to a difference similar to the one between A100 and Ampere GeForce.

It's more like this: the architectures are the same, with some minor differences.

You can also see this with the SM feature levels:

Volta: SM 70, Turing: SM 75

Ampere: SM 80 (A100) and SM 86 (GeForce)
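To make the SM-version argument concrete, here is a minimal Python sketch that splits each SM number listed above into a major and a minor part and checks which chips share a major version. The SM numbers are the ones quoted in this comment; the helper names `split_sm` and `same_major` are made up for illustration, and treating "same major version" as "same base architecture" is the reading argued for here, not an official Nvidia definition.

```python
# SM versions as quoted in the comment above.
SM_VERSIONS = {
    "Volta": 70,
    "Turing": 75,
    "Ampere A100": 80,
    "Ampere GeForce": 86,
}


def split_sm(sm):
    """Split an SM number such as 86 into (major, minor) = (8, 6)."""
    return divmod(sm, 10)


def same_major(a, b):
    """True if the two named chips share an SM major version (hypothetical helper)."""
    return split_sm(SM_VERSIONS[a])[0] == split_sm(SM_VERSIONS[b])[0]


if __name__ == "__main__":
    for name, sm in SM_VERSIONS.items():
        major, minor = split_sm(sm)
        print(f"{name}: SM {sm} -> major {major}, minor {minor}")
    print("Volta/Turing share a major version:", same_major("Volta", "Turing"))
    print("A100/GeForce share a major version:", same_major("Ampere A100", "Ampere GeForce"))
```

Under this reading, Volta and Turing fall into major version 7, and both Ampere variants fall into major version 8.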


terafo|3 years ago

Turing is an evolution of Volta, but they are different architectures.

A100 and GA102 DO NOT have the same core architecture. An A100 SM has 192KB of L1 cache; a GA102 SM has 128KB. That alone means it is not the same SM. And there are other differences. For example, Volta introduced a second datapath that could process one INT32 instruction in addition to floating-point instructions. This datapath was upgraded in GA102 so that it can now handle FP32 instructions as well (not FP16; only the first datapath can process those). A100 doesn't have this improvement, which is why we see such a drastic (basically 2x) difference in FP32 FLOPS between A100 and GA102. That is not a "minor difference", and neither is the huge difference in L2 cache (40MB vs 6MB). It's a different architecture on a different node, designed by a different team.
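A rough back-of-the-envelope version of the FP32 claim above, as a Python sketch. The per-SM lane counts (64 FP32 lanes on A100, 128 on GA102) follow from the datapath description in this comment. The SM counts and boost clocks are assumptions taken from public spec sheets rather than from this thread: 108 SMs at 1410 MHz for A100, and 82 SMs at 1695 MHz for the RTX 3090 as a representative GA102 product. The function name `peak_fp32_tflops` is hypothetical.

```python
def peak_fp32_tflops(sm_count, fp32_lanes_per_sm, boost_mhz):
    """Theoretical peak FP32 throughput in TFLOPS.

    Each FP32 lane can retire one fused multiply-add per clock,
    which is counted as 2 floating-point operations.
    """
    flops_per_second = 2 * sm_count * fp32_lanes_per_sm * boost_mhz * 1e6
    return flops_per_second / 1e12


# Assumed spec-sheet figures (not from the thread).
a100 = peak_fp32_tflops(sm_count=108, fp32_lanes_per_sm=64, boost_mhz=1410)
ga102 = peak_fp32_tflops(sm_count=82, fp32_lanes_per_sm=128, boost_mhz=1695)

if __name__ == "__main__":
    print(f"A100:             {a100:.2f} TFLOPS")
    print(f"GA102 (RTX 3090): {ga102:.2f} TFLOPS")
    # Per SM and per clock, the lane count alone gives the 2x factor.
    print(f"FP32 lanes per SM, GA102 vs A100: {128 / 64:.1f}x")
```

The 2x figure is per SM per clock. At the whole-chip level the gap depends on SM count and clock speed as well, so the totals computed here do not come out at exactly 2x.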

TomVDB|3 years ago

GP100 and GP GeForce have a different shared memory structure as well, so much so that GP100 was listed as having 30 SMs instead of 60 in some Nvidia presentations. But the base architecture (ISA, instruction delays, …) was the same.

It’s true that GA102 has double the FP32 units, but the way they work is very similar to the way SMs have 2x FP16, in that you need to go out of your way to benefit from them. Benchmarks show this as well.

I like to think that Nvidia’s SM version nomenclature is a pretty good hint, but I guess it just boils down to personal opinion about what constitutes a base architecture.