That second table is a good example of why always including units (or even just a "higher is better" note) is a good idea... I have no clue what I'm looking at.
Edit: It's been edited, thx Evolution :) (or I totally glossed over it the first time around... but I don't think so)
Even after being edited, it's still wrong. It shows the significantly lower Inception v4 performance as a "40% speedup" instead of 40% of baseline images/sec.
A "higher is better" note might still be interesting, although redundant.
This is a poor comparison of performance. All of these networks are CNNs, and very old architectures at that. They are all probably memory-bandwidth bound, which is why you see the consistent 50% improvement in FP32 performance.
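To make the memory-bound claim concrete, here is a back-of-envelope roofline check. The spec numbers below are approximate published figures for each card, and the whole thing is an illustration of the reasoning, not a benchmark:

```python
# A kernel is memory-bound when its arithmetic intensity (FLOPs per byte
# moved) falls below the machine balance (peak FLOP/s divided by memory
# bandwidth). Spec numbers are approximate published figures.

def machine_balance(peak_tflops, bandwidth_gbs):
    """FLOPs the card can execute per byte of memory traffic."""
    return (peak_tflops * 1e12) / (bandwidth_gbs * 1e9)

# RTX 3090: ~35.6 TFLOP/s FP32, ~936 GB/s GDDR6X
# Titan RTX: ~16.3 TFLOP/s FP32, ~672 GB/s GDDR6
balance_3090 = machine_balance(35.6, 936)
balance_titan = machine_balance(16.3, 672)

# Small CNN layers can sit well below these balance points, i.e. they are
# memory-bound on both cards, so extra compute doesn't show up.
print(f"3090 needs ~{balance_3090:.0f} FLOPs/byte to be compute-bound")
print(f"Titan RTX needs ~{balance_titan:.0f} FLOPs/byte")
```

On these numbers the 3090 actually needs a *higher* arithmetic intensity than the Titan to leave the memory-bound regime, which is consistent with old CNNs showing only the bandwidth-sized gain.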
It is also not clear what batch sizes are being used for any of the tests. If you switch to FP16 training, you must increase the batch size to properly utilize the Tensor Cores.
If you compare these cards at FP16 on large language models (think GPT-style, with a large model dimension), I am confident you will see the Titan RTX outperform the 3090. The former has 130 TF/s of FP16 Tensor Core throughput (with FP32 accumulate), while the latter has only 70 TF/s.
Link: https://www.nvidia.com/content/dam/en-zz/Solutions/geforce/a...
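A quick sanity check of what those quoted peak rates imply for a single large matmul. The layer dimensions below are hypothetical GPT-style numbers, and real kernels reach only a fraction of peak, so treat these as relative figures, not wall-clock predictions:

```python
# A matmul of (m x k) by (k x n) costs about 2*m*k*n FLOPs; dividing by
# a sustained FLOP/s rate gives a lower bound on runtime.

def matmul_seconds(m, k, n, tflops):
    flops = 2 * m * k * n
    return flops / (tflops * 1e12)

# Hypothetical GPT-style FFN matmul: 8192 tokens, model dim 4096, FFN dim 16384
t_titan = matmul_seconds(8192, 4096, 16384, 130)  # Titan RTX FP16 (FP32 acc)
t_3090  = matmul_seconds(8192, 4096, 16384, 70)   # RTX 3090 FP16 (FP32 acc)

print(f"Titan RTX: {t_titan * 1e3:.1f} ms, 3090: {t_3090 * 1e3:.1f} ms")
print(f"Titan is ~{t_3090 / t_titan:.2f}x faster at peak rates")
```

At the quoted peaks the Titan comes out roughly 1.86x faster on tensor-core-bound work, which is the commenter's point.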
The RTX 3090 is also $1000 cheaper than the Titan, so there's that. It would be nice if there were a good way to express value per dollar, perhaps in terms of GLUE accuracy and training time.
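A naive throughput-per-dollar comparison is easy to sketch. The prices below are the launch MSRPs ($2,499 Titan RTX, $1,499 RTX 3090) and the throughput figures are the peak numbers quoted in this thread; a rough sanity check, not a benchmark:

```python
# Peak throughput per dollar for each card, at FP32 and at FP16 tensor
# rates (with FP32 accumulate). All numbers are peak spec figures.

cards = {
    "Titan RTX": {"fp32_tflops": 16.3, "fp16_tensor_tflops": 130, "price": 2499},
    "RTX 3090":  {"fp32_tflops": 35.6, "fp16_tensor_tflops": 70,  "price": 1499},
}

for name, c in cards.items():
    fp32_per_dollar = c["fp32_tflops"] / c["price"] * 1000   # GFLOP/s per $
    fp16_per_dollar = c["fp16_tensor_tflops"] / c["price"] * 1000
    print(f"{name}: {fp32_per_dollar:.1f} FP32 GFLOP/s per $, "
          f"{fp16_per_dollar:.1f} FP16 tensor GFLOP/s per $")
```

On these numbers the 3090 wins decisively on FP32 per dollar while FP16 tensor throughput per dollar is roughly a wash, which mirrors the disagreement in this thread about which workload matters.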
For many of us, the Inception-style CNN workloads (especially at FP32) are much more realistic than large language models, which may be better suited to taking advantage of the Tensor Cores. If I'm going to be memory bottlenecked either way, I probably don't want to spend an extra $1000 on 400 Tensor Cores I can't take full advantage of.
Nit: don't use gradients for discrete categories in a graph. Use a discrete color palette that spaces colors as far apart perceptually as possible, using a tool like this: https://medialab.github.io/iwanthue/
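A minimal stand-in for a tool like iwanthue, using only the standard library: spread N hues evenly around the color wheel so each discrete category gets a clearly separated color instead of a point on a gradient. (Evenly spaced hue is only a crude approximation of what iwanthue does in a perceptual color space, but it illustrates the idea.)

```python
import colorsys

def discrete_palette(n, lightness=0.5, saturation=0.65):
    """Return n hex colors with evenly spaced hues."""
    colors = []
    for i in range(n):
        # colorsys takes (hue, lightness, saturation), all in [0, 1]
        r, g, b = colorsys.hls_to_rgb(i / n, lightness, saturation)
        colors.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors

print(discrete_palette(5))
```

Pass the resulting list to your plotting library's per-category color argument instead of sampling a continuous colormap.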
Seems like a good speedup relative to the Titan, especially for the money. I'd be interested to see the performance relative to the 3080, though. There are obviously VRAM limitations with the 3080, but it would still be interesting to see the difference in raw compute performance.
In games the 3090 only gives a 15% performance bump relative to the 3080. If that pattern holds for machine learning tasks there is probably a scenario where it makes sense to buy two 3080s rather than one 3090.
If you are VRAM constrained then obviously the 3090 is the way to go.
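The two-3080s-vs-one-3090 trade-off can be sketched with a simple scaling model. The 90% multi-GPU scaling efficiency and the 1.15x relative throughput figure are assumptions taken from the gaming gap mentioned above, not measurements:

```python
# Data-parallel training with an efficiency penalty for each added GPU
# (gradient all-reduce overhead). All numbers are hypothetical.

def effective_throughput(per_gpu, n_gpus, scaling_eff=0.9):
    """Relative throughput of n_gpus cards with imperfect scaling."""
    return per_gpu * (1 + (n_gpus - 1) * scaling_eff)

# Relative units: let a 3080 = 1.0 and a 3090 = 1.15 (the gaming gap)
one_3090 = effective_throughput(1.15, 1)
two_3080 = effective_throughput(1.00, 2)

print(f"one 3090: {one_3090:.2f}, two 3080s: {two_3080:.2f}")
```

Under these assumptions two 3080s win comfortably on raw throughput, but only if the model and batch fit in 10 GB per card; the 3090's 24 GB is the real differentiator.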
Could you kindly advise what kind of computer would make sense to purchase to begin learning about ML? I was assuming I'd get a 3080. Should I get a case that could potentially house 2 x 3080s? Does the case require any special cooling considerations, or just whatever will fit the cards? And what CPU would you get?
I think for high throughput scenarios the 3090 probably has more headroom due to its higher TDP and better (larger) cooling solution, which might really matter here if you're driving the tensor cores at max the whole time.
Most video games probably aren't going to make the most of all the extra CUDA cores on the 3090. I'm assuming that helps a lot with machine learning; can someone who knows for sure confirm?
Honestly, I had an RTX Titan for home use for a while. Eventually I moved to just using a 2080 Super, and it performed nearly as well for my models. If you don't need ALL the extra memory and have the space for a triple-slot card, then by far the better value proposition for last gen seemed to be a good Super.
See also Tim Dettmers' fantastic post on GPU performance (which doesn't use benchmarks for the latest cards but instead estimates performance with a model):
https://timdettmers.com/2020/09/07/which-gpu-for-deep-learni...
HN Discussion:
https://news.ycombinator.com/item?id=24400603
Seems to be a good speedup overall relative to the 2080 Ti, including at FP16 (for the relative 2080 Ti vs Titan numbers, see https://lambdalabs.com/blog/2080-ti-deep-learning-benchmarks...). Does this suggest we should see another, even more expensive Titan card in the pipeline, given the FP16 performance? Or maybe TF32 performance is what NVIDIA will promote this generation (only if they have better numbers there than at FP16?)?
Here's hoping for an A100 Titan with un-nerfed FP64. The 3090 is twice as nerfed as previous generations, which were already bad at a 1:32 FP64:FP32 ratio. Now it's 1:64 :(
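What those ratios mean in absolute terms, using approximate published FP32 peak figures for each card:

```python
# Consumer GeForce parts run FP64 at a fixed fraction of FP32 throughput.

def fp64_tflops(fp32_tflops, ratio):
    """ratio is the FP64:FP32 fraction, e.g. 1/64 for the 3090."""
    return fp32_tflops * ratio

titan_rtx = fp64_tflops(16.3, 1 / 32)  # previous gen, 1:32
rtx_3090  = fp64_tflops(35.6, 1 / 64)  # Ampere GeForce, 1:64

print(f"Titan RTX FP64: ~{titan_rtx:.2f} TFLOP/s")
print(f"RTX 3090 FP64:  ~{rtx_3090:.2f} TFLOP/s")
```

The deeper 1:64 nerf roughly cancels the larger FP32 peak, so absolute FP64 throughput barely moves between generations; either way, both are far from a compute-class card.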
Can someone explain the difference between FP16 and FP32 in these benchmarks? The difference is pretty dramatic. I assume it's floating point precision(?), but why would lower precision show a smaller relative speedup on the 3090? For training jobs, how does the precision impact the accuracy of the model?
Edit: clarified that I am referring to slower relative performance
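On the accuracy half of the question: yes, it's floating point precision, and the effect is easy to demonstrate. float16 has an 11-bit significand, so integers above 2048 are no longer exactly representable and small addends can vanish during a long reduction:

```python
import numpy as np

a = np.float16(2048)
b = np.float16(1)
print(a + b)  # the +1 is lost to rounding: 2049 is not representable

# Summing 10000 ones in float16 stalls once the accumulator saturates:
acc = np.float16(0)
for _ in range(10000):
    acc = np.float16(acc + np.float16(1))
print(acc)  # far less than 10000

# Widening only the accumulator fixes it, which is exactly why tensor
# cores take FP16 products but accumulate them into FP32.
print(np.float32(a) + np.float32(b))
```

This is why mixed-precision training keeps an FP32 copy of the weights and accumulates in FP32: the individual multiplies tolerate FP16, but long sums do not.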
NVidia drivers borked rebooting on my box for a long time.
A couple of months ago I removed all the old references in my apt sources and followed the newer instructions (several times, to get the right driver/CUDA/TensorFlow match), and now my reboots are fine, with only one GPU lock-up so far (probably due to overheating; I've had to replace a couple of components flagged as failing during the summer heatwave).
JupyterHub is just great. I'd like to implement better diagnostics, though; I have yet to find a good tutorial for that.
+ tf-nightly and other Python libraries installed through pipenv.
It helps keeping to Ubuntu LTS versions though; that's what they support best.
If I have a really remote location and I need to do on-premises inference, am I better off buying one of the gaming GPUs or are they far behind the T4, etc.?
NVIDIA has nerfed FP64 performance on consumer GeForce cards for years now. FP64 is critical for scientific computing but not needed for ML. They have also banned running GeForce cards in datacenters.
No, the 3090 has nerfed Tensor Cores, and in some apps the Titan RTX is 5x faster (Siemens NX). FP32 accumulate runs at a 0.5x rate, as on the 2080 Ti, while the Titan's runs at the full 1x rate.