Five and a half years ago, one DGX-2 would have been in the top 10 supercomputers in the world[1], and you'll probably be able to rent one on EC2 for under twenty bucks an hour before the year's out. You can already get a DGX-1 for under ten bucks an hour right now.
Does anyone know when graphics cards will be available at sane prices for people who actually want to use them one (or two max) at a time to render graphics?
The bad news is that a lot of system builders like us (we build pre-built systems that ship with Deep Learning software pre-installed: http://lambdalabs.com) are now used to paying more than MSRP for our GPUs. MSRP for a 1080Ti is supposed to be $699. They haven't been available at that price for a while now, even in large bulk purchases. I don't see it going back down any time soon.
It's really hard to come up with a timeline, but I'd say the worst has passed. It will take time to recover inventory, but the alt-coin market (basically, not-Bitcoin cryptocurrencies) has died down a lot and the rush to acquire mining capital has likely diminished.
I can't say when, but you might be happy to know that graphics card availability has been improving throughout March, albeit at expensive, though no longer insane, prices. I've been monitoring availability of various AMD and Nvidia cards since January (using nowinstock.com): previously, cards would be available for a few hours at a time at an outlet - now we're up to weeks, and prices have been coming down a bit, though they are still above MSRP. If the ongoing cryptocoin downward spiral persists for a few more months, GPU prices ought to come down.
Almost all of the Nvidia GTX cards are back in stock on Amazon (for Prime shipping) and are relatively close to MSRP. (Edit: You will still have to sort through overpriced ones)
They understood really early on that you need to invest in software just as much as in hardware, if not more.
Open standards tend to be a horse designed by a committee: it can take years for them to evolve and reach any consensus, and they can never match the speed at which hardware can evolve and adapt to market requirements.
So NVIDIA essentially made their own software ecosystem which can be just as flexible as their hardware and more importantly it allows NVIDIA to be proactive rather than reactive.
To repeat the other comments, right now cuDNN is the advantage - that is what manifests as TensorFlow/Keras/PyTorch support. AMD has ROCm, and in these benchmarks it was something like 10x or more slower at training CNNs than P100s - https://www.pcper.com/reviews/Graphics-Cards/NVIDIA-TITAN-V-...
How sustainable is the advantage? Not that big - you don't need CUDA compatibility like hiptensorflow tried, in a classic shortcuts-don't-work way. Just an alternative to cuDNN for Vega that is integrated into TensorFlow's distributed binaries.
They were clever to understand that developers wanted the freedom to do GPGPU coding in C, C++, Fortran, plus any other language able to target their bytecode (PTX), instead of being bound to programming in crufty C.
Then they created nice numeric libraries and graphical debuggers for GPU programming.
Their new Volta GPUs were explicitly designed with C++ in mind (there are a few talks about it).
When Khronos woke up to the idea that maybe they should support something other than C and runtime invocation of compilers and linkers - with OpenCL 2.0 - most developers were already deeply invested in CUDA.
They don't really have an edge today. They just achieved big lock-in, and inertia of those who depend on CUDA now prevents them from using other hardware.
I think people are underestimating the difficulty of developing high performance microarchitecture for GPU or CPU.
A new clean-sheet architecture design takes 5-7 years, even for teams that have been doing it constantly for decades at places like Intel, AMD, ARM, or Nvidia. This includes optimizing the design for the process technology, yield, etc. Then there are economies of scale and price points.
Recent examples:
* Nvidia's Volta microarchitecture design started in 2013; launch was December 2017.
* AMD's Zen CPU architecture design started in 2012, and the CPU shipped in 2017.
In deep learning, they presented high-quality, hand-optimized building blocks well before anybody else did (cuDNN). An effect of that is that the libraries were built around CUDA and cuDNN, and now AMD is still trying to catch up. Intel just hasn't delivered a fast enough, flexible enough, cheap enough GPU or GPU alternative, AFAIK.
The DGX-2 server (16x V100) costs the same as about 26 DeepLearning11 servers (10x 1080Ti) -
https://www.servethehome.com/deeplearning11-10x-nvidia-gtx-1...
With 260 1080Ti GPUs, you can do neural architecture search that competes with some published work by Google.
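A rough sketch of the comparison above, assuming the DGX-2's announced $399k launch price and a ~$15k build cost per DL11-style server (the per-server figure is an assumption backed out of the 26x ratio in the comment; per-card FP32 peaks are from published specs):

```python
# Cost/throughput comparison: one DGX-2 (16x V100) vs. a fleet of
# DeepLearning11-style servers (10x GTX 1080 Ti each).
DGX2_PRICE = 399_000        # USD, announced launch price
DL11_PRICE = 15_000         # USD, assumed build cost per 10-GPU server

servers = DGX2_PRICE // DL11_PRICE      # DL11 servers for the same money
gpus = servers * 10

# Peak FP32 throughput per card (TFLOPS), from published spec sheets
V100_FP32 = 15.7
GTX1080TI_FP32 = 11.3

dgx2_fp32 = 16 * V100_FP32              # ~251 TFLOPS
fleet_fp32 = gpus * GTX1080TI_FP32      # ~2938 TFLOPS

print(servers, gpus)
print(round(dgx2_fp32, 1), round(fleet_fp32, 1))
```

Raw FP32 is, of course, not the whole story: the DGX-2 buys you NVLink/NVSwitch bandwidth, 32GB HBM2 per GPU, and tensor cores, which the consumer-card fleet lacks.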
Too bad this came from one of the worst companies in the world when it comes to its policy towards open source. Too bad Google is playing on their side by not merging OpenCL support into TensorFlow.
jsheard | 8 years ago
More DGX-2 information - https://www.anandtech.com/show/12587/nvidias-dgx2-sixteen-v1...
Quadro GV100 announced - https://www.anandtech.com/show/12579/big-volta-comes-to-quad...
Tesla V100 memory bumped to 32GB - https://www.anandtech.com/show/12576/nvidia-bumps-all-tesla-...
lsb | 8 years ago
[1] 1920 teraflops of 4x4+4 matrix multiply/add; see https://www.top500.org/lists/2012/11/
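For scale, that headline number follows directly from the per-GPU tensor-core peak (the ~120 TFLOPS figure for V100 is from Nvidia's published specs):

```python
# DGX-2 headline deep-learning throughput: 16 V100s, each with a
# published tensor-core peak of ~120 TFLOPS (FP16 multiply, FP32 accumulate).
V100_TENSOR_TFLOPS = 120
NUM_GPUS = 16

total_tflops = NUM_GPUS * V100_TENSOR_TFLOPS
print(total_tflops)     # 1920 TFLOPS, i.e. roughly 2 petaflops
```

That ~2 petaflops of mixed-precision matrix math is what puts a single box in the neighborhood of the bottom of the November 2012 Top500 list (though those entries are ranked on FP64 Linpack, so the comparison is loose).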
make3 | 8 years ago
Have they published a price for the GV100?
everyone | 8 years ago
sabalaba | 8 years ago
Obi_Juan_Kenobi | 8 years ago
6 months to a year, maybe?
sangnoir | 8 years ago
stagger87 | 8 years ago
wmf | 8 years ago
If you mean Volta at a sane price, they didn't announce it today so it may be a while.
namlem | 8 years ago
nabla9 | 8 years ago
AMD, Intel, etc. have not been able to compete in the high-performance GPU market, so Nvidia must have an edge. How big and sustainable is it?
dogma1138 | 8 years ago
jamesblonde | 8 years ago
pjmlp | 8 years ago
shmerl | 8 years ago
Nokinside | 8 years ago
make3 | 8 years ago
throwaway84742 | 8 years ago
wlesieutre | 8 years ago
jamesblonde | 8 years ago
The DL11 also pays for itself in about 90 days compared to renting a P100/V100 on AWS: http://www.logicalclocks.com/price-calculator/
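A hedged sketch of where a ~90-day payback figure can come from. All numbers below are illustrative assumptions, not quotes from the linked calculator: an assumed ~$16.5k for the 10x 1080 Ti build, and an assumed $7.50/hour for cloud capacity of roughly equivalent training throughput, running 24/7:

```python
# Back-of-the-envelope payback period for buying a DL11-style server
# versus renting equivalent P100/V100 capacity in the cloud.
server_cost = 16_500     # USD, assumed build cost of a 10x 1080 Ti server
cloud_rate = 7.50        # USD/hour, assumed rental cost of equivalent capacity

payback_days = server_cost / (cloud_rate * 24)
print(round(payback_days))   # ~92 days, in line with the ~90-day claim
```

The break-even point is very sensitive to the assumed cloud rate and to utilization: a server that sits idle half the time takes twice as long to pay for itself.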
jftuga | 8 years ago
xvilka | 8 years ago