The P40 is essentially a faster 1080 with 24GB of RAM. Many tasks (including LLMs) are easily bottlenecked on memory bandwidth, and when they are, old and new cards are more evenly matched. (Newer hardware has more bandwidth, sure, but not in proportion to its cost.)
The P40 still works with CUDA 12.2 at the moment. I used to use K80s (which I think I paid like $50 for!), but they turned into a huge mess of dealing with older libraries, especially since essentially all ML stuff is on a crazy upgrade cadence, with everything constantly breaking even before you have to deal with orphaned old software.
frognumber|2 years ago
SUPPORTED
=========
* Ada / Hopper / A4xxx (but not A4000)
* Ampere / A3xxx
* Turing / Quadro RTX / GTX 16xx / RTX 20XX / Volta / Tesla
EOL 2023/2024
=============
* Pascal / Quadro P / Geforce GTX 10XX / Tesla
Unsupported
===========
* Maxwell
* Kepler
* Fermi
* Tesla (yes, this one pops up over and over, chaotically)
* Curie
Anything older doesn't really do GPGPU much. The older cards are also quite slow relative to modern ones! A lot of the ancient workstation cards can run big models cheaply, but (1) with incredible software complexity and (2) very slowly, even relative to modern CPUs.
Blender rendering very much isn't ML, but it is a nice, standardized benchmark:
https://opendata.blender.org/
As a point of reference: a P40 scores 774 in Blender rendering, and a 4090 scores 11,321. There are CPUs ($$$) around the 2000 mark, so roughly equal to dual P40s. It's hard for me to justify a P40-style GPU over something like a 4060 Ti 16GB (3800), an Arc A770 16GB (1900), or a 7600 XT 16GB (1300). Those cost more, but the speed difference is nontrivial, as are the differences in compatibility and support life. A lot of work is going into supporting modern Intel / AMD GPUs, while ancient ones are being deprecated.
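To make the gap concrete, here is a small sketch that turns the scores quoted above into speed ratios relative to a P40. The numbers are only the rounded figures from this comment, not fresh benchmark data, and the helper name `speedup_over` is made up for illustration.

```python
# Blender Open Data scores as quoted in the comment above (higher is faster).
# These are the commenter's rounded figures, not re-measured values.
blender_scores = {
    "P40": 774,
    "RTX 4090": 11321,
    "RTX 4060 Ti 16GB": 3800,
    "Arc A770 16GB": 1900,
    "RX 7600 XT 16GB": 1300,
}


def speedup_over(baseline, scores):
    """Return each card's score divided by the baseline card's score."""
    base = scores[baseline]
    return {name: score / base for name, score in scores.items()}


if __name__ == "__main__":
    ratios = speedup_over("P40", blender_scores)
    # Print slowest to fastest, relative to a single P40.
    for name, ratio in sorted(ratios.items(), key=lambda kv: kv[1]):
        print(f"{name:18s} {ratio:5.1f}x a P40")
```

On these figures a 4090 comes out at roughly 14.6 times a P40, and even the 7600 XT is well ahead of it. Rendering throughput is not the same thing as LLM inference speed, so treat the ratios as a rough guide only.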
nullc|2 years ago
I find that my hosts using 9x P40 do inference on 70b models MUCH MUCH faster than, e.g., a dual 7763, and they cost a lot less. ... and they can also support 200B parameter models!
For the price of a single 4090, which doesn't have enough RAM to run anything I'm interested in, I can have slower cards which cumulatively have 15 times the memory and 3.5 times the memory bandwidth.
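A back-of-the-envelope sketch of the capacity argument, for anyone who wants to plug in their own numbers. It assumes 24 GB per card (true of both the P40 and the 4090), counts weights only (no KV cache, activations, or framework overhead), and treats the bits-per-weight values as my own illustrative choices, since the comment doesn't say which quantization is used. The function names are hypothetical.

```python
GB = 1_000_000_000  # decimal gigabytes; close enough for a rough estimate


def weight_gb(params, bits_per_weight):
    """Approximate size of the model weights alone, in GB."""
    return params * bits_per_weight / 8 / GB


def fits_weights(params, bits_per_weight, cards, gb_per_card=24):
    """True if the weights alone fit in the pooled memory of `cards` GPUs.

    Ignores KV cache, activations, and per-card overhead, so a real
    deployment needs headroom beyond what this reports.
    """
    return weight_gb(params, bits_per_weight) <= cards * gb_per_card


if __name__ == "__main__":
    # (parameter count, assumed bits per weight)
    for params, bits in [(70e9, 16), (70e9, 4), (200e9, 8), (200e9, 16)]:
        size = weight_gb(params, bits)
        print(
            f"{params / 1e9:.0f}B @ {bits}-bit: {size:.0f} GB  "
            f"1 card: {fits_weights(params, bits, 1)}  "
            f"9 cards: {fits_weights(params, bits, 9)}"
        )
```

Under these assumptions a 70B model does not fit on a single 24 GB card even at 4 bits per weight (about 35 GB), while nine 24 GB cards (216 GB) hold it at 16 bits and have room for the weights of a 200B model at 8 bits, though not at 16.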
eurekin|2 years ago
nullc|2 years ago
You can get GPU server chassis that have 10 PCIe slots too, for around $2k on eBay. But note that there is a hardware limitation on the PCIe cards such that each card can only communicate directly with 8 others at a time. Beware, they're LOUD even by the standards of server hardware.
Oh, also, the NVIDIA Tesla power connectors have CPU-connector-like polarity instead of PCIe polarity, so at least in my chassis I needed to adapt them.
Also keep in mind that if you aren't using a special GPU chassis, the Tesla cards don't have fans, so you have to provide cooling yourself.
kuczmama|2 years ago