I do use a P40 in my machine learning box, but I'm curious how you fit three in the same system, given each needs a CPU power connector and a PCIe slot. Then, to cool them, you have to rig your own cooling, which requires yet more power connectors to be free. What chassis, motherboard, and power supply do you use for that? It'll certainly cost more than $1000 anyway, especially since you also need a decent amount of RAM to preload the models before moving them to the GPUs.
http://nonint.com/ has some interesting posts about how he built a custom server to house 8 GPUs (3090s in this case). You're right that it will set you back more than $1000, though I was only referring to the GPUs themselves.
I still use GPTQ for 30B, but even CPU inference at q5_1 is quick enough on modern hardware.
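For anyone wondering why q5_1 makes 30B feasible on CPU: ggml's q5_1 format packs 32 weights into 24 bytes (a 16-bit scale, 16-bit minimum, 4 bytes of high bits, and 16 bytes of low nibbles), which works out to roughly 6 bits per weight. A back-of-the-envelope sketch of the weight footprint (ignoring KV cache and activations; the block layout is as I recall it, so treat the numbers as approximate):

```python
# Rough memory estimate for quantized model weights.
# Assumption: q5_1 stores blocks of 32 weights in 24 bytes,
# i.e. ~6.0 bits per weight. KV cache and activations are extra.

def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Return the approximate size in bytes of the weights alone."""
    return n_params * bits_per_weight / 8

# A 30B-parameter model at q5_1 (~6.0 bits/weight):
gb = weight_bytes(30e9, 6.0) / 1e9
print(f"{gb:.1f} GB")  # 22.5 GB for the weights alone
```

That ~22.5 GB fits in ordinary desktop RAM, which is why CPU-only generation is workable, whereas fp16 at ~60 GB would not be for most machines.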