stephenroller | 5 years ago
On another note:
At 175B parameters, with float16 weights, the in-memory footprint is about 350GB, and activations would take it to around 400GB. You would need 12 or 13 32GB V100 GPUs to hold it in memory, or three p3.8xlarge instances, meaning loading it on AWS would cost around $35-40/hr.
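As a sanity check on those numbers, here is the back-of-envelope arithmetic in a few lines of Python. The 32GB-per-GPU figure is an assumption (the 32GB variant of the V100); the 400GB total is the rough figure from the comment above.

```python
# Back-of-envelope memory estimate for a 175B-parameter float16 model.
n_params = 175e9
bytes_per_param = 2  # float16 is 2 bytes per parameter

weights_gb = n_params * bytes_per_param / 1e9   # -> 350.0 GB for weights alone
total_gb = 400                                  # rough total with activations, per the comment
gpu_mem_gb = 32                                 # assumption: 32 GB V100s

gpus_needed = total_gb / gpu_mem_gb             # -> 12.5, i.e. "12 or 13" GPUs
print(weights_gb, gpus_needed)
```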
Though if you didn't care about speed, you could load the weights from disk and forward through the model a few layers at a time on a single GPU.
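A minimal sketch of that idea, using NumPy on a toy model so it stays self-contained: each "layer" is saved to disk separately, and the forward pass loads one layer at a time, keeping only that layer's weights in memory. The layer shapes, file names, and linear+ReLU layer structure here are made up for illustration; a real transformer layer is far more involved.

```python
import os
import tempfile
import numpy as np

def save_layers(dirname, n_layers=4, dim=8, seed=0):
    # Hypothetical setup: write each layer's weight matrix to its own file,
    # standing in for sharded model checkpoints on disk.
    rng = np.random.default_rng(seed)
    for i in range(n_layers):
        w = rng.standard_normal((dim, dim)).astype(np.float16)
        np.save(os.path.join(dirname, f"layer_{i}.npy"), w)

def forward_streaming(dirname, x, n_layers=4):
    # Forward pass that holds only one layer's weights in memory at a time.
    for i in range(n_layers):
        w = np.load(os.path.join(dirname, f"layer_{i}.npy"))  # load one layer
        x = np.maximum(x @ w, 0)  # toy layer: linear + ReLU
        del w  # free this layer's weights before loading the next
    return x

if __name__ == "__main__":
    d = tempfile.mkdtemp()
    save_layers(d)
    out = forward_streaming(d, np.ones((1, 8), dtype=np.float16))
    print(out.shape)  # same batch/dim shape in, same shape out
```

The trade-off is exactly the one noted above: disk reads dominate, so throughput is terrible, but peak memory drops from the whole model to a single layer.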
freeqaz | 5 years ago
Especially if you can use spot instances or a cheaper cloud host.
But I guess without the weights, the floor for this is several thousand dollars to play around with.
Do you know if the data set is being released?
stephenroller | 5 years ago
You can find a very comparable corpus, open-sourced and easy to use, in the [T5 repo](https://github.com/google-research/text-to-text-transfer-tra...)