It's also interesting to think that IBM released an 8-trillion-parameter model back in the 1980s [0]. Granted, it was an n-gram model, so it's not exactly an apples-to-apples comparison with today's models, but it's still quite crazy to think about.

[0]: https://aclanthology.org/J92-4003.pdf
lukeschantz|1 year ago