lobstey | 3 years ago

The biggest problem, first of all, might be the memory requirements given so many parameters. Running it couldn't be as cheap as a high-end computer any time in the foreseeable future.
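
For a rough sense of scale (assuming a GPT-3-sized model of ~175B parameters, which the comment doesn't specify), the weights alone are already hundreds of gigabytes:

    # Back-of-envelope memory estimate for a 175B-parameter model (GPT-3 scale).
    # Assumes weights stored in fp16 (2 bytes each), ignoring activations,
    # optimizer state, and KV cache, which only add to the total.
    params = 175e9
    bytes_per_param = 2                      # fp16
    print(f"{params * bytes_per_param / 1e9:.0f} GB just for weights")   # ~350 GB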

f38zf5vdt | 3 years ago

There is probably a space-time trade-off that needs to be explored here. It might be possible to preload some of the most likely next tokens into cache and/or RAM. These are glorified auto-complete algorithms that are poorly understood, as DeepMind's optimizations appear to show. For English, for example, there are probably only so many grammatically correct choices for the next token.
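
One naive way to picture that trade-off (purely illustrative; the cache-keying scheme and the model call below are my own assumptions, not anything from the paper):

    from functools import lru_cache

    # Illustrative sketch: trade RAM for compute by caching the top-k
    # next-token candidates keyed on the last few tokens of context, so a
    # repeated suffix skips the expensive forward pass.
    def model_top_k(context_tail, k):
        # stand-in for a full forward pass over the real model
        return tuple(range(k))

    @lru_cache(maxsize=1_000_000)
    def cached_top_k(context_tail, k=50):
        return model_top_k(context_tail, k)

    def next_token_candidates(tokens, tail_len=4):
        # Keying on only the last `tail_len` tokens is the lossy part of the
        # trade-off: distinct long contexts that share a suffix hit one entry.
        return cached_top_k(tuple(tokens[-tail_len:]))

Of course, the real next-token distribution depends on the whole context, not just a short suffix, which is where this kind of shortcut starts to break down.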

visarga | 3 years ago

Glorified autocomplete? Autocomplete can guess the next word... sometimes. GPT-3 goes hundreds of words ahead. On generic topics its output can be hard to distinguish from human text.

And it can't cache tokens, because every token is evaluated in the context of all the other tokens, so the same token doesn't have the same representation when it recurs at a different position or in a different context.
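
A toy single-head attention in NumPy (random weights, purely illustrative, no positional encoding) shows why: the same token id gets a different vector as soon as its neighbours change, so there is no single cache entry to reuse.

    import numpy as np

    # Toy single-head self-attention with random weights, just to show that a
    # token's representation depends on every other token in the sequence.
    rng = np.random.default_rng(0)
    d = 8
    emb = rng.normal(size=(100, d))                 # token embeddings
    Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

    def attend(token_ids):
        x = emb[token_ids]                          # (seq, d)
        q, k, v = x @ Wq, x @ Wk, x @ Wv
        scores = q @ k.T / np.sqrt(d)
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ v                          # contextualised vectors

    a = attend([5, 7, 9])   # token 7 in one context
    b = attend([1, 7, 2])   # token 7 in another context
    print(np.allclose(a[1], b[1]))                  # False: nothing to cache per token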

priansh | 3 years ago

I’ve been saying this for years: language models are the ML equivalent of the billionaire space race. It’s just a bunch of orgs with unlimited funding spending millions of dollars on compute to get more parameters than their rivals. It could be decades before we start to see them scale down or make meaningful optimizations. This paper is a good start, but I’d be willing to bet everyone will ignore it and keep breaking the bank.

Can you say that about any other task in ML? When Inception v3 came out I was able to run the model pretty comfortably on a 1060. Even pix2pix and most GANs fit comfortably on consumer hardware, and the top-of-the-line massive models can still run inference on a 3090. It’s so unbelievably ironic that one of the major problems Transformers aimed to solve when they were introduced was the compute inefficiency of recurrent networks, and it’s devolved into “how many TPUs can daddy afford” instead.