lappa | 2 years ago

More data, more parameters, and more compute all yield a better model, per "Scaling Laws for Neural Language Models": test loss falls as a smooth power law in model size, dataset size, and training compute when the other factors aren't the bottleneck (a quick sketch of the parameter-count law is below).

https://browse.arxiv.org/pdf/2001.08361v1.pdf

Largeness is a valid goal.
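
To make that concrete, here is a minimal Python sketch of the parameter-count power law from the paper, L(N) = (N_c / N)^alpha_N. The constants are the fits reported in Kaplan et al. (2020); the numbers it prints are illustrative, not predictions for any particular model.

    # Parameter-count scaling law from "Scaling Laws for Neural Language Models".
    # Constants are the published fits; loss is in nats per token.

    def loss_from_params(n_params: float,
                         n_c: float = 8.8e13,     # fitted constant (non-embedding params)
                         alpha_n: float = 0.076,  # fitted exponent
                         ) -> float:
        """Predicted test loss for a model with n_params non-embedding
        parameters, assuming data and compute are not the bottleneck."""
        return (n_c / n_params) ** alpha_n

    for n in (1e8, 1e9, 1e10, 1e11):
        print(f"{n:.0e} params -> predicted loss ~{loss_from_params(n):.2f}")

Each 10x in parameters shaves a roughly constant factor off the loss (about 16%, since 10^-0.076 is ~0.84), which is why "just make it bigger" keeps working.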

quickthrower2 | 2 years ago

Also: it costs more at inference time, uses more energy, and is less practical to run locally, so it has fewer use cases as a result. Especially for an open model.

Being on GitHub / HuggingFace but needing to be on an AWS or Nvidia wait list to get the resources to run it is not great.
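
To put numbers on "less practical to run locally": just holding the weights takes params x bytes-per-param of memory, before KV cache and activations. A rough sketch, with hypothetical model sizes:

    # Back-of-envelope memory to hold a model's weights for inference.
    # KV cache and activations add more, so treat these as lower bounds.

    BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

    def weight_gib(n_params: float, precision: str) -> float:
        return n_params * BYTES_PER_PARAM[precision] / 2**30

    for n_billion in (7, 70, 175):
        row = ", ".join(f"{p}: {weight_gib(n_billion * 1e9, p):7.1f} GiB"
                        for p in BYTES_PER_PARAM)
        print(f"{n_billion:>4}B params -> {row}")

A 7B model quantized to 4 bits fits on a consumer GPU; a 175B model doesn't fit on one even at 4 bits, which is where the wait lists come in.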

In a world with unlimited energy and chips I would agree: just make 'em bigger.

I guess going bigger has a better chance of reaching SOTA than exploring new architectures, so I get why people don't want to gamble.