lappa | 2 years ago

More data, more parameters, and more compute all result in a better model, per "Scaling Laws for Neural Language Models" (https://browse.arxiv.org/pdf/2001.08361v1.pdf). Largeness is a valid goal.

quickthrower2 | 2 years ago

Also: it costs more for inference, uses more energy, is less practical for running locally, and has fewer use cases as a result, especially for an open model.

Being on GitHub / HuggingFace but needing to be on an AWS or Nvidia wait list to get the resources to run it is not great.

In a world of unlimited energy and chips I would agree: just make 'em bigger.

I guess going bigger has a greater chance of success at being SOTA than looking at architectures, so I get that people don't want to gamble.

huac | 2 years ago

Rebuttal: compute optimality matters. https://arxiv.org/pdf/2203.15556.pdf
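To make the compute-optimality rebuttal concrete, here is a rough, illustrative Python sketch (not code from either paper) using the parametric loss fit L(N, D) = E + A/N^alpha + B/D^beta reported in the Chinchilla paper, together with the common C ≈ 6·N·D FLOPs approximation. The compute budget and the two model sizes below are assumptions chosen purely for illustration.

    # Back-of-the-envelope sketch of the compute-optimality point
    # (Hoffmann et al., 2022, "Training Compute-Optimal Large Language Models").
    # The constants are the parametric-fit values reported in that paper; the
    # compute budget and model sizes below are arbitrary assumptions for
    # illustration, not numbers taken from the thread.

    E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28

    def predicted_loss(n_params: float, n_tokens: float) -> float:
        # Parametric loss fit: L(N, D) = E + A / N^alpha + B / D^beta
        return E + A / n_params**ALPHA + B / n_tokens**BETA

    def tokens_for_budget(n_params: float, flops: float) -> float:
        # Training tokens affordable at a fixed budget, using C ~= 6 * N * D
        return flops / (6 * n_params)

    BUDGET = 1e23  # assumed fixed training compute, in FLOPs

    # Same budget, two allocations: a very large but under-trained model
    # versus a smaller model trained on far more tokens.
    for n in (175e9, 70e9):
        d = tokens_for_budget(n, BUDGET)
        print(f"N = {n/1e9:.0f}B params, D = {d/1e9:.0f}B tokens, "
              f"predicted loss = {predicted_loss(n, d):.3f}")

Under these assumed numbers, the smaller, longer-trained configuration comes out with the lower predicted loss at the same budget, which is the sense in which "just make them bigger" stops being the whole story once compute is fixed.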