rish-b | 2 years ago | on: OpenLLaMA 13B Released
A common reason is to reduce cost and latency. Larger models typically require GPUs with more memory (and hence higher costs), and the time to serve each request is also longer (more matrix multiplications per generated token).
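As a rough back-of-the-envelope sketch of that latency point, assuming the common approximation of ~2 FLOPs per parameter per generated token for decoder-only inference (the GPU peak throughput and utilization figures below are illustrative assumptions, not measurements):

```python
# Rough latency comparison for two model sizes on the same GPU.
# Assumes ~2 FLOPs per parameter per generated token; real throughput
# depends heavily on batching, KV caching, quantization, and memory
# bandwidth, so treat these as order-of-magnitude numbers only.

def inference_flops_per_token(num_params: float) -> float:
    """Approximate forward-pass FLOPs to generate one token."""
    return 2.0 * num_params

def tokens_per_second(num_params: float, gpu_flops: float,
                      utilization: float = 0.3) -> float:
    """Rough decode throughput given a GPU's peak FLOP/s."""
    return (gpu_flops * utilization) / inference_flops_per_token(num_params)

GPU_PEAK_FLOPS = 312e12  # assumed peak FLOP/s of the serving GPU

small = tokens_per_second(13e9, GPU_PEAK_FLOPS)  # a 13B model
large = tokens_per_second(65e9, GPU_PEAK_FLOPS)  # a 65B model

# The 13B model decodes ~5x faster per token than the 65B model
# on the same hardware, which is where the latency savings come from.
print(f"13B: {small:,.0f} tok/s, 65B: {large:,.0f} tok/s, "
      f"ratio: {small / large:.1f}x")
```

Memory tells a similar story: at 16-bit precision a 13B model needs roughly 26 GB just for weights, while a 65B model needs roughly 130 GB, pushing you into multi-GPU territory.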
rish-b | 2 years ago | on: LIMA: Less Is More for Alignment
This is such an interesting direction for LLM research (especially because it's easy to imagine applicability in industry as well).
If all it takes is ~1k high-quality examples (of course, quality can be tricky to define) to tune an LLM successfully, then we should expect to see these tuned LLMs for many different narrow use cases.
Of course, the devil is likely in the details. Even in this paper, the prompts on which the model is evaluated were written by the authors and "inspired by their own interests or those of their friends." It can be tricky to make the jump from these prompts and answers to real-world LLM use cases, but it's super promising.