(no title)
barelyauser | 1 year ago
Things like "Compute-Optimal Sampling" sound just like any other made-up gibberish that may or may not exist. Phrasings like "memory-centric subsampling", "search-based hyperspace modeling", and "locally induced entropy optimization" simply don't parse. And more often than not, after reading such papers, I find out the fancy name covers something a toddler already knows about. Really disappointing.
_hl_ | 1 year ago
Of course there are some (possibly many!) papers where jargon is abused to make something sound smarter. Sometimes this can also happen unintentionally.
In this case, "compute-optimal X" is standard terminology in large-scale ML model design for finding the best tradeoff with respect to compute when trying to achieve X.
Here, the paper is about finding the optimal model-size tradeoff when training on LLM-generated synthetic data. Imagine you have a class of LLMs, from small to arbitrarily large. The larger the LLM, the higher the quality of your synthetic data, but the more compute you spend generating ("sampling") that data. Smaller LLMs can generate more data within the same compute budget, but at lower quality.
The paper runs experiments showing that, in their setting, you don't always want the largest possible LLM for synthetic data (as many practitioners previously assumed); instead, you can get further by making more calls to a smaller but weaker LLM.
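The core tradeoff is easy to see with toy numbers. Here's a minimal sketch (my own illustration, not from the paper): assuming per-sample generation cost scales roughly linearly with parameter count, a fixed FLOPs budget buys far more samples from a small model. All constants here are hypothetical.

```python
# Toy illustration of the compute/sample-count tradeoff.
# Assumption: generation cost ~ 2 * params FLOPs per token
# (a common rough estimate; the budget and lengths below are made up).

FLOPS_BUDGET = 1e18        # hypothetical fixed sampling budget
TOKENS_PER_SAMPLE = 512    # hypothetical average sample length

def samples_under_budget(params: float) -> int:
    """How many samples a model with `params` parameters can generate."""
    flops_per_sample = 2 * params * TOKENS_PER_SAMPLE
    return int(FLOPS_BUDGET / flops_per_sample)

for params in (7e9, 27e9, 70e9):
    print(f"{params / 1e9:.0f}B model: {samples_under_budget(params):,} samples")
```

A 7B model generates roughly 10x the samples of a 70B model under the same budget; the paper's question is whether that quantity advantage outweighs the quality gap.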
danielmarkbruce | 1 year ago
You are never going to win the jargon battle. It is what it is. People wrap up entire concepts in a few words and hell if they can be bothered writing out the details of the concept over and over again.