(no title)
barelyauser | 1 year ago
Things like "Compute-Optimal Sampling" sound just like any other made-up gibberish that may or may not exist. Phrasings like "memory-centric subsampling", "search-based hyperspace modeling", and "locally induced entropy optimization" simply don't parse. And more often than not, after reading such papers, I find out the fancy name covers something a toddler already knows about. Really disappointing.
_hl_ | 1 year ago
Of course there are some (possibly many!) papers where jargon is abused to make something sound smarter. Sometimes this can also happen unintentionally.
In this case, "compute-optimal X" is standard terminology in large-scale ML model design for finding the best tradeoff with respect to compute when trying to achieve X.
Here, the paper is about finding the optimal model-size tradeoff when training on LLM-generated synthetic data. Imagine you have a class of LLMs, from small to arbitrarily large. The larger the LLM, the higher the quality of your synthetic data, but the more compute you spend generating ("sampling") that data. Smaller LLMs can generate more data within the same compute budget, but at lower quality.
The paper runs experiments showing that, in their setting, you don't always want the largest possible LLM for synthetic data (as many practitioners previously assumed); instead, you can get further by making more calls to a smaller but weaker LLM.
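The core tradeoff is easy to see with toy numbers. Here's a minimal sketch (my own illustration, not from the paper): assuming per-sample generation cost scales roughly linearly with parameter count, a fixed FLOPs budget buys far more samples from a small model. All constants here are hypothetical.

```python
# Toy illustration of the compute/sample-count tradeoff.
# Assumption: generation cost ~ 2 * params FLOPs per token
# (a common rough estimate; the budget and lengths below are made up).

FLOPS_BUDGET = 1e18        # hypothetical fixed sampling budget
TOKENS_PER_SAMPLE = 512    # hypothetical average sample length

def samples_under_budget(params: float) -> int:
    """How many samples a model with `params` parameters can generate."""
    flops_per_sample = 2 * params * TOKENS_PER_SAMPLE
    return int(FLOPS_BUDGET / flops_per_sample)

for params in (7e9, 27e9, 70e9):
    print(f"{params / 1e9:.0f}B model: {samples_under_budget(params):,} samples")
```

A 7B model generates roughly 10x the samples of a 70B model under the same budget; the paper's question is whether that quantity advantage outweighs the quality gap.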
danielmarkbruce | 1 year ago
You are never going to win the jargon battle. It is what it is. People wrap up entire concepts in a few words and hell if they can be bothered writing out the details of the concept over and over again.