
matmatmatmat | 3 years ago

Question on the ML side of this post: How are these "parameterizations" used? Is this really just feature engineering with a new name? Are they including this information when training the model?

In the article, they mention using the new labels to build a "more balanced" dataset -- is this a realistic possibility in practice when most teams still have a dearth of data?


elandau25 | 3 years ago

Hello! I wrote the article, so happy to answer this. It's partially feature engineering and partially not: we use feature engineering to curate and correct the dataset, but the end model is a neural network that never receives these features (we call them quality metrics) as explicit inputs. I abbreviated a good amount of the process in the article so it wouldn't run forever, but essentially we let ChatGPT select and write its own features, then applied the strategies it came up with to improve the dataset.
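To make the distinction concrete, here is a minimal sketch of that pattern (all names are my own illustration, not the author's actual code): a quality metric scores each example and drives curation, but the metric values are thrown away afterwards, so the trained model only ever sees raw inputs and labels.

```python
def length_metric(example):
    # Hypothetical quality metric: word count of the text field.
    return len(example["text"].split())

def curate(dataset, min_words=3):
    # Use the metric to drop low-quality examples; the metric values
    # are discarded here and never become model input features.
    return [ex for ex in dataset if length_metric(ex) >= min_words]

dataset = [
    {"text": "ok", "label": 0},
    {"text": "this product exceeded my expectations", "label": 1},
    {"text": "terrible battery life and poor build quality", "label": 0},
]

curated = curate(dataset)
# Downstream, the model would train only on (text, label) pairs
# from `curated`; length_metric is not an input to the network.
print(len(curated))  # 2 of the 3 examples survive curation
```

The same shape works for metrics written by an LLM instead of by hand: the metric functions change, the filter-then-discard structure stays the same.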

As for whether it's realistic in practice, the answer is yes. Some teams do have a dearth of data, but many AI companies we work with have more data than they can use; for them the question is how to sample, curate, and correct the data and labels they already have to improve their models, not how to collect new data. Great questions!
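One common instance of "sampling the data you already have", and one way to get the "more balanced" dataset the parent comment asks about, is class rebalancing by downsampling. A rough sketch (my own example, not from the article):

```python
import random

def balance_by_downsampling(dataset, label_key="label", seed=0):
    # Group examples by label, then downsample every class to the
    # size of the smallest class -- no new data collection needed.
    by_label = {}
    for ex in dataset:
        by_label.setdefault(ex[label_key], []).append(ex)
    n = min(len(examples) for examples in by_label.values())
    rng = random.Random(seed)  # seeded for reproducibility
    balanced = []
    for examples in by_label.values():
        balanced.extend(rng.sample(examples, n))
    return balanced

# 8 positives vs. 2 negatives -> 2 of each after balancing.
data = [{"label": 1}] * 8 + [{"label": 0}] * 2
print(len(balance_by_downsampling(data)))  # 4
```

In practice you would usually rank within each class by the quality metrics and keep the best examples rather than sampling uniformly, but the structure is the same.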