Well, Chomsky already dismissed corpus-based linguistics in the 1990s and 2000s, because a corpus (a large collection of text documents, e.g., newspapers, blog posts, literature, or everything mixed together) is never a good enough approximation of the true underlying distribution of all words/constructs in a language.
For example, a newspaper-based corpus might have frequent occurrences of city names or names of politicians, whereas they might not occur that often in real everyday speech, because many people don't actually talk about those politicians all day long. Or, alternatively, names of small cities might have a frequency of 0.

Naturally, he will, and does, also dismiss anything that occurred in the ML field in the past decade.
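As a crude sketch of that zero-frequency problem (toy corpus and a hypothetical vocabulary size, not real data): a raw maximum-likelihood estimate assigns probability 0 to anything the corpus never happened to contain, which is the standard motivation for smoothing.

```python
from collections import Counter

# Toy "newspaper" corpus: a politician-like word is frequent,
# a small-town name never appears at all.
corpus = ("the senator said the senator would visit "
          "berlin berlin the capital").split()

counts = Counter(corpus)
total = len(corpus)

def mle(word):
    # Raw maximum-likelihood estimate: unseen words get probability 0.
    return counts[word] / total

VOCAB_SIZE = 10_000  # hypothetical vocabulary size, chosen for illustration

def smoothed(word):
    # Add-one (Laplace) smoothing, the textbook workaround:
    # every word, seen or not, gets a small nonzero probability.
    return (counts[word] + 1) / (total + VOCAB_SIZE)

print(mle("senator"))        # frequent in this corpus
print(mle("smalltown"))      # 0.0 -- never observed
print(smoothed("smalltown")) # small but nonzero
```

The point being that the corpus estimate is an artifact of what got written down, not of the language itself.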
But I agree with the article. Dealing with language only in a theoretical/mathematical way, not even trying to evaluate your theories with real data, is just not very efficient and ignores that language models do seem to work to some degree.
masswerk|3 years ago
[0] Minsky, Marvin, “Steps Toward Artificial Intelligence”, Proceedings of the IRE, Volume: 49, Issue: 1, Jan. 1961: https://courses.csail.mit.edu/6.803/pdf/steps.pdf
cma|3 years ago
With current models, if you increased the parameter count but gave them a similar amount of data, they would overfit.
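A crude sketch of that failure mode (a toy lookup-table "model", not an actual LLM): give a model enough parameters to carry one entry per training example and it can memorize the training set perfectly while learning nothing that transfers to held-out data.

```python
# Toy task: learn y = x % 3 from (x, y) pairs.
data = [(x, x % 3) for x in range(20)]
train, test = data[:10], data[10:]

# "Over-parameterized" model: one parameter per training example,
# i.e., a lookup table that simply memorizes the training set.
lookup = dict(train)

def predict(x, fallback=0):
    # Off the memorized table it can only guess a constant.
    return lookup.get(x, fallback)

train_acc = sum(predict(x) == y for x, y in train) / len(train)
test_acc = sum(predict(x) == y for x, y in test) / len(test)

print(train_acc)  # 1.0: perfect fit to the data it saw
print(test_acc)   # much lower: memorization does not generalize
```

More data relative to capacity is what forces the model to compress rather than memorize.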
aaroninsf|3 years ago
We (all of us) are very bad at non-linear reasoning and at reasoning across orders of magnitude, and, by extension, we have no valid intuition about emergent behaviors/properties in complex systems.
In the case of scaled ML this is quite obvious in hindsight. There are many now-classic anecdotes about even those devising contemporary-scale LLMs being surprised and unsettled by what even their first versions were capable of.
As we work away at the optimizations, architectural features, and expediencies which render certain classes of complex problem-solving tractable by our ML, we would do well to intentionally filter for further emergent behavior.
Whatever specific claims or notions any member has that may be right or wrong, the LessWrong folks are at least taking this seriously...
aaroninsf|3 years ago
My own hobby horse of late is that, independent of any tethering to information about reality available through sensorium and testing, LLMs are already doing more than building models of language qua language. Write-up someone pointed me at: https://thegradient.pub/othello/