top | item 43467170

(no title)

szvsw | 11 months ago

I mostly agree with what your are saying but…

> passing them means nothing about the ability of neural net-based systems to understand language, regardless of how much their authors designed them to test language understanding.

Does this implicitly suggest that it is impossible to quantitatively assess a system’s ability to understand language? (Using the term “system” in the broadest possible sense)

Not agreeing or disagreeing or asking with skepticism. Genuinely asking what your position is here, since it seems like your comment eventually leads to the conclusion that it is unknowable whether a system external to yourself understands language, or, if it is possible, then only in a purely qualitative way, or perhaps purely in a Stewart-style-pornographic-threshold-test - you’ll know it when you see it.

I don’t have any problem if that’s your position- it might even be mine! I’m more or less of the mindset that debating whether artificial systems can have certain labels attached to them revolving around words like “understanding,” “cognition,” “sentience” etc is generally unhelpful, and it’s much more interesting to just talk about what the actual practical capabilities and functionalities of such systems are on the one hand in a very concrete, observable, hopefully quantitative sense, and how it feels to interact with them in a purely qualitative sense on the other hand. Benchmarks can be useful in the former but not the latter.

Just curious where you fall. How would you recommend we approach the desire to understand whether such systems can “understand language” or “solve problems” etc etc… or are these questions useless in your view? Or only useful in as much as they (the benchmarks/tests etc) drive the development of new methodologies/innovations/measurable capabilities, but not in assigning qualitative properties to said systems?

discuss

order

YeGoblynQueenne|11 months ago

>> Does this implicitly suggest that it is impossible to quantitatively assess a system’s ability to understand language? (Using the term “system” in the broadest possible sense)

I don't know and I don't have an opinion. I know that tests that claimed to measure language understanding, historically, haven't. There's some literature on the subject if you're curious (sounds like you are). I'd start here:

Climbing towards NLU: On Meaning, Form, and Understanding in the Age of Data

Emily M. Bender, Alexander Koller

https://aclanthology.org/2020.acl-main.463/

Quoting the passage that I tend to remember:

>> While large neural LMs may well end up being important components of an eventual full-scale solution to human-analogous NLU, they are not nearly-there solutions to this grand challenge. We argue in this paper that genuine progress in our field — climbing the right hill, not just the hill on whose slope we currently sit —depends on maintaining clarity around big picture notions such as meaning and understanding in task design and reporting of experimental results.