“What this work shows is that the knowledge and reasoning ability of these large language models in an eye health context is now almost indistinguishable from experts,” said Arun Thirunavukarasu, the lead author of a paper on the findings published in the PLOS Digital Health journal. (FTFA)
nicklecompte|1 year ago
From the good folks at AI Snake Oil[1]
> Memorization is a spectrum. Even if a language model hasn’t seen an exact problem on a training set, it has inevitably seen examples that are pretty close, simply because of the size of the training corpus. That means it can get away with a much shallower level of reasoning. ... In some real-world tasks, shallow reasoning may be sufficient, but not always. The world is constantly changing, so if a bot is asked to analyze the legal consequences of a new technology or a new judicial decision, it doesn’t have much to draw upon. In short, as Emily Bender points out, tests designed for humans lack construct validity when applied to bots.
> On top of this, professional exams, especially the bar exam, notoriously overemphasize subject-matter knowledge and underemphasize real-world skills, which are far harder to measure in a standardized, computer-administered way. In other words, not only do these exams emphasize the wrong thing, they overemphasize precisely the thing that language models are good at.
Also[2]:
> Undoubtedly, AI and LLMs will transform every facet of what we do, from research and writing to graphic design and medical diagnosis. However, its current success in passing standardized test after standardized test is an indictment of what and how we train our doctors, our lawyers, and our students in general. ChatGPT passed an examination that rewards memorizing the components of a system rather than analyzing how it works, how it fails, how it was created, how it is maintained. Its success demonstrates some of the shortcomings in how we train and evaluate medical students. Critical thinking requires appreciation that ground truths in medicine continually shift, and more importantly, an understanding of how and why they shift. Perhaps the most important lesson from the success of LLMs in passing examinations such as the USMLE is that now is the time to rethink how we train and evaluate our students.
[1] https://www.aisnakeoil.com/p/gpt-4-and-professional-benchmar...
[2] https://journals.plos.org/digitalhealth/article?id=10.1371/j...
Filligree|1 year ago
Not because they’re super useful. They are, if and only if you use them right, which is a skill few people seem to have.
But because they’re illuminating flaws in how we train our students, and acting as a forcing function to _make_ the universities and schools fix that. There’s no longer any choice!
alex_suzuki|1 year ago