I don't understand this reasoning. Randomizing people to AI vs standard of care is expensive and risky. Checking whether the AI can pass hypothetical scenarios seems like a perfectly reasonable approach to researching the safety of these models before running a clinical trial.

selridge|2 days ago

The issue is that those hypothetical scenarios do not have to look like how patients actually interact with the tool.

Real-life use is full of ill-posed questions, open-ended statements, inaccurate assessments of symptoms, and conclusory remarks sprinkled in between. Real use of chatbots for health by non-clinicians looks very different from scenario-based evaluation.

WarmWash|2 days ago

You would give those hypothetical scenarios to doctors too, and then the results would be analyzed by doctors who don't know whether a given result came from an AI or a doctor.

riskassessment|2 days ago

From the paper:

> Three physicians independently assigned gold-standard triage levels based on cited clinical guidelines and clinical expertise, with high inter-rater agreement

nick49488171|2 days ago

You can start by comparing "doctor" care vs "doctor who also uses AI" care.