Study reveals ChatGPT missteps in emergency care, overprescribing X-rays and antibiotics. Experts caution against relying on AI for complex medical decisions.
ChatGPT, despite its impressive medical knowledge, may inadvertently contribute to overprescribing X-rays and antibiotics in emergency settings (1). The study, led by researchers from the University of California, San Francisco (UCSF), showed that ChatGPT even recommended admitting people who didn’t require hospital treatment. In the paper, published in the journal Nature Communications, the researchers said that while the model could be prompted in ways that make its responses more accurate, it’s still no match for the clinical judgement of a human doctor.
Don’t Blindly Trust ChatGPT in Emergency Care
“This is a valuable message to clinicians not to blindly trust these models,” said lead author Chris Williams, a postdoctoral scholar at UCSF. “ChatGPT can answer medical exam questions and help draft clinical notes, but it’s not currently designed for situations that call for multiple considerations, like the situations in an emergency department,” he added.
A recent study by Williams showed that ChatGPT, a large language model (LLM), was slightly better than humans at determining which of two emergency patients was more acutely unwell: a straightforward choice between patient A and patient B.
In the current study, he challenged the AI model to perform a more complex task: providing the recommendations a physician makes after initially examining a patient in the emergency department, namely whether to admit the patient, order X-rays or other scans, or prescribe antibiotics.
For each of the three decisions, the team compiled a set of 1,000 emergency visits to analyse from an archive of more than 251,000 visits. The sets had the same ratio of “yes” to “no” responses for decisions on admission, radiology, and antibiotics. The team entered doctors’ notes on each patient’s symptoms and examination findings into ChatGPT-3.5 and ChatGPT-4.
The team then tested each model's accuracy on each set using a series of increasingly detailed prompts. The results showed that the AI models recommended services more often than was needed. While ChatGPT-4 was 8 percent less accurate than resident physicians, ChatGPT-3.5 was 24 percent less accurate. “AIs tend to overprescribe because these models are trained on the internet. To date, legitimate medical advice-giving sites have not been designed, which can answer emergency medical questions.”
- Evaluating the use of large language models to provide clinical recommendations in the Emergency Department - (https://www.nature.com/articles/s41467-024-52415-1)