AI Tool ChatGPT's Performance Inaccuracy Revealed

by Karishma Abhishek on April 30, 2023 at 10:59 PM

The study, published in JAMA Ophthalmology and led by St. Michael's Hospital, a site of Unity Health Toronto, found that ChatGPT correctly answered 46 percent of questions when the test was first conducted in Jan. 2023.

‘Research indicates that ChatGPT, an AI tool, struggled to correctly answer a significant portion of test questions in a commonly used ophthalmology study resource for board certification.’

Struggles of ChatGPT

When researchers conducted the same test one month later, ChatGPT scored more than 10 percentage points higher.

The potential of AI in medicine and exam preparation has garnered excitement since ChatGPT became publicly available in Nov. 2022.

It is also raising concerns about the potential for incorrect information and cheating in academia. ChatGPT is free, available to anyone with an internet connection, and works in a conversational manner.

"ChatGPT may have an increasing role in medical education and clinical practice over time, however, it is important to stress the responsible use of such AI systems," said Dr. Rajeev H. Muni, principal investigator of the study and a researcher at the Li Ka Shing Knowledge Institute at St. Michael's.

"ChatGPT, as used in this investigation, did not answer sufficient multiple choice questions correctly for it to provide substantial assistance in preparing for board certification at this time."

Assessing ChatGPT's Accuracy on a Widely Used Ophthalmology Study Resource

Researchers used a dataset of practice multiple choice questions from the free trial of OphthoQuestions, a common resource for board certification exam preparation.

To ensure ChatGPT's responses were not influenced by other conversations, researchers cleared all entries and conversations with ChatGPT before inputting each question and used a new ChatGPT account.

Questions that used images and videos were not included because ChatGPT only accepts text input.

Of 125 text-based multiple-choice questions, ChatGPT answered 58 (46 percent) correctly when the study was first conducted in Jan. 2023. Researchers repeated the analysis in Feb. 2023, and ChatGPT's accuracy improved to 58 percent.
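The arithmetic behind these figures can be checked with a short sketch. It uses only the numbers reported above (58 correct out of 125 in Jan. 2023, and a rounded 58 percent in Feb. 2023). The function and variable names are illustrative assumptions, not part of the study, and the article does not give the raw count of correct answers for Feb. 2023, so the gain computed here is approximate.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Return the share of correct answers as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


# Figures reported in the article for the Jan. 2023 run.
jan_correct = 58
total_questions = 125
jan_accuracy = accuracy_percent(jan_correct, total_questions)

# Assumption: only the rounded Feb. 2023 accuracy is reported,
# so it is used as-is rather than derived from a raw count.
feb_accuracy_reported = 58.0

# Absolute gain in percentage points versus relative gain in percent.
gain_points = feb_accuracy_reported - jan_accuracy
relative_gain = 100.0 * gain_points / jan_accuracy

print(f"Jan. 2023 accuracy: {jan_accuracy:.1f}%")
print(f"Gain: {gain_points:.1f} percentage points")
print(f"Relative gain: {relative_gain:.1f}%")
```

The sketch separates the gain in percentage points (the difference between the two accuracy rates) from the relative gain (that difference as a share of the Jan. 2023 rate), since the two are easy to confuse when a result is described as "percent higher."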

"ChatGPT is an artificial intelligence system that has tremendous promise in medical education. Though it provided incorrect answers to board certification questions in ophthalmology about half the time, we anticipate that ChatGPT's body of knowledge will rapidly evolve," said Dr. Marko Popovic, a co-author of the study and a resident physician in the Department of Ophthalmology and Vision Sciences at the University of Toronto.

Uncovering the Limitations of ChatGPT

ChatGPT closely matched how trainees answer questions, selecting the same multiple-choice response as the most common answer given by ophthalmology trainees 44 percent of the time.

ChatGPT selected the multiple-choice response that was least popular among ophthalmology trainees 11 percent of the time, second least popular 18 percent of the time, and second most popular 22 percent of the time.

"ChatGPT performed most accurately on general medicine questions, answering 79 percent of them correctly. On the other hand, its accuracy was considerably lower on questions for ophthalmology subspecialties. For instance, the chatbot answered 20 percent of questions correctly on oculoplastics and zero percent correctly from the subspecialty of the retina. The accuracy of ChatGPT will likely improve most in niche subspecialties in the future," said Andrew Mihalache, lead author of the study and undergraduate student at Western University.

Source: Eurekalert
