ChatGPT-3 and ChatGPT-4, OpenAI’s language processing models, failed the American College of Gastroenterology’s 2021 and 2022 self-assessment tests, according to a study published earlier this week in The American Journal of Gastroenterology.
ChatGPT is a large language model that generates human-like text in response to user questions or statements.
Researchers from the Feinstein Institutes for Medical Research asked both versions of ChatGPT to answer the test questions to gauge their capabilities and accuracy.
Each test consists of 300 multiple-choice questions. The researchers copied and pasted every multiple-choice question and answer, excluding those requiring images, into the AI-powered platform.
ChatGPT-3 and ChatGPT-4 answered 455 questions, with ChatGPT-3 answering 296 of 455 questions correctly and ChatGPT-4 answering 284 correctly.
To pass the test, individuals must score 70% or higher. ChatGPT-3 scored 65.1% and ChatGPT-4 scored 62.4%.
The self-assessment test is used to determine how an individual would score on the American Board of Internal Medicine Gastroenterology exam.
“Recently, ChatGPT and the use of AI have received a lot of attention in various industries. When it comes to medical education, there is a lack of research around this potentially revolutionary tool,” Dr. Arvind Trindade, associate professor at the Feinstein Institutes’ Institute of Health System Science and senior author of the paper, said in a statement. “Based on our research, ChatGPT should not be used for medical education in gastroenterology at this time and still has some way to go before it is implemented in healthcare.”
WHY IT MATTERS
The study’s researchers noted that ChatGPT’s failing grade could be due to a lack of access to paywalled medical journals or to outdated information in its system, and that more research is needed before it can be used reliably.
Meanwhile, a study published in PLOS Digital Health in February found that ChatGPT performed well on the United States Medical Licensing Examination, which consists of three exams. The AI tool passed or came close to the passing threshold on all three exams and showed a high level of insight in its explanations.
ChatGPT also provided “largely appropriate” answers to questions about cardiovascular disease prevention, according to a research letter published in JAMA.
The researchers collected 25 questions on basic concepts of heart disease prevention, including advice on risk factors, test results and medication information, and posed them to the AI chatbot. Clinicians rated each response as appropriate, inappropriate or unreliable, deeming 21 of the 25 answers appropriate and four inappropriate.