AI chatbots exhibit signs of cognitive decline, raising concerns about their reliability in critical clinical tasks.
A study reveals that nearly all leading large language models, or "chatbots," display signs of mild cognitive impairment on tests designed to detect early dementia. The findings also indicate that "older" chatbot versions, like older patients, perform worse on these tests. The authors argue that these results challenge the assumption that artificial intelligence will soon replace human doctors (1).

Huge advances in the field of artificial intelligence have led to a flurry of excited and fearful speculation as to whether chatbots can surpass human physicians.
‘#ChatGPT 4o scored highest on the #MoCA test (26/30), followed by ChatGPT 4 and Claude (25/30), while #Gemini 1.0 scored lowest (16/30). MoCA is used to detect cognitive impairment in older adults. #AI’
Several studies have shown large language models (LLMs) to be remarkably adept at a range of medical diagnostic tasks, but their susceptibility to human impairments such as cognitive decline has not yet been examined. To fill this knowledge gap, researchers assessed the cognitive abilities of the leading, publicly available LLMs - ChatGPT versions 4 and 4o (developed by OpenAI), Claude 3.5 “Sonnet” (developed by Anthropic), and Gemini versions 1 and 1.5 (developed by Alphabet) - using the Montreal Cognitive Assessment (MoCA) test.
What is the MoCA Test?
The Montreal Cognitive Assessment (MoCA) test is widely used to detect cognitive impairment and early signs of dementia, usually in older adults. Through a number of short tasks and questions, it assesses abilities including attention, memory, language, visuospatial skills, and executive functions. The maximum score is 30 points, with a score of 26 or above generally considered normal.

The instructions given to the LLMs for each task were the same as those given to human patients. Scoring followed official guidelines and was evaluated by a practicing neurologist.
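To make the cut-off concrete, here is a minimal sketch (not an official MoCA tool) that applies the 26/30 threshold described above to the scores reported in the study; the helper name interpret_moca is purely illustrative.

```python
# Minimal sketch: apply the commonly cited MoCA cut-off (26/30) to a
# total score. Illustrative only - not an official MoCA scoring tool.

def interpret_moca(score: int, max_score: int = 30, cutoff: int = 26) -> str:
    """Return a coarse label for a MoCA total score."""
    if not 0 <= score <= max_score:
        raise ValueError(f"score must be between 0 and {max_score}")
    return "normal" if score >= cutoff else "below the normal cut-off"

# Scores reported in the study (see the quote above).
reported = {"ChatGPT 4o": 26, "ChatGPT 4": 25,
            "Claude 3.5 Sonnet": 25, "Gemini 1.0": 16}
for model, score in reported.items():
    print(f"{model}: {score}/30 -> {interpret_moca(score)}")
```

On these numbers, only ChatGPT 4o clears the threshold generally considered normal.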
All chatbots showed poor performance in visuospatial skills and executive tasks, such as the trail making task (connecting encircled numbers and letters in ascending, alternating order) and the clock drawing test (drawing a clock face showing a specific time). Gemini models failed at the delayed recall task (remembering a five-word sequence).
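For readers unfamiliar with the trail making item, the MoCA version asks the test taker to alternate between ascending numbers and letters (1-A-2-B-...). A minimal sketch of the expected sequence, with an illustrative function name, might look like this:

```python
# Minimal sketch: build the alternating number/letter sequence used in
# the MoCA trail making item (1-A-2-B-3-C-4-D-5-E). Names are illustrative.

from string import ascii_uppercase

def trail_sequence(n: int = 5) -> list[str]:
    """Return the expected order: 1, A, 2, B, ... up to n pairs."""
    seq: list[str] = []
    for i in range(n):
        seq.append(str(i + 1))          # next number in ascending order
        seq.append(ascii_uppercase[i])  # next letter in ascending order
    return seq

print("-".join(trail_sequence()))  # 1-A-2-B-3-C-4-D-5-E
```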
In further tests, the chatbots were unable to show empathy or to accurately interpret complex visual scenes. Only ChatGPT 4o succeeded in the incongruent stage of the Stroop test, which uses combinations of color names and font colors to measure how interference affects reaction time.
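To illustrate the mechanism, the sketch below generates the kind of word/ink pairings the Stroop test involves; in incongruent trials the printed color word and its font color disagree, which is what produces the interference. The function name and color list are illustrative assumptions, not part of the study.

```python
# Minimal sketch: Stroop-style stimuli. A trial is incongruent when the
# color word and the ink color it is shown in disagree. Illustrative only.

import random

COLORS = ["red", "green", "blue", "yellow"]

def stroop_trial(rng: random.Random) -> tuple[str, str, bool]:
    """Return (word, ink_color, is_congruent) for one trial."""
    word = rng.choice(COLORS)
    ink = rng.choice(COLORS)
    return word, ink, word == ink

rng = random.Random(0)  # fixed seed for a reproducible example
for _ in range(5):
    word, ink, congruent = stroop_trial(rng)
    print(f"word={word:<6} ink={ink:<6} ({'congruent' if congruent else 'incongruent'})")
```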
The researchers point out that the uniform failure of all the large language models in tasks requiring visual abstraction and executive function highlights a significant area of weakness that could impede their use in clinical settings.
As such, they conclude: “Not only are neurologists unlikely to be replaced by large language models any time soon, but our findings suggest that they may soon find themselves treating new, virtual patients - artificial intelligence models presenting with cognitive impairment.”
Reference:
- Age against the machine—susceptibility of large language models to cognitive impairment: cross sectional analysis - (https://www.bmj.com/user/login?destination=node/1104732)