Study: Older AI Models Exhibit Cognitive Decline, Raising Concerns for Medical Use

A recent study published in the BMJ (December 20, 2024) suggests that artificial intelligence (AI) models, much like humans, may experience a decline in cognitive abilities as they "age." This has significant implications for the growing reliance on AI in medical diagnostics.
Researchers assessed the cognitive performance of several publicly available large language models (LLMs) and chatbots, including OpenAI's ChatGPT, Anthropic's Claude ("Sonnet"), and Alphabet's Gemini, using the Montreal Cognitive Assessment (MoCA) test. The MoCA is commonly used to detect early signs of cognitive impairment, such as those associated with Alzheimer's disease and dementia. The test involves tasks that assess attention, memory, language, visual-spatial skills, and executive function. A score of 26 out of 30 is considered a passing score for humans.
The results revealed that while the LLMs generally performed well in areas like language, attention, and abstraction, they struggled with visual-spatial skills and executive tasks. The newest version of ChatGPT (ChatGPT 4o) scored highest, at 26 out of 30. The older Gemini 1.0 model, by contrast, scored only 16, suggesting markedly weaker performance than its newer counterpart.
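The comparison above boils down to checking each model's score against the MoCA's human passing threshold. A minimal sketch, using only the scores reported in the article (the dictionary structure and labels are illustrative, not from the study):

```python
# Compare reported MoCA scores against the conventional 26/30 human cut-off.
# Scores are those reported in the article; labels are illustrative only.
MOCA_PASS_THRESHOLD = 26  # out of 30

reported_scores = {
    "ChatGPT (latest)": 26,  # highest reported score
    "Gemini 1.0": 16,        # older model, lowest reported score
}

for model, score in reported_scores.items():
    status = "pass" if score >= MOCA_PASS_THRESHOLD else "below threshold"
    print(f"{model}: {score}/30 ({status})")
```

On these numbers, only the newest model clears the bar that a healthy human would be expected to meet.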
The study's authors emphasize that these findings are observational and not a direct comparison, given the fundamental differences between AI systems and the human brain. They caution that the poorer performance of older LLM versions highlights a potential weakness that could limit the use of AI in clinical medicine, particularly for tasks involving visual abstraction and executive function. This raises questions about the reliability of older AI models in medical diagnostics and about the impact on patient trust. The study also, tongue in cheek, suggests a new potential role for human neurologists: caring for "aging" AI models.