Cutting-edge AI models ‘collapse’ in face of complex problems, Apple study finds

Diverging Reports Breakdown

Advanced AI suffers ‘complete accuracy collapse’ in face of complex problems, study finds

Large reasoning models (LRMs) faced a “complete accuracy collapse” when presented with highly complex problems. As LRMs neared performance collapse, they began “reducing their reasoning effort”. Gary Marcus, a US academic who has become a prominent voice of caution on the capabilities of AI models, described the Apple paper as “pretty devastating”. The findings raised questions about the race to artificial general intelligence (AGI), a theoretical stage of AI at which a system is able to match a human at carrying out any intellectual task. The paper concluded that the current approach to AI may have reached limitations. It tested models including OpenAI’s o3, Google’s Gemini Thinking, Anthropic’s Claude 3.7 Sonnet-Thinking and DeepSeek-R1. OpenAI, the company behind ChatGPT, declined to comment.

Apple researchers have found “fundamental limitations” in cutting-edge artificial intelligence models, in a paper raising doubts about the technology industry’s race to develop ever more powerful systems.

Apple said in a paper published at the weekend that large reasoning models (LRMs) – an advanced form of AI – faced a “complete accuracy collapse” when presented with highly complex problems.

It found that standard AI models outperformed LRMs in low-complexity tasks, while both types of model suffered “complete collapse” with high-complexity tasks. Large reasoning models attempt to solve complex queries by generating detailed thinking processes that break down the problem into smaller steps.

The study, which tested the models’ ability to solve puzzles, added that as LRMs neared performance collapse they began “reducing their reasoning effort”. The Apple researchers said they found this “particularly concerning”.

Gary Marcus, a US academic who has become a prominent voice of caution on the capabilities of AI models, described the Apple paper as “pretty devastating”. Marcus added that the findings raised questions about the race to artificial general intelligence (AGI), a theoretical stage of AI at which a system is able to match a human at carrying out any intellectual task.

Referring to the large language models [LLMs] that underpin tools such as ChatGPT, Marcus wrote: “Anybody who thinks LLMs are a direct route to the sort [of] AGI that could fundamentally transform society for the good is kidding themselves.”

The paper also found that reasoning models wasted computing power on simpler problems: they found the right solution early in their “thinking” but then continued exploring incorrect alternatives. As problems became slightly more complex, models first explored incorrect solutions and arrived at the correct ones later.

For higher-complexity problems, however, the models would enter “collapse”, failing to generate any correct solutions. In one case, even when provided with an algorithm that would solve the problem, the models failed.

The paper said: “Upon approaching a critical threshold – which closely corresponds to their accuracy collapse point – models counterintuitively begin to reduce their reasoning effort despite increasing problem difficulty.”

The Apple experts said this indicated a “fundamental scaling limitation in the thinking capabilities of current reasoning models”.

The paper set the LRMs puzzle challenges, such as solving the Tower of Hanoi and River Crossing puzzles. The researchers acknowledged that the focus on puzzles represented a limitation in the work.
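For illustration, the Tower of Hanoi puzzle mentioned above has a well-known recursive solution, which is the kind of algorithm the researchers reportedly supplied to the models. The minimal Python sketch below shows that classic procedure; the function name and parameters are illustrative and are not drawn from the Apple study itself.

    def hanoi(n, source, target, spare, moves):
        # Move n discs from the source peg to the target peg,
        # using the spare peg as a temporary buffer.
        if n == 0:
            return
        hanoi(n - 1, source, spare, target, moves)   # clear the way
        moves.append((source, target))               # move the largest disc
        hanoi(n - 1, spare, target, source, moves)   # restack on top of it

    moves = []
    hanoi(3, "A", "C", "B", moves)   # 3 discs require 2**3 - 1 = 7 moves
    print(moves)

The number of required moves grows exponentially (2^n − 1 for n discs), which makes the puzzle straightforward to scale from low to high complexity.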


The paper concluded that the current approach to AI may have reached limitations. It tested models including OpenAI’s o3, Google’s Gemini Thinking, Anthropic’s Claude 3.7 Sonnet-Thinking and DeepSeek-R1. Anthropic, Google and DeepSeek have been contacted for comment. OpenAI, the company behind ChatGPT, declined to comment.

Referring to “generalisable reasoning” – or an AI model’s ability to apply a narrow conclusion more broadly – the paper said: “These insights challenge prevailing assumptions about LRM capabilities and suggest that current approaches may be encountering fundamental barriers to generalisable reasoning.”

Andrew Rogoyski, of the Institute for People-Centred AI at the University of Surrey, said the Apple paper signalled the industry was “still feeling its way” on AGI and that the industry could have reached a “cul-de-sac” in its current approach.

“The finding that large reasoning models lose the plot on complex problems, while performing well on medium- and low-complexity problems, implies that we’re in a potential cul-de-sac in current approaches,” he said.

Source: https://www.theguardian.com/technology/2025/jun/09/apple-artificial-intelligence-ai-study-collapse
