Abstract Reasoning Questions Mercer

Databricks' OfficeQA uncovers disconnect: AI agents ace abstract tests but stall at 45% on enterprise docs

There is no shortage of AI benchmarks in the market today, with popular options like Humanity's Last Exam (HLE), ARC-AGI-2 and GDPval, among numerous others. AI agents excel at solving abstract math ...

IEEE

Disentangled Retrieval and Reasoning for Implicit Question Answering

Abstract: To date, most of the existing open-domain question answering (QA) methods focus on explicit questions where the reasoning steps are mentioned explicitly in the question. In this article, we ...

SiliconANGLE

Samsung researchers created a tiny AI model that shames the biggest LLMs in reasoning puzzles

Researchers from Samsung Electronics Co. Ltd. have created a tiny artificial intelligence model that punches far above its weight on certain kinds of “reasoning” tasks, challenging the industry’s ...

SiliconANGLE

OpenAI, Google reasoning models achieve gold-level scores in ICPC coding contest

OpenAI and Google LLC today disclosed that their latest reasoning models achieved gold-level performance in a recent coding competition. The ICPC, as the event is called, is the world’s most ...

9to5google

Gemini 2.5 Deep Think scores competitive coding gold in ‘profound leap’ for abstract problem-solving

After a mathematics win in July, Gemini 2.5 Deep Think has now earned a gold-medal level performance in competitive coding. The International Collegiate Programming Contest (ICPC) is the “oldest, ...

San Francisco Examiner

New study calls AI reasoning a ‘brittle mirage’

The assertion that generative artificial-intelligence models like OpenAI’s new GPT-5 can reason like people do appears to have taken another blow. A new study published earlier this month by ...

Ars Technica

LLMs’ “simulated reasoning” abilities are a “brittle mirage,” researchers find

In recent months, the AI industry has started moving toward so-called simulated reasoning models that use a “chain of thought” process to work through tricky problems in multiple logical steps. At the ...

VentureBeat

New AI architecture delivers 100x faster reasoning than LLMs with just 1,000 training examples

Singapore-based AI startup Sapient Intelligence has developed a new AI architecture that can match, and in some cases vastly outperform, large language models (LLMs) on complex reasoning tasks, all ...

Forbes

Chain Of Thought For Reasoning Models Might Not Work Out Long-Term

New reasoning models have something interesting and compelling called “chain of thought.” What that means, in a nutshell, is that the engine spits out a line of text attempting to tell the user what ...

Futurism

Apple Researchers Just Released a Damning Paper That Pours Cold Water on the Entire AI Industry

Researchers at Apple have released an eyebrow-raising paper that throws cold water on the “reasoning” capabilities of the latest, most powerful large language models. In the paper, a team of machine ...

MacRumors

Apple Research Questions AI Reasoning Models Just Days Before WWDC

A newly published Apple Machine Learning Research study has challenged the prevailing narrative around AI "reasoning" large-language models like OpenAI's o1 and Claude's thinking variants, revealing ...

Scientific American

At Secret Math Meeting, Researchers Struggle to Outsmart AI

On a weekend in the spring of 2025, a clandestine mathematical conclave convened. Thirty of the world’s most renowned mathematicians traveled to Berkeley, Calif., with some coming from as far away as ...
