Statistical Model for the AI Alignment Problem

The Scheming Problem: Why Advanced AI Models Are Learning to Hide Their True Goals

For years, the AI community has worked to make systems not just more capable, but more aligned with human values. Researchers have developed training methods to ensure models follow instructions, ...

Time

The Human-AI Alignment Problem

We’re now deep into the AI era, where every week brings another feature or task that AI can accomplish. But given how far down the road we already are, it’s all the more essential to zoom out and ask ...

Communications of the ACM (Opinion)

From Model Training to Model Raising

A call to reform AI model-training paradigms from post hoc alignment to intrinsic, identity-based development.

Time

The Problem With AI Flattering Us

The most dangerous part of AI might not be the fact that it hallucinates—making up its own version of the truth—but that it ceaselessly agrees with users’ version of the truth. This danger is creating ...

ZDNet

AI models know when they're being tested - and change their behavior, research shows

Several frontier AI models show signs of scheming. Anti-scheming training reduced misbehavior in some models. Models know they're being tested, which complicates results. New joint safety testing from ...

Fast Company

Are large language models the problem, not the solution?

There is an all-out global race for AI dominance. The largest and most powerful companies in the world are investing billions in unprecedented computing power. The most powerful countries are ...

Communications of the ACM

AI Goes Synthetic to Get Real

Artificial Intelligence (AI) models are only as good as the data on which they are trained. Yet gathering enough high-quality ...
